add_dates
This is an old revision of the document!
Table of Contents
Add dates
with pandas; multiple dates
Given
In [1]: import pandas as pd df = pd.DataFrame({'begin': ['20201107', '20201121'], 'offset': [5, -6]}) df Out[1]: begin offset 0 20201107 5 1 20201121 -6 In [2]: df.dtypes Out[2]: begin object offset int64 dtype: object
we want
begin offset end 0 20201107 5 20201112 1 20201121 -6 20201115
where end date = begin date + offset
Solution:
In [3]: fmt='%Y%m%d' from datetime import datetime df['end'] = (df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd')).dt.strftime(fmt) df Out[3]: begin offset end 0 20201107 5 20201112 1 20201121 -6 20201115
Intermediate steps:
In [4]: df['begin'].apply(lambda x: datetime.strptime(x, fmt)) Out[4]: 0 2020-11-07 1 2020-11-21 Name: begin, dtype: datetime64[ns] In [5]: df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd') Out[5]: 0 2020-11-12 1 2020-11-15 dtype: datetime64[ns] In [6]: (df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd')).dt.strftime(fmt) Out[6]: 0 20201112 1 20201115 dtype: object
without pandas; single date
For a single date, we can do this without pandas also
In [1]: from datetime import datetime, timedelta print(datetime.now()) 2020-11-21 23:54:10.971599 In [2]: print(datetime.now() + timedelta(days=10)) 2020-12-01 23:54:32.393776
Related links
offset calculation
If the offsets are computed from floating point numbers that were originally derived from integers, watch out for round off errors.
For example, let's say:
- There are two applications App1 and App2 where App1 provides input to App2
- App1 converts the number of days (an integer value) into number of years (a floating point number) and rounds it to 9 digits and gives that to App2.
In [1]: days_original = [158, 928, 882, 341, 596, 878, 526] years_given = [round(x/365.0, 9) for x in days_original] print(years_given) [0.432876712, 2.542465753, 2.416438356, 0.934246575, 1.632876712, 2.405479452, 1.44109589]
Now if App2 tries to compute the offsets by multiplying with 365 and converting back to integers, it will give wrong results (due to limitations in floating point representation)
In [2]: wrong_offsets = [int(x*365.0) for x in years_given] print(wrong_offsets) [157, 927, 881, 340, 595, 877, 525]
The correct approach is to multiply by 365, apply round() and convert that into integers.
In [3]: correct_offsets = [int(round(x*365.0)) for x in years_given] print(correct_offsets) [158, 928, 882, 341, 596, 878, 526]
add_dates.1607020404.txt.gz · Last modified: 2020/12/03 18:33 by raju