add_dates
Table of Contents
Add dates
single start date; multiple offsets; with pandas
prepare the input
In [1]: start_yyyymmdd = '20201203' fmt = '%Y%m%d' from datetime import datetime start_date = datetime.strptime(start_yyyymmdd, fmt) In [2]: print(start_date) 2020-12-03 00:00:00 In [3]: import pandas as pd days_original = [158, 928, 882, 341, 596, 878, 526] offset_years = [round(x/365.0, 9) for x in days_original] df = pd.DataFrame({'offset_years': offset_years}) In [4]: print(df) offset_years 0 0.432877 1 2.542466 2 2.416438 3 0.934247 4 1.632877 5 2.405479 6 1.441096
Convert the offset to days
In [5]: import numpy as np df['offset_days'] = (df['offset_years']*365.0).round().astype(np.int) df Out[5]: offset_years offset_days 0 0.432877 158 1 2.542466 928 2 2.416438 882 3 0.934247 341 4 1.632877 596 5 2.405479 878 6 1.441096 526
Apply the offset to get end dates
In [6]: df['end_yyyymmdd'] = (start_date + pd.to_timedelta(df['offset_days'], 'd')).dt.strftime(fmt) df Out[6]: offset_years offset_days end_yyyymmdd 0 0.432877 158 20210510 1 2.542466 928 20230619 2 2.416438 882 20230504 3 0.934247 341 20211109 4 1.632877 596 20220722 5 2.405479 878 20230430 6 1.441096 526 20220513
Used | Python 3.8.5, ipython 7.18.1, pandas 1.1.3, numpy 1.19.2
demonstrates | round float to int
multiple dates; different offsets; with pandas
Given
In [1]: import pandas as pd df = pd.DataFrame({'begin': ['20201107', '20201121'], 'offset': [5, -6]}) df Out[1]: begin offset 0 20201107 5 1 20201121 -6 In [2]: df.dtypes Out[2]: begin object offset int64 dtype: object
we want
begin offset end 0 20201107 5 20201112 1 20201121 -6 20201115
where end date = begin date + offset
Solution:
In [3]: fmt='%Y%m%d' from datetime import datetime df['end'] = (df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd')).dt.strftime(fmt) df Out[3]: begin offset end 0 20201107 5 20201112 1 20201121 -6 20201115
Intermediate steps:
In [4]: df['begin'].apply(lambda x: datetime.strptime(x, fmt)) Out[4]: 0 2020-11-07 1 2020-11-21 Name: begin, dtype: datetime64[ns] In [5]: df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd') Out[5]: 0 2020-11-12 1 2020-11-15 dtype: datetime64[ns] In [6]: (df['begin'].apply(lambda x: datetime.strptime(x, fmt)) + pd.to_timedelta(df['offset'], 'd')).dt.strftime(fmt) Out[6]: 0 20201112 1 20201115 dtype: object
single date; single offset; without pandas
In [1]: from datetime import datetime, timedelta print(datetime.now()) 2020-11-21 23:54:10.971599 In [2]: print(datetime.now() + timedelta(days=10)) 2020-12-01 23:54:32.393776
Related links
offset calculation
If the offsets are computed from floating point numbers that were originally derived from integers, watch out for round off errors.
For example, let's say:
- There are two applications App1 and App2
- App1 converts the number of days (an integer value) into number of years (a floating point number) and rounds it to 9 digits
- App2 reads that as input
In [1]: days_original = [158, 928, 882, 341, 596, 878, 526] years_given = [round(x/365.0, 9) for x in days_original] print(years_given) [0.432876712, 2.542465753, 2.416438356, 0.934246575, 1.632876712, 2.405479452, 1.44109589]
Now if App2 tries to compute the offsets by multiplying with 365 and converting that back to integer, it will give wrong results (due to limitations in floating point representation).
In [2]: wrong_offsets = [int(x*365.0) for x in years_given] print(wrong_offsets) [157, 927, 881, 340, 595, 877, 525]
The correct approach is to multiply by 365, apply round() and then convert it back to integer.
In [3]: correct_offsets = [int(round(x*365.0)) for x in years_given] print(correct_offsets) [158, 928, 882, 341, 596, 878, 526]
add_dates.txt · Last modified: 2021/01/13 14:41 by raju