User Tools

Site Tools


task_boiler

This is an old revision of the document!


task boiler

Place to cook the tasks. Once they are boiled enough, move them to a separate plate.

Start the first heading with “=====”

convert a dictionary of dataframes to a big dataframe

Task

Given a dictionary strings to dataframes, create an expanded dataframe by putting the keys into their own column. For example, given

{'COST':    fiscal_quarter_end  reported_date
 0              202102       20210304
 1              202105       20210527
 2              202108       20210923
 3              202111       20211209,
 'CPRT':    fiscal_quarter_end  reported_date
 0              202104       20210519
 1              202107       20210908
 2              202110       20211117
 3              202201       20220216}

We want

  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216

Solution

def dict_df_to_df(dict_df, key_col_name):
    df = (
        pd.concat(dict_df, axis=0)\
        .reset_index()\
        .drop('level_1', axis=1)\
        .rename({'level_0':key_col_name}, axis=1)
    )
    return df

Using the example above

$ ipython

In [1]:
import pandas as pd
costco_earnings = pd.DataFrame({
  'fiscal_quarter_end': [202102, 202105, 202108, 202111],
  'reported_date': [20210304, 20210527, 20210923, 20211209]
})
costco_earnings
Out[1]:
   fiscal_quarter_end  reported_date
0              202102       20210304
1              202105       20210527
2              202108       20210923
3              202111       20211209

In [2]:
copart_earnings = pd.DataFrame({
  'fiscal_quarter_end': [202104, 202107, 202110, 202201],
  'reported_date': [20210519, 20210908, 20211117, 20220216]
})
copart_earnings
Out[2]:
   fiscal_quarter_end  reported_date
0              202104       20210519
1              202107       20210908
2              202110       20211117
3              202201       20220216

In [3]:
dict_df = {'COST': costco_earnings, 'CPRT': copart_earnings}
dict_df
Out[3]:
{'COST':    fiscal_quarter_end  reported_date
 0              202102       20210304
 1              202105       20210527
 2              202108       20210923
 3              202111       20211209,
 'CPRT':    fiscal_quarter_end  reported_date
 0              202104       20210519
 1              202107       20210908
 2              202110       20211117
 3              202201       20220216}
def dict_df_to_df(dict_df, key_col_name):
    df = (
        pd.concat(dict_df, axis=0)\
        .reset_index()\
        .drop('level_1', axis=1)\
        .rename({'level_0':key_col_name}, axis=1)
    )
    return df
In [5]:
expanded_df = dict_df_to_df(dict_df, 'ticker')
expanded_df
Out[5]:
  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216

To see how it works

In [6]:
pd.concat(dict_df, axis=0)
Out[6]:
        fiscal_quarter_end  reported_date
COST 0              202102       20210304
     1              202105       20210527
     2              202108       20210923
     3              202111       20211209
CPRT 0              202104       20210519
     1              202107       20210908
     2              202110       20211117
     3              202201       20220216

In [7]:
pd.concat(dict_df, axis=0)\
.reset_index()
Out[7]:
  level_0  level_1  fiscal_quarter_end  reported_date
0    COST        0              202102       20210304
1    COST        1              202105       20210527
2    COST        2              202108       20210923
3    COST        3              202111       20211209
4    CPRT        0              202104       20210519
5    CPRT        1              202107       20210908
6    CPRT        2              202110       20211117
7    CPRT        3              202201       20220216

In [8]:
pd.concat(dict_df, axis=0)\
.reset_index()\
.drop('level_1', axis=1)
Out[8]:
  level_0  fiscal_quarter_end  reported_date
0    COST              202102       20210304
1    COST              202105       20210527
2    COST              202108       20210923
3    COST              202111       20211209
4    CPRT              202104       20210519
5    CPRT              202107       20210908
6    CPRT              202110       20211117
7    CPRT              202201       20220216

In [9]:
pd.concat(dict_df, axis=0)\
.reset_index()\
.drop('level_1', axis=1)\
.rename({'level_0':'ticker'}, axis=1)
Out[9]:
  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216
task_boiler.1647645978.txt.gz · Last modified: 2022/03/18 23:26 by raju