User Tools

Site Tools


task_boiler

This is an old revision of the document!


task boiler

Place to cook the tasks. Once they are boiled enough, move them to a separate plate.

Start the first heading with “=====”

convert a dictionary of dataframes to a big dataframe

Task

Given a dictionary strings to dataframes, create an expanded dataframe by putting the keys into their own column. For example, given

{'COST':    fiscal_quarter_end  reported_date
 0              202102       20210304
 1              202105       20210527
 2              202108       20210923
 3              202111       20211209,
 'CPRT':    fiscal_quarter_end  reported_date
 0              202104       20210519
 1              202107       20210908
 2              202110       20211117
 3              202201       20220216}

We want

  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216

Solution

pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'key_name'}, axis=1)

Using the example above

$ ipython
Python 3.6.12 |Anaconda, Inc.| (default, Sep  9 2020, 00:29:25) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
import pandas as pd
costco_earnings = pd.DataFrame({
  'fiscal_quarter_end': [202102, 202105, 202108, 202111],
  'reported_date': [20210304, 20210527, 20210923, 20211209]
})
costco_earnings
Out[1]:
   fiscal_quarter_end  reported_date
0              202102       20210304
1              202105       20210527
2              202108       20210923
3              202111       20211209

In [2]:
copart_earnings = pd.DataFrame({
  'fiscal_quarter_end': [202104, 202107, 202110, 202201],
  'reported_date': [20210519, 20210908, 20211117, 20220216]
})
copart_earnings
Out[2]:
   fiscal_quarter_end  reported_date
0              202104       20210519
1              202107       20210908
2              202110       20211117
3              202201       20220216

In [3]:
dict_df = {'COST': costco_earnings, 'CPRT': copart_earnings}
dict_df
Out[3]:
{'COST':    fiscal_quarter_end  reported_date
 0              202102       20210304
 1              202105       20210527
 2              202108       20210923
 3              202111       20211209,
 'CPRT':    fiscal_quarter_end  reported_date
 0              202104       20210519
 1              202107       20210908
 2              202110       20211117
 3              202201       20220216}

In [4]:
expanded_df = pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'ticker'}, axis=1)
expanded_df
Out[4]:
  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216

To see how it works

In [5]:
pd.concat(dict_df, axis=0)
Out[5]:
        fiscal_quarter_end  reported_date
COST 0              202102       20210304
     1              202105       20210527
     2              202108       20210923
     3              202111       20211209
CPRT 0              202104       20210519
     1              202107       20210908
     2              202110       20211117
     3              202201       20220216

In [6]:
pd.concat(dict_df, axis=0).reset_index()
Out[6]:
  level_0  level_1  fiscal_quarter_end  reported_date
0    COST        0              202102       20210304
1    COST        1              202105       20210527
2    COST        2              202108       20210923
3    COST        3              202111       20211209
4    CPRT        0              202104       20210519
5    CPRT        1              202107       20210908
6    CPRT        2              202110       20211117
7    CPRT        3              202201       20220216

In [7]:
pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1)
Out[7]:
  level_0  fiscal_quarter_end  reported_date
0    COST              202102       20210304
1    COST              202105       20210527
2    COST              202108       20210923
3    COST              202111       20211209
4    CPRT              202104       20210519
5    CPRT              202107       20210908
6    CPRT              202110       20211117
7    CPRT              202201       20220216

In [8]:
pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'ticker'}, axis=1)
Out[8]:
  ticker  fiscal_quarter_end  reported_date
0   COST              202102       20210304
1   COST              202105       20210527
2   COST              202108       20210923
3   COST              202111       20211209
4   CPRT              202104       20210519
5   CPRT              202107       20210908
6   CPRT              202110       20211117
7   CPRT              202201       20220216
task_boiler.1647644383.txt.gz · Last modified: 2022/03/18 22:59 by raju