task_boiler
This is an old revision of the document!
Table of Contents
task boiler
Place to cook the tasks. Once they are boiled enough, move them to a separate plate.
Start the first heading with “=====”
convert a dictionary of dataframes to a big dataframe
Task
Given a dictionary strings to dataframes, create an expanded dataframe by putting the keys into their own column. For example, given
{'COST': fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209, 'CPRT': fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216}
We want
ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
Solution
pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'key_name'}, axis=1)
Using the example above
$ ipython Python 3.6.12 |Anaconda, Inc.| (default, Sep 9 2020, 00:29:25) [MSC v.1916 64 bit (AMD64)] Type 'copyright', 'credits' or 'license' for more information IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help. In [1]: import pandas as pd costco_earnings = pd.DataFrame({ 'fiscal_quarter_end': [202102, 202105, 202108, 202111], 'reported_date': [20210304, 20210527, 20210923, 20211209] }) costco_earnings Out[1]: fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209 In [2]: copart_earnings = pd.DataFrame({ 'fiscal_quarter_end': [202104, 202107, 202110, 202201], 'reported_date': [20210519, 20210908, 20211117, 20220216] }) copart_earnings Out[2]: fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216 In [3]: dict_df = {'COST': costco_earnings, 'CPRT': copart_earnings} dict_df Out[3]: {'COST': fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209, 'CPRT': fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216} In [4]: expanded_df = pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'ticker'}, axis=1) expanded_df Out[4]: ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
To see how it works
In [5]: pd.concat(dict_df, axis=0) Out[5]: fiscal_quarter_end reported_date COST 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209 CPRT 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216 In [6]: pd.concat(dict_df, axis=0).reset_index() Out[6]: level_0 level_1 fiscal_quarter_end reported_date 0 COST 0 202102 20210304 1 COST 1 202105 20210527 2 COST 2 202108 20210923 3 COST 3 202111 20211209 4 CPRT 0 202104 20210519 5 CPRT 1 202107 20210908 6 CPRT 2 202110 20211117 7 CPRT 3 202201 20220216 In [7]: pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1) Out[7]: level_0 fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216 In [8]: pd.concat(dict_df, axis=0).reset_index().drop('level_1', axis=1).rename({'level_0':'ticker'}, axis=1) Out[8]: ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
task_boiler.1647644383.txt.gz · Last modified: 2022/03/18 22:59 by raju