task_boiler
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
task_boiler [2022/03/18 23:29] – [convert a dictionary of dataframes to a big dataframe] raju | task_boiler [2024/01/23 22:55] (current) – raju | ||
---|---|---|---|
Line 4: | Line 4: | ||
Start the first heading with " | Start the first heading with " | ||
- | ===== Convert a dictionary of dataframes to a big dataframe ===== | ||
- | ==== Task ==== | ||
- | Given a dictionary strings to dataframes, create an expanded dataframe by putting the keys into their own column. For example, given | ||
- | < | ||
- | {' | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | ' | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | </ | ||
- | We want | ||
- | < | ||
- | ticker | ||
- | 0 | ||
- | 1 | ||
- | 2 | ||
- | 3 | ||
- | 4 | ||
- | 5 | ||
- | 6 | ||
- | 7 | ||
- | </ | ||
- | |||
- | ==== Solution ==== | ||
- | < | ||
- | def dict_df_to_df(dict_df, | ||
- | df = ( | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | .rename({' | ||
- | ) | ||
- | return df | ||
- | </ | ||
- | Using the example above | ||
- | < | ||
- | $ ipython | ||
- | |||
- | In [1]: | ||
- | import pandas as pd | ||
- | costco_earnings = pd.DataFrame({ | ||
- | ' | ||
- | ' | ||
- | }) | ||
- | costco_earnings | ||
- | Out[1]: | ||
- | | ||
- | 0 202102 | ||
- | 1 202105 | ||
- | 2 202108 | ||
- | 3 202111 | ||
- | |||
- | In [2]: | ||
- | copart_earnings = pd.DataFrame({ | ||
- | ' | ||
- | ' | ||
- | }) | ||
- | copart_earnings | ||
- | Out[2]: | ||
- | | ||
- | 0 202104 | ||
- | 1 202107 | ||
- | 2 202110 | ||
- | 3 202201 | ||
- | |||
- | In [3]: | ||
- | dict_df = {' | ||
- | dict_df | ||
- | Out[3]: | ||
- | {' | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | ' | ||
- | | ||
- | | ||
- | | ||
- | | ||
- | </ | ||
- | |||
- | < | ||
- | def dict_df_to_df(dict_df, | ||
- | df = ( | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | .rename({' | ||
- | ) | ||
- | return df | ||
- | </ | ||
- | |||
- | < | ||
- | In [5]: | ||
- | expanded_df = dict_df_to_df(dict_df, | ||
- | expanded_df | ||
- | Out[5]: | ||
- | ticker | ||
- | 0 | ||
- | 1 | ||
- | 2 | ||
- | 3 | ||
- | 4 | ||
- | 5 | ||
- | 6 | ||
- | 7 | ||
- | </ | ||
- | |||
- | To see how it works | ||
- | < | ||
- | In [6]: | ||
- | pd.concat(dict_df, | ||
- | Out[6]: | ||
- | fiscal_quarter_end | ||
- | COST 0 202102 | ||
- | | ||
- | | ||
- | | ||
- | CPRT 0 202104 | ||
- | | ||
- | | ||
- | | ||
- | |||
- | In [7]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index() | ||
- | Out[7]: | ||
- | level_0 | ||
- | 0 COST 0 202102 | ||
- | 1 COST 1 202105 | ||
- | 2 COST 2 202108 | ||
- | 3 COST 3 202111 | ||
- | 4 CPRT 0 202104 | ||
- | 5 CPRT 1 202107 | ||
- | 6 CPRT 2 202110 | ||
- | 7 CPRT 3 202201 | ||
- | |||
- | In [8]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | Out[8]: | ||
- | level_0 | ||
- | 0 COST 202102 | ||
- | 1 COST 202105 | ||
- | 2 COST 202108 | ||
- | 3 COST 202111 | ||
- | 4 CPRT 202104 | ||
- | 5 CPRT 202107 | ||
- | 6 CPRT 202110 | ||
- | 7 CPRT 202201 | ||
- | |||
- | In [9]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | .rename({' | ||
- | Out[9]: | ||
- | ticker | ||
- | 0 | ||
- | 1 | ||
- | 2 | ||
- | 3 | ||
- | 4 | ||
- | 5 | ||
- | 6 | ||
- | 7 | ||
- | </ |
task_boiler.1647646159.txt.gz · Last modified: 2022/03/18 23:29 by raju