task_boiler
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionLast revisionBoth sides next revision | ||
task_boiler [2022/03/18 23:29] – [convert a dictionary of dataframes to a big dataframe] raju | task_boiler [2024/01/22 22:59] – [Issue] raju | ||
---|---|---|---|
Line 4: | Line 4: | ||
Start the first heading with " | Start the first heading with " | ||
- | ===== Convert a dictionary of dataframes to a big dataframe | + | ===== blocked by barrons |
- | ==== Task ==== | + | ==== Issue ==== |
- | Given a dictionary strings to dataframes, create an expanded dataframe by putting | + | barrons.com is giving |
< | < | ||
- | {' | + | You have been blocked. |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | ' | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | </ | + | |
- | We want | + | |
- | < | + | |
- | ticker | + | |
- | 0 | + | |
- | 1 | + | |
- | 2 | + | |
- | 3 | + | |
- | 4 | + | |
- | 5 | + | |
- | 6 | + | |
- | 7 | + | |
- | </ | + | |
- | ==== Solution ==== | + | Why? |
- | < | + | Something about the behaviour of the browser has caught our |
- | def dict_df_to_df(dict_df, | + | attention. |
- | df = ( | + | |
- | pd.concat(dict_df, | + | |
- | .reset_index()\ | + | |
- | .drop(' | + | |
- | .rename({' | + | |
- | ) | + | |
- | return df | + | |
- | </ | + | |
- | Using the example above | + | |
- | < | + | |
- | $ ipython | + | |
- | In [1]: | + | There are various possible explanations for this: |
- | import pandas as pd | + | * You are browsing and clicking at a speed much faster than |
- | costco_earnings = pd.DataFrame({ | + | |
- | ' | + | * Something is preventing JavaScript from working on your |
- | | + | |
- | }) | + | * There is a robot on the same network (IP WW.XX.YY.ZZ) as you. |
- | costco_earnings | + | |
- | Out[1]: | + | |
- | | + | |
- | 0 202102 | + | |
- | 1 202105 | + | |
- | 2 202108 | + | |
- | 3 202111 | + | |
- | In [2]: | + | Having problems accessing the site? Submit feedback. |
- | copart_earnings = pd.DataFrame({ | + | ID: a-b-c-d-e |
- | ' | + | |
- | ' | + | |
- | }) | + | |
- | copart_earnings | + | |
- | Out[2]: | + | |
- | | + | |
- | 0 202104 | + | |
- | 1 202107 | + | |
- | 2 202110 | + | |
- | 3 202201 | + | |
- | + | ||
- | In [3]: | + | |
- | dict_df = {' | + | |
- | dict_df | + | |
- | Out[3]: | + | |
- | {' | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
- | ' | + | |
- | | + | |
- | | + | |
- | | + | |
- | | + | |
</ | </ | ||
- | < | + | ==== Solution ==== |
- | def dict_df_to_df(dict_df, | + | Click on the extensions button in the top right |
- | df = ( | + | -> Bypass Paywalls Clean -> settings button -> Options |
- | | + | -> scroll down to "BPC settings" |
- | | + | -> check " |
- | .drop('level_1', | + | -> Save |
- | .rename({' | + | |
- | ) | + | |
- | return df | + | |
- | </code> | + | |
- | < | + | ==== System Information ==== |
- | In [5]: | + | * bypass paywalls clean extension version: 3.5.2.1 (latest asof 2024-01-22) |
- | expanded_df = dict_df_to_df(dict_df, ' | + | * google chrome: 120.0.6099.200 |
- | expanded_df | + | |
- | Out[5]: | + | |
- | | + | |
- | 0 | + | |
- | 1 | + | |
- | 2 | + | |
- | 3 | + | |
- | 4 | + | |
- | 5 | + | |
- | 6 | + | |
- | 7 | + | |
- | </ | + | |
- | To see how it works | ||
- | < | ||
- | In [6]: | ||
- | pd.concat(dict_df, | ||
- | Out[6]: | ||
- | fiscal_quarter_end | ||
- | COST 0 202102 | ||
- | | ||
- | | ||
- | | ||
- | CPRT 0 202104 | ||
- | | ||
- | | ||
- | | ||
- | |||
- | In [7]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index() | ||
- | Out[7]: | ||
- | level_0 | ||
- | 0 COST 0 202102 | ||
- | 1 COST 1 202105 | ||
- | 2 COST 2 202108 | ||
- | 3 COST 3 202111 | ||
- | 4 CPRT 0 202104 | ||
- | 5 CPRT 1 202107 | ||
- | 6 CPRT 2 202110 | ||
- | 7 CPRT 3 202201 | ||
- | |||
- | In [8]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | Out[8]: | ||
- | level_0 | ||
- | 0 COST 202102 | ||
- | 1 COST 202105 | ||
- | 2 COST 202108 | ||
- | 3 COST 202111 | ||
- | 4 CPRT 202104 | ||
- | 5 CPRT 202107 | ||
- | 6 CPRT 202110 | ||
- | 7 CPRT 202201 | ||
- | |||
- | In [9]: | ||
- | pd.concat(dict_df, | ||
- | .reset_index()\ | ||
- | .drop(' | ||
- | .rename({' | ||
- | Out[9]: | ||
- | ticker | ||
- | 0 | ||
- | 1 | ||
- | 2 | ||
- | 3 | ||
- | 4 | ||
- | 5 | ||
- | 6 | ||
- | 7 | ||
- | </ |
task_boiler.txt · Last modified: 2024/01/23 22:55 by raju