pandas_dataframe
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
pandas_dataframe [2023/03/13 15:59] – [convert column names to lower case] admin | pandas_dataframe [2023/07/21 22:16] – [Create a dataframe from list of lists] raju | ||
---|---|---|---|
Line 59: | Line 59: | ||
* https:// | * https:// | ||
+ | tags | row by row | ||
==== Create a dataframe by splitting strings ==== | ==== Create a dataframe by splitting strings ==== | ||
Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns. | Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns. | ||
Line 114: | Line 115: | ||
also demonstrates | assign column names to a dataframe | also demonstrates | assign column names to a dataframe | ||
+ | |||
+ | ==== Create a dataframe from a series of lists ==== | ||
+ | tags | convert series with lists to dataframe | ||
+ | < | ||
+ | df = pd.DataFrame(s.to_list()) | ||
+ | </ | ||
+ | For example | ||
+ | < | ||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | s = pd.Series([[1, | ||
+ | s | ||
+ | Out[1]: | ||
+ | 0 [1, 2, 3] | ||
+ | 1 [4, 5, 6, 7] | ||
+ | 2 [8, 9] | ||
+ | dtype: object | ||
+ | |||
+ | In [2]: | ||
+ | df = pd.DataFrame(s.to_list()) | ||
+ | df | ||
+ | Out[2]: | ||
+ | | ||
+ | 0 1 2 3.0 NaN | ||
+ | 1 4 5 6.0 7.0 | ||
+ | 2 8 9 NaN NaN | ||
+ | </ | ||
+ | |||
+ | If the number of elements in each list is same, np.vstack() can be used but otherwise it will not work. For example | ||
+ | < | ||
+ | In [5]: | ||
+ | s | ||
+ | Out[5]: | ||
+ | 0 [1, 2, 3] | ||
+ | 1 [4, 5, 6, 7] | ||
+ | 2 [8, 9] | ||
+ | dtype: object | ||
+ | |||
+ | In [6]: | ||
+ | import numpy as np | ||
+ | df = pd.DataFrame(np.vstack(s)) | ||
+ | --------------------------------------------------------------------------- | ||
+ | ValueError | ||
+ | Cell In[6], line 2 | ||
+ | 1 import numpy as np | ||
+ | ----> 2 df = pd.DataFrame(np.vstack(s)) | ||
+ | |||
+ | File < | ||
+ | |||
+ | File ~\AppData\Local\conda\conda\envs\py311\Lib\site-packages\numpy\core\shape_base.py: | ||
+ | 294 if not isinstance(arrs, | ||
+ | 295 arrs = [arrs] | ||
+ | --> 296 return _nx.concatenate(arrs, | ||
+ | |||
+ | File < | ||
+ | |||
+ | ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4 | ||
+ | </ | ||
+ | |||
+ | But with | ||
+ | < | ||
+ | In [10]: | ||
+ | s = pd.Series([[1, | ||
+ | s | ||
+ | Out[10]: | ||
+ | 0 [1, 2, 3] | ||
+ | 1 [4, 5, 6] | ||
+ | dtype: object | ||
+ | |||
+ | In [11]: | ||
+ | import numpy as np | ||
+ | df = pd.DataFrame(np.vstack(s)) | ||
+ | df | ||
+ | Out[11]: | ||
+ | | ||
+ | 0 1 2 3 | ||
+ | 1 4 5 6 | ||
+ | |||
+ | In [12]: | ||
+ | df = pd.DataFrame(s.to_list()) | ||
+ | df | ||
+ | Out[12]: | ||
+ | | ||
+ | 0 1 2 3 | ||
+ | 1 4 5 6 | ||
+ | </ | ||
+ | |||
+ | See also: | ||
+ | * https:// | ||
+ | |||
+ | ==== Create a dataframe from a bunch of variables ==== | ||
+ | < | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | </ | ||
+ | |||
+ | For example | ||
+ | < | ||
+ | $ ipython | ||
+ | |||
+ | In [1]: | ||
+ | year = 2023; month = 6; date = 15 | ||
+ | |||
+ | In [2]: | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | |||
+ | In [3]: | ||
+ | df | ||
+ | Out[3]: | ||
+ | | ||
+ | 0 | ||
+ | 1 month 6 | ||
+ | 2 | ||
+ | |||
+ | In [4]: | ||
+ | df.dtypes | ||
+ | Out[4]: | ||
+ | key object | ||
+ | value int64 | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | It works even if the variables are not of the same type. | ||
+ | < | ||
+ | In [5]: | ||
+ | year = 2023; month = ' | ||
+ | |||
+ | In [6]: | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | |||
+ | In [7]: | ||
+ | df | ||
+ | Out[7]: | ||
+ | key value | ||
+ | 0 | ||
+ | 1 month June | ||
+ | 2 | ||
+ | |||
+ | In [8]: | ||
+ | df.dtypes | ||
+ | Out[8]: | ||
+ | key object | ||
+ | value object | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | Tested with Python 3.11.3, IPython 8.12.0 | ||
===== selection related ===== | ===== selection related ===== | ||
Line 193: | Line 351: | ||
[[ matches to coplays | Solution ]] | [[ matches to coplays | Solution ]] | ||
- | ====== difference between ===== | + | ===== difference between ===== |
* [[astype vs. to_numeric]] | * [[astype vs. to_numeric]] | ||
Line 199: | Line 357: | ||
===== convert stuff ===== | ===== convert stuff ===== | ||
- | ==== convert column names to lower case ==== | + | ==== convert column names to lower case and replace space with underscore |
- | Approach 1: | + | |
< | < | ||
- | df.columns = df.columns.str.lower() | + | df.columns = df.columns.str.lower().str.replace(' |
</ | </ | ||
Line 225: | Line 382: | ||
In [2]: | In [2]: | ||
- | df.columns = df.columns.str.lower() | + | df.columns = df.columns.str.lower().str.replace(' |
In [3]: | In [3]: | ||
df | df | ||
Out[3]: | Out[3]: | ||
- | | + | |
0 2023-03-02 | 0 2023-03-02 | ||
1 2022-12-08 | 1 2022-12-08 | ||
+ | </ | ||
+ | |||
+ | ==== convert column names to lower case ==== | ||
+ | Approach 1: | ||
+ | < | ||
+ | df.columns = df.columns.str.lower() | ||
</ | </ | ||
Line 240: | Line 403: | ||
</ | </ | ||
- | Example: | + | Notes: |
+ | * I prefer approach1 - simple syntax, easy to remember. | ||
+ | |||
+ | Use case: While merging data from two data frames using DataFrame.merge(), | ||
+ | |||
+ | Example | ||
+ | < | ||
+ | $ ipython | ||
+ | Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)] | ||
+ | Type ' | ||
+ | IPython 8.8.0 -- An enhanced Interactive Python. Type '?' | ||
+ | |||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame( | ||
+ | [[' | ||
+ | | ||
+ | columns = [" | ||
+ | df | ||
+ | Out[1]: | ||
+ | Announcement Date Fiscal Quarter End Estimated EPS Actual EPS | ||
+ | 0 2023-03-02 | ||
+ | 1 2022-12-08 | ||
+ | |||
+ | In [2]: | ||
+ | df.columns = df.columns.str.lower() | ||
+ | |||
+ | In [3]: | ||
+ | df | ||
+ | Out[3]: | ||
+ | announcement date fiscal quarter end estimated eps actual eps | ||
+ | 0 2023-03-02 | ||
+ | 1 2022-12-08 | ||
+ | </ | ||
+ | |||
+ | Example (using approach 2): | ||
< | < | ||
$ ipython | $ ipython | ||
Line 269: | Line 467: | ||
</ | </ | ||
- | Use case: While merging data from two data frames using DataFrame.merge(), | ||
==== convert all categorical columns to string columns ==== | ==== convert all categorical columns to string columns ==== |
pandas_dataframe.txt · Last modified: 2024/05/22 15:47 by admin