pandas_dataframe
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
pandas_dataframe [2023/05/05 21:00] – [Create a dataframe by splitting strings] admin | pandas_dataframe [2023/07/21 22:16] – [Create a dataframe from list of lists] raju | ||
---|---|---|---|
Line 59: | Line 59: | ||
* https:// | * https:// | ||
+ | tags | row by row | ||
==== Create a dataframe by splitting strings ==== | ==== Create a dataframe by splitting strings ==== | ||
Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns. | Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns. | ||
Line 117: | Line 118: | ||
==== Create a dataframe from a series of lists ==== | ==== Create a dataframe from a series of lists ==== | ||
tags | convert series with lists to dataframe | tags | convert series with lists to dataframe | ||
+ | < | ||
+ | df = pd.DataFrame(s.to_list()) | ||
+ | </ | ||
+ | For example | ||
< | < | ||
In [1]: | In [1]: | ||
Line 137: | Line 142: | ||
2 8 9 NaN NaN | 2 8 9 NaN NaN | ||
</ | </ | ||
+ | |||
+ | If the number of elements in each list is same, np.vstack() can be used but otherwise it will not work. For example | ||
+ | < | ||
+ | In [5]: | ||
+ | s | ||
+ | Out[5]: | ||
+ | 0 [1, 2, 3] | ||
+ | 1 [4, 5, 6, 7] | ||
+ | 2 [8, 9] | ||
+ | dtype: object | ||
+ | |||
+ | In [6]: | ||
+ | import numpy as np | ||
+ | df = pd.DataFrame(np.vstack(s)) | ||
+ | --------------------------------------------------------------------------- | ||
+ | ValueError | ||
+ | Cell In[6], line 2 | ||
+ | 1 import numpy as np | ||
+ | ----> 2 df = pd.DataFrame(np.vstack(s)) | ||
+ | |||
+ | File < | ||
+ | |||
+ | File ~\AppData\Local\conda\conda\envs\py311\Lib\site-packages\numpy\core\shape_base.py: | ||
+ | 294 if not isinstance(arrs, | ||
+ | 295 arrs = [arrs] | ||
+ | --> 296 return _nx.concatenate(arrs, | ||
+ | |||
+ | File < | ||
+ | |||
+ | ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4 | ||
+ | </ | ||
+ | |||
+ | But with | ||
+ | < | ||
+ | In [10]: | ||
+ | s = pd.Series([[1, | ||
+ | s | ||
+ | Out[10]: | ||
+ | 0 [1, 2, 3] | ||
+ | 1 [4, 5, 6] | ||
+ | dtype: object | ||
+ | |||
+ | In [11]: | ||
+ | import numpy as np | ||
+ | df = pd.DataFrame(np.vstack(s)) | ||
+ | df | ||
+ | Out[11]: | ||
+ | | ||
+ | 0 1 2 3 | ||
+ | 1 4 5 6 | ||
+ | |||
+ | In [12]: | ||
+ | df = pd.DataFrame(s.to_list()) | ||
+ | df | ||
+ | Out[12]: | ||
+ | | ||
+ | 0 1 2 3 | ||
+ | 1 4 5 6 | ||
+ | </ | ||
+ | |||
+ | See also: | ||
+ | * https:// | ||
+ | |||
+ | ==== Create a dataframe from a bunch of variables ==== | ||
+ | < | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | </ | ||
+ | |||
+ | For example | ||
+ | < | ||
+ | $ ipython | ||
+ | |||
+ | In [1]: | ||
+ | year = 2023; month = 6; date = 15 | ||
+ | |||
+ | In [2]: | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | |||
+ | In [3]: | ||
+ | df | ||
+ | Out[3]: | ||
+ | | ||
+ | 0 | ||
+ | 1 month 6 | ||
+ | 2 | ||
+ | |||
+ | In [4]: | ||
+ | df.dtypes | ||
+ | Out[4]: | ||
+ | key object | ||
+ | value int64 | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | It works even if the variables are not of the same type. | ||
+ | < | ||
+ | In [5]: | ||
+ | year = 2023; month = ' | ||
+ | |||
+ | In [6]: | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | |||
+ | In [7]: | ||
+ | df | ||
+ | Out[7]: | ||
+ | key value | ||
+ | 0 | ||
+ | 1 month June | ||
+ | 2 | ||
+ | |||
+ | In [8]: | ||
+ | df.dtypes | ||
+ | Out[8]: | ||
+ | key object | ||
+ | value object | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | Tested with Python 3.11.3, IPython 8.12.0 | ||
+ | |||
===== selection related ===== | ===== selection related ===== | ||
==== split columns ==== | ==== split columns ==== |
pandas_dataframe.txt · Last modified: 2024/05/22 15:47 by admin