Differences

This shows you the differences between two versions of the page.

--- pandas_dataframe [2023/05/05 21:00] – [Create a dataframe by splitting strings] admin
+++ pandas_dataframe [2023/07/21 22:16] – [Create a dataframe from list of lists] raju
@@ Line 59: / Line 59: @@
   * https://www.geeksforgeeks.org/add-column-names-to-dataframe-in-pandas/ - got the idea on zip from here.
+tags | row by row
 ==== Create a dataframe by splitting strings ====
 Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns.
@@ Line 117: / Line 118: @@
 ==== Create a dataframe from a series of lists ====
 tags | convert series with lists to dataframe
+<code>
+df = pd.DataFrame(s.to_list())
+</code>
+For example
 <code>
 In [1]:
@@ Line 137: / Line 142: @@
   8  9  NaN  NaN
 </code>
+If every list has the same number of elements, np.vstack() can be used as well. With lists of unequal length it raises a ValueError. For example:
+<code>
+In [5]:
+s
+Out[5]:
+0       [1, 2, 3]
+1    [4, 5, 6, 7]
+2          [8, 9]
+dtype: object
+In [6]:
+import numpy as np
+df = pd.DataFrame(np.vstack(s))
+---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+Cell In[6], line 2
+import numpy as np
+----> 2 df = pd.DataFrame(np.vstack(s))
+File <__array_function__ internals>:200, in vstack(*args, **kwargs)
+File ~\AppData\Local\conda\conda\envs\py311\Lib\site-packages\numpy\core\shape_base.py:296, in vstack(tup, dtype, casting)
+if not isinstance(arrs, list):
+     arrs = [arrs]
+--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
+File <__array_function__ internals>:200, in concatenate(*args, **kwargs)
+ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4
+</code>
+But when all lists have the same length, np.vstack() and to_list() give the same result:
+<code>
+In [10]:
+s = pd.Series([[1, 2, 3], [4, 5, 6]])
+s
+Out[10]:
+0    [1, 2, 3]
+1    [4, 5, 6]
+dtype: object
+In [11]:
+import numpy as np
+df = pd.DataFrame(np.vstack(s))
+df
+Out[11]:
+   0  1  2
+0  1  2  3
+1  4  5  6
+In [12]:
+df = pd.DataFrame(s.to_list())
+df
+Out[12]:
+   0  1  2
+0  1  2  3
+1  4  5  6
+</code>
+See also:
+  * https://stackoverflow.com/questions/45901018/convert-pandas-series-of-lists-to-dataframe
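A minimal sketch of how the two approaches above could be combined: check the list lengths first, use np.vstack() only when they all match, and fall back to to_list() otherwise. The helper name series_of_lists_to_frame is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def series_of_lists_to_frame(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: convert a Series of lists to a DataFrame."""
    lengths = s.map(len)
    if lengths.nunique() == 1:
        # Equal lengths: the rows stack into a regular 2-D array.
        return pd.DataFrame(np.vstack(s.to_list()), index=s.index)
    # Unequal lengths: the DataFrame constructor pads short rows with NaN.
    return pd.DataFrame(s.to_list(), index=s.index)


ragged = pd.Series([[1, 2, 3], [4, 5, 6, 7], [8, 9]])
regular = pd.Series([[1, 2, 3], [4, 5, 6]])

ragged_df = series_of_lists_to_frame(ragged)    # 3 rows, 4 columns, NaN-padded
regular_df = series_of_lists_to_frame(regular)  # 2 rows, 3 columns
```

In practice to_list() alone covers both cases, so the length check is only worth it if the NumPy path is wanted for some other reason.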
+==== Create a dataframe from a bunch of variables ====
+<code>
+import pandas as pd
+df = pd.DataFrame({
+  'key': ['var1', 'var2', 'var3'],
+  'value': [var1, var2, var3]
+})
+</code>
+For example
+<code>
+$ ipython
+In [1]:
+year = 2023; month = 6; date = 15
+In [2]:
+import pandas as pd
+df = pd.DataFrame({
+  'key': ['year', 'month', 'date'],
+  'value': [year, month, date]
+})
+In [3]:
+df
+Out[3]:
+     key  value
+0   year   2023
+1  month      6
+2   date     15
+In [4]:
+df.dtypes
+Out[4]:
+key      object
+value     int64
+dtype: object
+</code>
+It works even if the variables are not of the same type. In that case the value column gets the object dtype.
+<code>
+In [5]:
+year = 2023; month = 'June'; date = 15
+In [6]:
+df = pd.DataFrame({
+  'key': ['year', 'month', 'date'],
+  'value': [year, month, date]
+})
+In [7]:
+df
+Out[7]:
+     key value
+0   year  2023
+1  month  June
+2   date    15
+In [8]:
+df.dtypes
+Out[8]:
+key      object
+value    object
+dtype: object
+</code>
+Tested with Python 3.11.3, IPython 8.12.0
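A possible variation on the key/value approach above, sketched under the assumption that the variables can be collected into a dict first. Each name then sits next to its value, so the two lists cannot drift out of order. The name variables is only illustrative.

```python
import pandas as pd

year, month, date = 2023, "June", 15

# Collect the variables in a dict so each name is paired with its value.
variables = {"year": year, "month": month, "date": date}

# Each (name, value) pair becomes one row of the frame.
df = pd.DataFrame(list(variables.items()), columns=["key", "value"])
```

Dicts preserve insertion order, so the rows come out in the order the variables were listed.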
 ===== selection related =====
 ==== split columns ====