Differences

This shows you the differences between two versions of the page.

--- pandas_dataframe [2023/05/05 21:01] – [Create a dataframe from a series of lists] admin
+++ pandas_dataframe [2023/09/07 21:44] – [split columns] raju
@@ Line 59: / Line 59: @@
   * https://www.geeksforgeeks.org/add-column-names-to-dataframe-in-pandas/ - got the idea on zip from here.
+tags | row by row
 ==== Create a dataframe by splitting strings ====
 Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns.
@@ Line 141: / Line 142: @@
   8  9  NaN  NaN
 </code>
+If the number of elements in each list is same, np.vstack() can be used but otherwise it will not work. For example
+<code>
+In [5]:
+s
+Out[5]:
+       [1, 2, 3]
+    [4, 5, 6, 7]
+          [8, 9]
+dtype: object
+In [6]:
+import numpy as np
+df = pd.DataFrame(np.vstack(s))
+---------------------------------------------------------------------------
+ValueError                                Traceback (most recent call last)
+Cell In[6], line 2
+import numpy as np
+----> 2 df = pd.DataFrame(np.vstack(s))
+File <__array_function__ internals>:200, in vstack(*args, **kwargs)
+File ~\AppData\Local\conda\conda\envs\py311\Lib\site-packages\numpy\core\shape_base.py:296, in vstack(tup, dtype, casting)
+if not isinstance(arrs, list):
+     arrs = [arrs]
+--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
+File <__array_function__ internals>:200, in concatenate(*args, **kwargs)
+ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4
+</code>
+But with
+<code>
+In [10]:
+s = pd.Series([[1, 2, 3], [4, 5, 6]])
+s
+Out[10]:
+    [1, 2, 3]
+    [4, 5, 6]
+dtype: object
+In [11]:
+import numpy as np
+df = pd.DataFrame(np.vstack(s))
+df
+Out[11]:
+  1  2
+  1  2  3
+  4  5  6
+In [12]:
+df = pd.DataFrame(s.to_list())
+df
+Out[12]:
+  1  2
+  1  2  3
+  4  5  6
+</code>
+See also:
+  * https://stackoverflow.com/questions/45901018/convert-pandas-series-of-lists-to-dataframe
+==== Create a dataframe from a bunch of variables ====
+<code>
+import pandas as pd
+df = pd.DataFrame({
+  'key': ['var1', 'var2', 'var3'],
+  'value': [var1, var2, var3]
+})
+</code>
+For example
+<code>
+$ ipython
+In [1]:
+year = 2023; month = 6; date = 15
+In [2]:
+import pandas as pd
+df = pd.DataFrame({
+  'key': ['year', 'month', 'date'],
+  'value': [year, month, date]
+})
+In [3]:
+df
+Out[3]:
+     key  value
+   year   2023
+  month      6
+   date     15
+In [4]:
+df.dtypes
+Out[4]:
+key      object
+value     int64
+dtype: object
+</code>
+It works even if the variables are not of the same type.
+<code>
+In [5]:
+year = 2023; month = 'June'; date = 15
+In [6]:
+df = pd.DataFrame({
+  'key': ['year', 'month', 'date'],
+  'value': [year, month, date]
+})
+In [7]:
+df
+Out[7]:
+     key value
+   year  2023
+  month  June
+   date    15
+In [8]:
+df.dtypes
+Out[8]:
+key      object
+value    object
+dtype: object
+</code>
+Tested with Python 3.11.3, IPython 8.12.0
 ===== selection related =====
 ==== split columns ====
@@ Line 151: / Line 283: @@
 tags | uses [http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html pandas.Series.str.split]
+==== lookup value ====
+To pick the first value in column 'A' for rows where column B is FOO
+<code>
+df.loc[df['B'] == 'FOO', 'A'].iloc[0]
+</code>
+Example:
+<code>
+$ ipython
+In [1]:
+import pandas as pd
+df = pd.DataFrame({'A': ['p1', 'p2', 'p3', 'p4'], 'B': [1, 3, 3, 2]})
+print(df)
+    A  B
+  p1  1
+  p2  3
+  p3  3
+  p4  2
+In [2]:
+df.loc[df['B'] == 3, 'A']
+Out[2]:
+    p2
+    p3
+Name: A, dtype: object
+In [3]:
+df.loc[df['B'] == 3, 'A'].iloc[0]
+Out[3]:
+'p2'
+</code>
+search tags | value of one column when another column equals something
 ===== Series related =====
 ==== view the index, type, length and name of the series ====