User Tools

Site Tools


pandas_dataframe

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Last revisionBoth sides next revision
pandas_dataframe [2023/05/05 21:01] – [Create a dataframe from a series of lists] adminpandas_dataframe [2023/09/07 21:44] – [split columns] raju
Line 59: Line 59:
   * https://www.geeksforgeeks.org/add-column-names-to-dataframe-in-pandas/ - got the idea on zip from here.   * https://www.geeksforgeeks.org/add-column-names-to-dataframe-in-pandas/ - got the idea on zip from here.
  
 +tags | row by row
 ==== Create a dataframe by splitting strings ==== ==== Create a dataframe by splitting strings ====
 Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns. Given a list of strings, the idea here is to create a data frame by splitting them into multiple columns.
Line 141: Line 142:
 2  8  9  NaN  NaN 2  8  9  NaN  NaN
 </code> </code>
 +
 +If the number of elements in each list is same, np.vstack() can be used but otherwise it will not work. For example
 +<code>
 +In [5]:
 +s
 +Out[5]:
 +0       [1, 2, 3]
 +1    [4, 5, 6, 7]
 +2          [8, 9]
 +dtype: object
 +
 +In [6]:
 +import numpy as np
 +df = pd.DataFrame(np.vstack(s))
 +---------------------------------------------------------------------------
 +ValueError                                Traceback (most recent call last)
 +Cell In[6], line 2
 +      1 import numpy as np
 +----> 2 df = pd.DataFrame(np.vstack(s))
 +
 +File <__array_function__ internals>:200, in vstack(*args, **kwargs)
 +
 +File ~\AppData\Local\conda\conda\envs\py311\Lib\site-packages\numpy\core\shape_base.py:296, in vstack(tup, dtype, casting)
 +    294 if not isinstance(arrs, list):
 +    295     arrs = [arrs]
 +--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
 +
 +File <__array_function__ internals>:200, in concatenate(*args, **kwargs)
 +
 +ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4
 +</code>
 +
 +But with
 +<code>
 +In [10]:
 +s = pd.Series([[1, 2, 3], [4, 5, 6]])
 +s
 +Out[10]:
 +0    [1, 2, 3]
 +1    [4, 5, 6]
 +dtype: object
 +
 +In [11]:
 +import numpy as np
 +df = pd.DataFrame(np.vstack(s))
 +df
 +Out[11]:
 +    1  2
 +0  1  2  3
 +1  4  5  6
 +
 +In [12]:
 +df = pd.DataFrame(s.to_list())
 +df
 +Out[12]:
 +    1  2
 +0  1  2  3
 +1  4  5  6
 +</code>
 +
 +See also:
 +  * https://stackoverflow.com/questions/45901018/convert-pandas-series-of-lists-to-dataframe
 +
 +==== Create a dataframe from a bunch of variables ====
 +<code>
 +import pandas as pd
 +df = pd.DataFrame({
 +  'key': ['var1', 'var2', 'var3'],
 +  'value': [var1, var2, var3]
 +})
 +</code>
 +
 +For example
 +<code>
 +$ ipython
 +
 +In [1]:
 +year = 2023; month = 6; date = 15
 +
 +In [2]:
 +import pandas as pd
 +df = pd.DataFrame({
 +  'key': ['year', 'month', 'date'],
 +  'value': [year, month, date]
 +})
 +
 +In [3]:
 +df
 +Out[3]:
 +     key  value
 +0   year   2023
 +1  month      6
 +2   date     15
 +
 +In [4]:
 +df.dtypes
 +Out[4]:
 +key      object
 +value     int64
 +dtype: object
 +</code>
 +
 +It works even if the variables are not of the same type.
 +<code>
 +In [5]:
 +year = 2023; month = 'June'; date = 15
 +
 +In [6]:
 +df = pd.DataFrame({
 +  'key': ['year', 'month', 'date'],
 +  'value': [year, month, date]
 +})
 +
 +In [7]:
 +df
 +Out[7]:
 +     key value
 +0   year  2023
 +1  month  June
 +2   date    15
 +
 +In [8]:
 +df.dtypes
 +Out[8]:
 +key      object
 +value    object
 +dtype: object
 +</code>
 +
 +Tested with Python 3.11.3, IPython 8.12.0
 +
 ===== selection related ===== ===== selection related =====
 ==== split columns ==== ==== split columns ====
Line 151: Line 283:
 tags | uses [http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html pandas.Series.str.split] tags | uses [http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html pandas.Series.str.split]
  
 +==== lookup value ====
 +To pick the first value in column 'A' for rows where column B is FOO
 +<code>
 +df.loc[df['B'] == 'FOO', 'A'].iloc[0]
 +</code>
 +
 +Example:
 +<code>
 +$ ipython
 +In [1]:
 +import pandas as pd
 +df = pd.DataFrame({'A': ['p1', 'p2', 'p3', 'p4'], 'B': [1, 3, 3, 2]})
 +print(df)
 +    A  B
 +0  p1  1
 +1  p2  3
 +2  p3  3
 +3  p4  2
 +
 +In [2]:
 +df.loc[df['B'] == 3, 'A']
 +Out[2]:
 +1    p2
 +2    p3
 +Name: A, dtype: object
 +
 +In [3]:
 +df.loc[df['B'] == 3, 'A'].iloc[0]
 +Out[3]:
 +'p2'
 +</code>
  
 +search tags | value of one column when another column equals something
 ===== Series related ===== ===== Series related =====
 ==== view the index, type, length and name of the series ==== ==== view the index, type, length and name of the series ====
pandas_dataframe.txt · Last modified: 2023/09/07 21:45 by raju