Get the first non-null value in each column
Task
Get the first non-null value in each column, within each group of rows that share the same value of jim.
Corner cases:
- If a column is all NaN within a group, return NaN for that group.
For example, given
jim  joe  jolie  jack
0    1.0  NaN    NaN
0    NaN  2.0    NaN
1    3.0  NaN    NaN
1    NaN  4.0    NaN
We want
jim  joe  jolie  jack
0    1.0  2.0    NaN
1    3.0  4.0    NaN
Solution
$ ipython

In [1]: import pandas as pd
        import numpy as np
        df = pd.DataFrame({'jim': [0, 0], 'joe': [1, np.nan], 'jolie': [np.nan, 2], 'jack': [np.nan, np.nan]})
        df
Out[1]:
   jim  joe  jolie  jack
0    0  1.0    NaN   NaN
1    0  NaN    2.0   NaN

In [2]: def get_first_non_nan(s):
            values = s.loc[~s.isnull()]
            value = values.iloc[0] if not values.empty else np.nan
            return value

In [3]: df.groupby('jim').agg(get_first_non_nan)
Out[3]:
     joe  jolie  jack
jim
0    1.0    2.0   NaN

In [4]: df.groupby('jim').agg(get_first_non_nan).reset_index()
Out[4]:
   jim  joe  jolie  jack
0    0  1.0    2.0   NaN
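The session above uses a two-row frame with a single group. As a minimal sketch, the same function can be run on the four-row example from the task statement, which has two groups and an all-NaN column. The frame below is retyped from that example. The comparison with the built-in `GroupBy.first()` is an addition, not part of the original solution; it relies on `first()` skipping null values by default.

```python
import numpy as np
import pandas as pd


def get_first_non_nan(s):
    # Keep only the non-null entries of this group's column.
    values = s.loc[~s.isnull()]
    # An all-NaN column has no entries left, so fall back to NaN.
    return values.iloc[0] if not values.empty else np.nan


# Four-row example from the task statement (two groups: jim == 0 and jim == 1).
df = pd.DataFrame({
    'jim': [0, 0, 1, 1],
    'joe': [1.0, np.nan, 3.0, np.nan],
    'jolie': [np.nan, 2.0, np.nan, 4.0],
    'jack': [np.nan, np.nan, np.nan, np.nan],
})

# Approach from the solution: a custom aggregation applied per group and column.
result = df.groupby('jim').agg(get_first_non_nan).reset_index()

# Alternative (an assumption about what fits this task, not the author's method):
# the built-in first() returns the first non-null entry per column by default.
builtin = df.groupby('jim').first().reset_index()

print(result)
print(builtin)
```

Both frames should have one row per value of jim, matching the "We want" table. The custom function remains useful when the selection rule is more involved than "first non-null".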
meta
Used Python 3.9.4 and IPython 7.22.0
demonstrates | apply a function on each column of a dataframe after doing a groupby
get_the_first_non_null_value_in_each_column.1631740864.txt.gz · Last modified: 2021/09/15 21:21 by raju