
Get the first non null value in each column

Task

For each column of a dataframe, get the first non-null (non-NaN) value, collapsing the rows that share a key into a single row.

Corner cases:

  • If a column is all NaNs, return a NaN.

For example, given

   jim  joe  jolie  jack
     0  1.0    NaN   NaN
     0  NaN    2.0   NaN

We want

   jim  joe  jolie  jack
     0  1.0    2.0   NaN

Solution

$ ipython

In [1]:
import pandas as pd
import numpy as np
df = pd.DataFrame({'jim': [0, 0], 'joe': [1, np.nan],
                   'jolie': [np.nan, 2], 'jack': [np.nan, np.nan]})
df
Out[1]:
   jim  joe  jolie  jack
0    0  1.0    NaN   NaN
1    0  NaN    2.0   NaN

In [2]:
def get_first_non_nan(s):
    # Keep only the non-null entries of the column (a Series).
    values = s.dropna()
    # First remaining value, or NaN if the column was all NaN.
    return values.iloc[0] if not values.empty else np.nan

In [3]:
df.groupby('jim').agg(get_first_non_nan)
Out[3]:
     joe  jolie  jack
jim
0    1.0    2.0   NaN

In [4]:
df.groupby('jim').agg(get_first_non_nan).reset_index()
Out[4]:
   jim  joe  jolie  jack
0    0  1.0    2.0   NaN
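
As a possible alternative to the custom aggregation above, pandas' built-in GroupBy.first() also skips nulls column by column. The sketch below was not part of the original session; it rebuilds the same example dataframe and assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Same example data as in the session above.
df = pd.DataFrame({'jim': [0, 0], 'joe': [1, np.nan],
                   'jolie': [np.nan, 2], 'jack': [np.nan, np.nan]})

# GroupBy.first() takes the first non-null entry of each column per group;
# a column that is all NaN within a group stays NaN.
result = df.groupby('jim').first().reset_index()
```

This avoids writing a helper function, at the cost of relying on the null-skipping behavior of first(), which is easy to overlook when reading the code.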

meta

Used Python 3.9.4 and IPython 7.22.0

Demonstrates: apply a function on each column of a dataframe
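
When there is no grouping key, a function can be applied on each column directly with DataFrame.apply. This is a minimal sketch rather than part of the original session; the helper is redefined here so the block runs on its own, and the name first_values is only illustrative.

```python
import numpy as np
import pandas as pd

def get_first_non_nan(s):
    # Drop nulls, then take the first remaining value (NaN if none remain).
    values = s.dropna()
    return values.iloc[0] if not values.empty else np.nan

df = pd.DataFrame({'jim': [0, 0], 'joe': [1, np.nan],
                   'jolie': [np.nan, 2], 'jack': [np.nan, np.nan]})

# With the default axis=0, apply() calls the function once per column
# and returns a Series indexed by column name.
first_values = df.apply(get_first_non_nan)
```

Here first_values is a Series with one entry per column, so 'jim' is treated as an ordinary column instead of a grouping key.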

get_the_first_non_null_value_in_each_column.1631740585.txt.gz · Last modified: 2021/09/15 21:16 by raju