pandas_groupby
Revisions compared: 2021/07/06 22:50 by admin, and 2024/03/26 22:25 ([groupby slicing]) by raju
==== preserve the highest value entries in each group ====
tags | filter by value

Given
Ref:- https://
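Below is a minimal sketch of the usual pattern. It assumes a hypothetical frame with a group-label column key and a numeric column value; this is not the data from the page. transform('max') broadcasts each group's maximum back onto every row, so comparing it with the column gives a boolean mask aligned with the frame's index.

```python
import pandas as pd

# Hypothetical data: "key" is the group label, "value" is the column to maximise
df = pd.DataFrame({
    "key":   ["a", "a", "a", "b", "b", "c"],
    "value": [3, 7, 7, 2, 5, 4],
})

# Each row receives its own group's maximum, so the mask lines up with df's index
group_max = df.groupby("key")["value"].transform("max")
top = df[df["value"] == group_max]
print(top)
```

Ties are kept: rows 1 and 2 both hold the maximum of group a. If exactly one row per group is wanted, df.loc[df.groupby("key")["value"].idxmax()] is a common alternative. idxmax returns the label of the first occurrence of the maximum.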

==== preserve the highest odd value in each group ====
tags | pandas groupby transform maximum odd number, maxodd

Given
<code>
0
1
2
3
4
5
6
7
8
9
10 MM6 S6 ea 2
11 MM7 S7
</code>

We want
<code>
2
3
6
8
9
11 MM7 S7
</code>
That is, get all the rows with the highest odd '
If there is a group with only even '

Solution
<code>
In [1]:
import pandas as pd
df = pd.DataFrame({'
'
'
'
'
'
'
df
Out[1]:
0
1
2
3
4
5
6
7
8
9
10 MM6 S6 ea 2
11 MM7 S7

In [2]:
def max_odd(s):
    # largest odd value in the group; NaN if the group has no odd values
    value = s.loc[s % 2 == 1].max()
    return value


In [3]:
idx = df.groupby(['
df[idx]
Out[3]:
2
3
6
8
9
11 MM7 S7
</code>
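The frame construction in the session above is cut off. Here is a self-contained version of the same approach on hypothetical data; the column names grp and count and all values are assumptions. It uses the same max_odd helper. transform calls the helper once per group and broadcasts the scalar it returns back to every row of that group.

```python
import pandas as pd

# Hypothetical data; group "z" contains only even counts
df = pd.DataFrame({
    "grp":   ["x", "x", "x", "y", "y", "z", "z"],
    "count": [3, 5, 4, 1, 8, 2, 6],
})

def max_odd(s):
    # Largest odd entry of the group; an empty selection gives NaN
    return s.loc[s % 2 == 1].max()

# One value per row: the max odd count of that row's group (NaN for "z")
group_max_odd = df.groupby("grp")["count"].transform(max_odd)

# NaN never compares equal, so groups with only even counts drop out entirely
result = df[df["count"] == group_max_odd]
print(result)
```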

Breakdown of how it works:
<code>
In [4]:
df.groupby(['
Out[4]:
0 3.0
1 3.0
2 3.0
3 1.0
4 1.0
5 3.0
6 3.0
7 3.0
8 3.0
9 1.0
10 NaN
11 3.0
Name: count, dtype: float64

In [5]:
idx = df.groupby(['
idx
Out[5]:
0 False
1 False
2 True
3 True
4 False
5 False
6 True
7 False
8 True
9 True
10 False
11 True
Name: count, dtype: bool
</code>
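The breakdown relies on transform returning one value per original row. The sketch below contrasts it with agg, which returns one value per group. It uses hypothetical data. Mapping the per-group result back through the group column gives the same row-aligned values as transform.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"grp": ["x", "x", "y", "y", "y"], "count": [3, 4, 1, 5, 2]})

def max_odd(s):
    # Largest odd entry of the group; NaN if there is none
    return s.loc[s % 2 == 1].max()

per_group = df.groupby("grp")["count"].agg(max_odd)      # indexed by group: x, y
per_row = df.groupby("grp")["count"].transform(max_odd)  # indexed like df: 0..4
via_map = df["grp"].map(per_group)                       # same values as per_row

mask = df["count"] == per_row
print(mask.tolist())
```

Comparing df["count"] directly with per_group would raise a ValueError, because pandas only compares Series whose labels are identical. That is why a row-aligned result such as the one from transform is needed for the mask.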
==== level ====
If a DataFrame has a MultiIndex (several index levels) but you need to group on only one of them, use level: level=0 groups on the first index level, level=1 on the second, level=-1 on the last, etc.
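Here is a minimal sketch on a hypothetical Series with a two-level MultiIndex. level accepts a position, where negative positions count from the end, or a level name.

```python
import pandas as pd

# Hypothetical Series indexed by (outer, inner)
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
s = pd.Series([10, 20, 30, 40], index=idx)

by_outer = s.groupby(level=0).sum()       # groups on "outer"
by_inner = s.groupby(level=-1).sum()      # last level, i.e. "inner"
by_name = s.groupby(level="inner").sum()  # same grouping, selected by name
print(by_outer, by_inner, sep="\n")
```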
==== extract groupby object by key ====
tags | pandas groupby filter a group

  * groups.get_group(key_value) if grouping on a single column
  * groups.get_group(key_value_tuple) if grouping on multiple columns
5 bar 0 6
</code>
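The sketch below shows both cases listed above on a hypothetical frame. It passes a scalar key when grouping on one column. When grouping on several columns, it passes a tuple of keys in the same order as the grouping columns.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1, 2, 3, 4],
})

one_key = df.groupby("A").get_group("foo")                   # rows 0 and 2
two_keys = df.groupby(["A", "B"]).get_group(("bar", "two"))  # row 3 only
print(one_key)
print(two_keys)
```

get_group raises a KeyError when the requested key is not one of the groups.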

==== groupby slicing ====
Consider
<code>
In [1]:
import pandas as pd
import numpy as np
rand = np.random.RandomState(1)
df = pd.DataFrame({'
'
'

In [2]:
df
Out[2]:
0 foo 1.624345
1 bar -0.611756
2 foo -0.528172
3 bar -1.072969
4 foo 0.865408
5 bar -2.301539
</code>

Group by column '
<code>
In [3]:
gb = df.groupby(['
</code>

Use get_group() to get a single group:
<code>
In [4]:
gb.get_group('
Out[4]:
0 foo 1.624345
2 foo -0.528172
4 foo 0.865408
</code>

To select a subset of columns, index the groupby object (groupby slicing):
<code>
In [5]:
gb[['
Out[5]:
0 foo 1.624345
2 foo -0.528172
4 foo 0.865408

In [6]:
gb[['
Out[6]:
C
0 5
2 11
4 14
</code>

Ref:
  * https://
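The construction of df above is truncated, so this self-contained sketch uses made-up data. Indexing the groupby object with a list keeps a DataFrameGroupBy restricted to those columns. Indexing it with a single column name gives a SeriesGroupBy.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [5, 6, 7, 8],
})
gb = df.groupby("A")

only_c = gb[["C"]].get_group("foo")  # DataFrame: column C only, rows 0 and 2
c_sums = gb["C"].sum()               # Series indexed by the values of A
print(only_c)
print(c_sums)
```

The sketch groups with a plain column name. With a length-1 list such as df.groupby(['A']), newer pandas versions (2.2 and later) emit a FutureWarning saying that get_group will expect a length-1 tuple key.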
==== apply a function on each group ====
tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1
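The sketch below illustrates the two situations named in the tags, using hypothetical columns A, B and C. It is one way to handle them, not necessarily the approach this page originally described.

  * When the applied function returns a scalar per group, reset_index(name=...) names the result column.
  * When it returns a Series that keeps the original row labels, the result gets a two-level index whose second level is unnamed. A plain reset_index() turns that level into a level_1 column, so the sketch drops the level first.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": [1, 2, 3, 4],
    "C": [10, 20, 30, 40],
})

# Function over several columns returning one scalar per group; name the result
weighted = (
    df.groupby("A")[["B", "C"]]
      .apply(lambda g: (g["B"] * g["C"]).sum())
      .reset_index(name="weighted")
)

# Function returning a Series per group: the index becomes (A, original row label)
top = df.groupby("A")["B"].apply(lambda s: s.nlargest(1))
# Drop the unnamed row-label level so reset_index() does not create "level_1"
flat = top.reset_index(level=1, drop=True).reset_index()
print(weighted)
print(flat)
```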
pandas_groupby.txt · Last modified: 2024/05/07 20:47 by raju