pandas_groupby
Differences
This shows you the differences between two versions of the page.
pandas_groupby [2021/07/06 22:57] – [preserve the highest odd value in each group] admin | pandas_groupby [2024/05/07 20:46] – [extract groupby object by key] raju | ||
---|---|---|---|
Line 77: | Line 77: | ||
==== preserve the highest odd value in each group ==== | ==== preserve the highest odd value in each group ==== | ||
+ | tags | pandas groupby transform maximum odd number, maxodd | ||
+ | |||
Given | Given | ||
< | < | ||
Line 93: | Line 95: | ||
11 MM7 S7 | 11 MM7 S7 | ||
</ | </ | ||
- | get all the rows with highest odd ' | + | |
+ | We want | ||
< | < | ||
| | ||
Line 103: | Line 106: | ||
11 MM7 S7 | 11 MM7 S7 | ||
</ | </ | ||
+ | That is, get all the rows with the highest odd ' | ||
+ | If there is a group with only even ' | ||
+ | |||
Solution | Solution | ||
< | < | ||
In [1]: | In [1]: | ||
import pandas as pd | import pandas as pd | ||
- | df = pd.DataFrame({' | + | df = pd.DataFrame({' |
- | ' | + | |
- | ' | + | ' |
+ | | ||
+ | ' | ||
+ | ' | ||
' | ' | ||
df | df | ||
Line 144: | Line 153: | ||
9 | 9 | ||
11 MM7 S7 | 11 MM7 S7 | ||
+ | </ | ||
+ | |||
+ | Breakdown of how it works: | ||
+ | < | ||
+ | In [4]: | ||
+ | df.groupby([' | ||
+ | Out[4]: | ||
+ | 0 3.0 | ||
+ | 1 3.0 | ||
+ | 2 3.0 | ||
+ | 3 1.0 | ||
+ | 4 1.0 | ||
+ | 5 3.0 | ||
+ | 6 3.0 | ||
+ | 7 3.0 | ||
+ | 8 3.0 | ||
+ | 9 1.0 | ||
+ | 10 NaN | ||
+ | 11 3.0 | ||
+ | Name: count, dtype: float64 | ||
+ | |||
+ | In [5]: | ||
+ | idx = df.groupby([' | ||
+ | idx | ||
+ | Out[5]: | ||
+ | 0 False | ||
+ | 1 False | ||
+ | 2 True | ||
+ | 3 True | ||
+ | 4 False | ||
+ | 5 False | ||
+ | 6 True | ||
+ | 7 False | ||
+ | 8 True | ||
+ | 9 True | ||
+ | 10 False | ||
+ | 11 True | ||
+ | Name: count, dtype: bool | ||
</ | </ | ||
==== level ==== | ==== level ==== | ||
Line 188: | Line 235: | ||
Ref: https:// | Ref: https:// | ||
+ | |||
+ | ==== filter elements from groups that don't satisfy a criterion ==== | ||
+ | tags | pandas groupby filter groups | ||
+ | < | ||
+ | In [2]: | ||
+ | import pandas as pd | ||
+ | df = pd.DataFrame({ | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | df | ||
+ | Out[2]: | ||
+ | | ||
+ | 0 foo 1 2.0 | ||
+ | 1 bar 2 5.0 | ||
+ | 2 foo 3 8.0 | ||
+ | 3 bar 4 1.0 | ||
+ | 4 foo 5 2.0 | ||
+ | 5 bar 6 9.0 | ||
+ | |||
+ | In [3]: | ||
+ | grouped = df.groupby(' | ||
+ | |||
+ | In [4]: | ||
+ | grouped.filter(lambda x: x[' | ||
+ | Out[4]: | ||
+ | | ||
+ | 1 bar 2 5.0 | ||
+ | 3 bar 4 1.0 | ||
+ | 5 bar 6 9.0 | ||
+ | </ | ||
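As a self-contained sketch of the same idea: the column names `key`, `x`, `y` and the criterion (group mean of `x` greater than 3) are assumptions chosen for illustration, not necessarily what the snippet above uses. `filter` evaluates the function once per group and keeps or drops the whole group; a boolean mask built with `transform` gives the same rows.

```python
import pandas as pd

# Hypothetical column names and criterion, chosen for illustration only.
df = pd.DataFrame({
    "key": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})

# Keep every row of a group when the group's mean of x is above 3.
# foo: mean(1, 3, 5) = 3 -> dropped; bar: mean(2, 4, 6) = 4 -> kept.
kept = df.groupby("key").filter(lambda g: g["x"].mean() > 3)

# Equivalent boolean-mask form using transform.
mask = df.groupby("key")["x"].transform("mean") > 3
kept_via_mask = df[mask]
print(kept)
```

The mask form avoids calling a Python function per group, which may matter on frames with many groups.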
==== extract groupby object by key ==== | ==== extract groupby object by key ==== | ||
+ | tags | pandas groupby filter a group | ||
+ | |||
* groups.get_group(key_value) if grouping on a single column | * groups.get_group(key_value) if grouping on a single column | ||
* groups.get_group(key_value_tuple) if grouping on multiple columns. | * groups.get_group(key_value_tuple) if grouping on multiple columns. | ||
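
A minimal sketch of both cases, using made-up column names `a`, `b`, `v`:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "a": ["foo", "bar", "foo", "bar"],
    "b": ["x", "x", "y", "y"],
    "v": [1, 2, 3, 4],
})

# Grouping on a single column: pass the key value itself.
single = df.groupby("a").get_group("foo")

# Grouping on multiple columns: pass a tuple with one value per grouping column.
multi = df.groupby(["a", "b"]).get_group(("bar", "y"))

print(single)
print(multi)
```

Note that grouping with a one-element list such as `df.groupby(["a"])` may make recent pandas versions expect a one-element tuple in `get_group`, so passing the column name as a plain string is the safer form for a single key.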
Line 270: | Line 350: | ||
5 bar 0 6 | 5 bar 0 6 | ||
</ | </ | ||
+ | |||
+ | ==== groupby slicing ==== | ||
+ | Consider | ||
+ | < | ||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | import numpy as np | ||
+ | rand = np.random.RandomState(1) | ||
+ | df = pd.DataFrame({' | ||
+ | ' | ||
+ | ' | ||
+ | |||
+ | In [2]: | ||
+ | df | ||
+ | Out[2]: | ||
+ | | ||
+ | 0 foo 1.624345 | ||
+ | 1 bar -0.611756 | ||
+ | 2 foo -0.528172 | ||
+ | 3 bar -1.072969 | ||
+ | 4 foo 0.865408 | ||
+ | 5 bar -2.301539 | ||
+ | </ | ||
+ | |||
+ | Group by column ' | ||
+ | < | ||
+ | In [3]: | ||
+ | gb = df.groupby([' | ||
+ | </ | ||
+ | |||
+ | You can use get_group() to get a single group: | ||
+ | < | ||
+ | In [4]: | ||
+ | gb.get_group(' | ||
+ | Out[4]: | ||
+ | | ||
+ | 0 foo 1.624345 | ||
+ | 2 foo -0.528172 | ||
+ | 4 foo 0.865408 | ||
+ | </ | ||
+ | |||
+ | You can select specific columns by slicing the groupby object: | ||
+ | < | ||
+ | In [5]: | ||
+ | gb[[' | ||
+ | Out[5]: | ||
+ | | ||
+ | 0 foo 1.624345 | ||
+ | 2 foo -0.528172 | ||
+ | 4 foo 0.865408 | ||
+ | |||
+ | In [6]: | ||
+ | gb[[' | ||
+ | Out[6]: | ||
+ | C | ||
+ | 0 5 | ||
+ | 2 11 | ||
+ | 4 14 | ||
+ | </ | ||
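Column selection on a groupby object is lazy: it returns another groupby object restricted to the chosen columns, which can then be passed to `get_group` or aggregated. The sketch below uses hypothetical columns `A` and `C` with made-up values.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [10, 20, 30, 40],
})

gb = df.groupby("A")

# Double brackets select a list of columns and yield a DataFrameGroupBy.
foo_c = gb[["C"]].get_group("foo")

# Single brackets select one column and yield a SeriesGroupBy,
# which aggregates to a Series indexed by the group key.
totals = gb["C"].sum()

print(foo_c)
print(totals)
```

Slicing before aggregating restricts the computation to the columns that are actually needed.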
+ | |||
+ | Ref: | ||
+ | * https:// | ||
==== apply a function on each group ==== | ==== apply a function on each group ==== | ||
Line 316: | Line 458: | ||
tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1 | tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1 | ||
- | |||
pandas_groupby.txt · Last modified: 2024/05/07 20:47 by raju