User Tools

Site Tools


pandas_groupby

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
pandas_groupby [2021/07/06 23:00] – [preserve the highest odd value in each group] adminpandas_groupby [2024/03/26 22:25] (current) – [groupby slicing] raju
Line 77: Line 77:
  
 ==== preserve the highest odd value in each group ==== ==== preserve the highest odd value in each group ====
 +tags | pandas groupby transform maximum odd number, maxodd
 +
 Given Given
 <code> <code>
Line 93: Line 95:
 11  MM7  S7          3 11  MM7  S7          3
 </code> </code>
-get all the rows with highest odd 'count' for each ['Sp', 'Mt'] combination. That is, we want+ 
 +We want
 <code> <code>
      Sp  Mt Value  count      Sp  Mt Value  count
Line 103: Line 106:
 11  MM7  S7          3 11  MM7  S7          3
 </code> </code>
 +That is get all the rows with highest odd 'count' for each ['Sp', 'Mt'] combination.
 +If there is a group with only even 'count' values, discard it.
 +
 Solution Solution
 <code> <code>
Line 147: Line 153:
 9   MM5  S5          1 9   MM5  S5          1
 11  MM7  S7          3 11  MM7  S7          3
 +</code>
 +
 +Breakdown of how it works:
 +<code>
 +In [4]:
 +df.groupby(['Sp', 'Mt'])['count'].transform(max_odd)
 +Out[4]:
 +0     3.0
 +1     3.0
 +2     3.0
 +3     1.0
 +4     1.0
 +5     3.0
 +6     3.0
 +7     3.0
 +8     3.0
 +9     1.0
 +10    NaN
 +11    3.0
 +Name: count, dtype: float64
 +
 +In [5]:
 +idx = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
 +idx
 +Out[5]:
 +0     False
 +1     False
 +2      True
 +3      True
 +4     False
 +5     False
 +6      True
 +7     False
 +8      True
 +9      True
 +10    False
 +11     True
 +Name: count, dtype: bool
 </code> </code>
 ==== level ==== ==== level ====
Line 193: Line 237:
  
 ==== extract groupby object by key ==== ==== extract groupby object by key ====
 +tags | pandas groupby filter a group
 +
   * groups.get_group(key_value) if grouping on a single column   * groups.get_group(key_value) if grouping on a single column
   * groups.get_group(key_value_tuple) if grouping on multiple columns.   * groups.get_group(key_value_tuple) if grouping on multiple columns.
Line 273: Line 319:
 5  bar  0  6 5  bar  0  6
 </code> </code>
 +
 +==== groupby slicing ====
 +Consider
 +<code>
 +In [1]: 
 +import pandas as pd
 +import numpy as np
 +rand = np.random.RandomState(1)
 +df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
 +                   'B': rand.randn(6),
 +                   'C': rand.randint(0, 20, 6)})
 +
 +In [2]: 
 +df
 +Out[2]: 
 +               C
 +0  foo  1.624345   5
 +1  bar -0.611756  18
 +2  foo -0.528172  11
 +3  bar -1.072969  10
 +4  foo  0.865408  14
 +5  bar -2.301539  18
 +</code>
 +
 +Group by on column 'A'
 +<code>
 +In [3]: 
 +gb = df.groupby(['A'])
 +</code>
 +
 +You can use get_group() to get a single group
 +<code>
 +In [4]: 
 +gb.get_group('foo')
 +Out[4]: 
 +               C
 +0  foo  1.624345   5
 +2  foo -0.528172  11
 +4  foo  0.865408  14
 +</code>
 +
 +You can select different columns using the groupby slicing:
 +<code>
 +In [5]: 
 +gb[['A', 'B']].get_group('foo')
 +Out[5]: 
 +             B
 +0  foo  1.624345
 +2  foo -0.528172
 +4  foo  0.865408
 +
 +In [6]: 
 +gb[['C']].get_group('foo')
 +Out[6]: 
 +    C
 +0   5
 +2  11
 +4  14
 +</code>
 +
 +Ref:
 +  * https://stackoverflow.com/questions/14734533/how-to-access-subdataframes-of-pandas-groupby-by-key
  
 ==== apply a function on each group ==== ==== apply a function on each group ====
Line 319: Line 427:
  
 tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1 tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1
- 
  
pandas_groupby.1625612418.txt.gz · Last modified: 2021/07/06 23:00 by admin