pandas_groupby

Differences

This shows you the differences between two versions of the page.

pandas_groupby [2021/07/06 23:02] – [preserve the highest odd value in each group] admin
pandas_groupby [2024/03/26 22:25] – [groupby slicing] raju
Line 77: Line 77:
  
 ==== preserve the highest odd value in each group ====
 +tags | pandas groupby transform maximum odd number, maxodd
 +
 Given
 <code>
Line 93: Line 95:
 11  MM7  S7          3
 </code>
-get all the rows with highest odd 'count' for each ['Sp', 'Mt'] combination. That is, we want
 +
 +We want
 <code>
      Sp  Mt Value  count
Line 103: Line 106:
 11  MM7  S7          3
 </code>
 +That is, get all the rows with the highest odd 'count' for each ['Sp', 'Mt'] combination.
 +If a group has only even 'count' values, discard it.
 +
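As a rough illustration of the requirement above, here is a minimal sketch on hypothetical data (not the table from this page). The body of ''max_odd'' below is an assumption written for this example; it returns NaN for a group with no odd values, so the comparison is False for every row of that group and the group is discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that one group has only even counts.
df = pd.DataFrame({
    'Sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM3'],
    'Mt': ['S1', 'S1', 'S1', 'S3', 'S3', 'S4'],
    'count': [3, 5, 4, 2, 8, 7],
})


def max_odd(s):
    """Assumed helper: largest odd value in s, or NaN if s has none."""
    odd = s[s % 2 == 1]
    return odd.max() if len(odd) else np.nan


# transform() broadcasts each group's result back to the group's rows,
# so the comparison yields a boolean mask aligned with df.
idx = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
result = df[idx]
print(result)
```

Because NaN never compares equal to anything, the all-even group ('MM2', 'S3') contributes no rows to the result.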
 Solution
 <code>
Line 169: Line 175:
  
 In [5]:
-df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
 +idx = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
 +idx
 Out[5]:
 0     False
Line 230: Line 237:
  
 ==== extract groupby object by key ====
 +tags | pandas groupby filter a group
 +
   * groups.get_group(key_value) if grouping on a single column
   * groups.get_group(key_value_tuple) if grouping on multiple columns.
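The two cases in the list above can be sketched as follows. The frame and its column names are hypothetical and only serve to show a scalar key versus a tuple key.

```python
import pandas as pd

# Hypothetical frame used only to illustrate the two key shapes.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': ['x', 'x', 'y', 'y'],
    'C': [1, 2, 3, 4],
})

# Grouping on a single column: the key is a scalar.
single = df.groupby('A').get_group('foo')

# Grouping on multiple columns: the key is a tuple, one value per column.
multi = df.groupby(['A', 'B']).get_group(('foo', 'y'))

print(single)
print(multi)
```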
Line 310: Line 319:
 5  bar  0  6
 </code>
 +
 +==== groupby slicing ====
 +Consider
 +<code>
 +In [1]: 
 +import pandas as pd
 +import numpy as np
 +rand = np.random.RandomState(1)
 +df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
 +                   'B': rand.randn(6),
 +                   'C': rand.randint(0, 20, 6)})
 +
 +In [2]: 
 +df
 +Out[2]: 
 +     A         B   C
 +0  foo  1.624345   5
 +1  bar -0.611756  18
 +2  foo -0.528172  11
 +3  bar -1.072969  10
 +4  foo  0.865408  14
 +5  bar -2.301539  18
 +</code>
 +
 +Group by column 'A':
 +<code>
 +In [3]: 
 +gb = df.groupby('A')  # scalar key, so get_group('foo') accepts a scalar name
 +</code>
 +
 +You can use get_group() to get a single group
 +<code>
 +In [4]: 
 +gb.get_group('foo')
 +Out[4]: 
 +     A         B   C
 +0  foo  1.624345   5
 +2  foo -0.528172  11
 +4  foo  0.865408  14
 +</code>
 +
 +You can select different columns using the groupby slicing:
 +<code>
 +In [5]: 
 +gb[['A', 'B']].get_group('foo')
 +Out[5]: 
 +     A         B
 +0  foo  1.624345
 +2  foo -0.528172
 +4  foo  0.865408
 +
 +In [6]: 
 +gb[['C']].get_group('foo')
 +Out[6]: 
 +    C
 +0   5
 +2  11
 +4  14
 +</code>
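For comparison, a minimal sketch of how the selection shape affects the result, using hypothetical values for column 'B': a list of columns gives a DataFrame, a single column name gives a Series, and the same selection also works with aggregations.

```python
import pandas as pd

# Hypothetical values; only the shape of the frame mirrors the example above.
df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'C': [5, 18, 11, 10, 14, 18]})

gb = df.groupby('A')

sub = gb[['C']].get_group('foo')   # list selection -> DataFrame with column C
ser = gb['C'].get_group('foo')     # scalar selection -> Series
means = gb['C'].mean()             # aggregation limited to the selected column

print(sub)
print(ser)
print(means)
```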
 +
 +Ref:
 +  * https://stackoverflow.com/questions/14734533/how-to-access-subdataframes-of-pandas-groupby-by-key
  
 ==== apply a function on each group ====
Line 356: Line 427:
  
 tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1
- 
  
pandas_groupby.txt · Last modified: 2024/05/07 20:47 by raju