Differences

--- pandas_groupby [2021/07/06 23:00] – [preserve the highest odd value in each group] admin
+++ pandas_groupby [2024/03/26 22:25] (current) – [groupby slicing] raju
@@ Line 77: / Line 77: @@
 ==== preserve the highest odd value in each group ====
+tags | pandas groupby transform maximum odd number, maxodd
 Given
 <code>
@@ Line 93: / Line 95: @@
   MM7  S7     t      3
 </code>
-get all the rows with highest odd 'count' for each ['Sp', 'Mt'] combination. That is, we want
+We want
 <code>
      Sp  Mt Value  count
@@ Line 103: / Line 106: @@
   MM7  S7     t      3
 </code>
+That is, get all the rows with the highest odd 'count' for each ['Sp', 'Mt'] combination.
+A group whose 'count' values are all even has no odd maximum, so it is discarded entirely.
 Solution
 <code>
@@ Line 147: / Line 153: @@
    MM5  S5     w      1
   MM7  S7     t      3
+</code>
+Breakdown of how it works:
+<code>
+In [4]:
+df.groupby(['Sp', 'Mt'])['count'].transform(max_odd)
+Out[4]:
+0     3.0
+1     3.0
+2     3.0
+3     1.0
+4     1.0
+5     3.0
+6     3.0
+7     3.0
+8     3.0
+9     1.0
+10    NaN
+11    3.0
+Name: count, dtype: float64
+In [5]:
+idx = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
+idx
+Out[5]:
+0     False
+1     False
+2      True
+3      True
+4     False
+5     False
+6      True
+7     False
+8      True
+9      True
+10    False
+11     True
+Name: count, dtype: bool
 </code>
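The Solution's definition of max_odd lies outside the hunks shown in this diff. Below is a minimal self-contained sketch that assumes max_odd returns a group's largest odd 'count', or NaN when the group has none (consistent with the NaN in Out[4]). The data frame is made up, and max_odd, keys and odd_only are illustrative names, not taken from the page. The last step swaps in a different, vectorized technique (Series.where plus transform('max')) for comparison.

```python
import numpy as np
import pandas as pd

# Made-up sample data (NOT the page's original table).
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM3"],
    "Mt": ["S1", "S1", "S1", "S3", "S3", "S4"],
    "Value": ["a", "n", "cb", "mk", "bg", "x"],
    "count": [3, 2, 3, 1, 5, 4],
})


def max_odd(s: pd.Series) -> float:
    """Hypothetical helper: largest odd value in the group, NaN if there is none."""
    odd = s[s % 2 == 1]
    return float(odd.max()) if not odd.empty else np.nan


keys = ["Sp", "Mt"]

# transform() broadcasts each group's scalar result back onto every row of that group.
group_max_odd = df.groupby(keys)["count"].transform(max_odd)

# NaN never compares equal, so groups with only even counts drop out here.
result = df[df["count"] == group_max_odd]

# Vectorized alternative (a swapped-in technique, not the page's max_odd):
# blank out even counts with where(), then take the built-in group max,
# which skips NaN and gives NaN for an all-NaN group.
odd_only = df["count"].where(df["count"] % 2 == 1)
alt = df[df["count"] == odd_only.groupby([df["Sp"], df["Mt"]]).transform("max")]

print(result)
```

Ties are kept: both MM1/S1 rows with count 3 survive, matching the "get all the rows" wording. The where/transform('max') variant avoids a Python-level function call per group.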
 ==== level ====
@@ Line 193: / Line 237: @@
 ==== extract groupby object by key ====
+tags | pandas groupby filter a group
   * groups.get_group(key_value) if grouping on a single column
   * groups.get_group(key_value_tuple) if grouping on multiple columns.
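A minimal sketch of the two get_group() call forms listed above, on a small made-up frame; the variable names are illustrative.

```python
import pandas as pd

# Small made-up frame for illustration.
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM2", "MM2"],
    "Mt": ["S1", "S3", "S3", "S3"],
    "count": [3, 5, 8, 1],
})

# Grouping on a single column: pass the bare key value.
groups = df.groupby("Sp")
mm2 = groups.get_group("MM2")

# Grouping on several columns: pass a tuple, one element per grouping column, in order.
groups_multi = df.groupby(["Sp", "Mt"])
mm1_s3 = groups_multi.get_group(("MM1", "S3"))

print(mm2)
print(mm1_s3)
```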
@@ Line 273: / Line 319: @@
   bar  0  6
 </code>
+==== groupby slicing ====
+Consider
+<code>
+In [1]:
+import pandas as pd
+import numpy as np
+rand = np.random.RandomState(1)
+df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
+                   'B': rand.randn(6),
+                   'C': rand.randint(0, 20, 6)})
+In [2]:
+df
+Out[2]:
+     A         B   C
+0  foo  1.624345   5
+1  bar -0.611756  18
+2  foo -0.528172  11
+3  bar -1.072969  10
+4  foo  0.865408  14
+5  bar -2.301539  18
+</code>
+Group by column 'A':
+<code>
+In [3]:
+gb = df.groupby('A')  # scalar key; with a length-1 list like ['A'], recent pandas warns on get_group('foo')
+</code>
+Use get_group() to retrieve a single group as a DataFrame:
+<code>
+In [4]:
+gb.get_group('foo')
+Out[4]:
+     A         B   C
+0  foo  1.624345   5
+2  foo -0.528172  11
+4  foo  0.865408  14
+</code>
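To collect every group at once instead of one key at a time, note that iterating a GroupBy yields (key, sub-DataFrame) pairs. A minimal sketch, reusing the In [1] construction with a scalar grouping key; frames is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same construction as In [1] above.
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': rand.randn(6),
                   'C': rand.randint(0, 20, 6)})
gb = df.groupby('A')

# Each iteration step yields (group key, sub-DataFrame), so a dict
# comprehension gathers all groups, keyed the same way get_group() is.
frames = {key: sub for key, sub in gb}

print(sorted(frames))  # ['bar', 'foo']
print(frames['foo'])   # same rows as gb.get_group('foo')
```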
+Slicing the groupby object with a list of column names restricts which columns the extracted group contains:
+<code>
+In [5]:
+gb[['A', 'B']].get_group('foo')
+Out[5]:
+     A         B
+0  foo  1.624345
+2  foo -0.528172
+4  foo  0.865408
+In [6]:
+gb[['C']].get_group('foo')
+Out[6]:
+    C
+0   5
+2  11
+4  14
+</code>
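Single and double brackets slice differently: gb[['C']] stays a DataFrameGroupBy, while gb['C'] is a SeriesGroupBy whose get_group() returns a Series. A sketch under the same In [1] setup with a scalar key 'A'; c_frame, c_series and c_sums are illustrative names.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'B': rand.randn(6),
                   'C': rand.randint(0, 20, 6)})
gb = df.groupby('A')

# Double brackets keep a DataFrameGroupBy, so get_group() returns a DataFrame.
c_frame = gb[['C']].get_group('foo')

# Single brackets give a SeriesGroupBy, so get_group() returns a Series.
c_series = gb['C'].get_group('foo')

# The same slicing works before an aggregation: per-group sum of column C only.
c_sums = gb['C'].sum()

print(type(c_frame).__name__, type(c_series).__name__)  # DataFrame Series
print(c_sums)
```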
+Ref:
+  * https://stackoverflow.com/questions/14734533/how-to-access-subdataframes-of-pandas-groupby-by-key
 ==== apply a function on each group ====
@@ Line 319: / Line 427: @@
 tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1