--- pandas_groupby [2021/07/06 22:57] – [preserve the highest odd value in each group] admin
+++ pandas_groupby [2024/05/07 20:46] – [extract groupby object by key] raju
@@ Line 77: / Line 77: @@
 ==== preserve the highest odd value in each group ====
+tags | pandas groupby transform maximum odd number, maxodd
 Given
 <code>
@@ Line 93: / Line 95: @@
   MM7  S7     t      3
 </code>
-get all the rows with highest odd 'count' for each ['Sp', 'Mt'] combination. That is, we want
+We want
 <code>
      Sp  Mt Value  count
@@ Line 103: / Line 106: @@
   MM7  S7     t      3
 </code>
+That is, get all the rows with the highest odd 'count' for each ['Sp', 'Mt'] combination.
+If there is a group with only even 'count' values, discard it.
 Solution
 <code>
 In [1]:
 import pandas as pd
-df = pd.DataFrame({'Sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM3', 'MM3', 'MM4', 'MM4', 'MM5', 'MM6', 'MM7'],
+df = pd.DataFrame({'Sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM3', 'MM3',
-                   'Mt': ['S1', 'S1', 'S1', 'S2', 'S2', 'S3', 'S3', 'S4', 'S4', 'S5', 'S6', 'S7'],
+                          'MM4', 'MM4', 'MM5', 'MM6', 'MM7'],
-                   'Value': ['a', 'n', 'cb', 'mk', 'bg', 'dgd', 'rd', 'cb', 'uyi', 'w', 'ea', 't'],
+                   'Mt': ['S1', 'S1', 'S1', 'S2', 'S2', 'S3', 'S3',
+                          'S4', 'S4', 'S5', 'S6', 'S7'],
+                   'Value': ['a', 'n', 'cb', 'mk', 'bg', 'dgd', 'rd',
+                             'cb', 'uyi', 'w', 'ea', 't'],
                    'count': [1, 2, 3, 1, 2, 2, 3, 1, 3, 1, 2, 3]})
 df
@@ Line 144: / Line 153: @@
    MM5  S5     w      1
   MM7  S7     t      3
+</code>
+Breakdown of how it works:
+<code>
+In [4]:
+df.groupby(['Sp', 'Mt'])['count'].transform(max_odd)
+Out[4]:
+     3.0
+     3.0
+     3.0
+     1.0
+     1.0
+     3.0
+     3.0
+     3.0
+     3.0
+     1.0
+    NaN
+    3.0
+Name: count, dtype: float64
+In [5]:
+idx = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd) == df['count']
+idx
+Out[5]:
+     False
+     False
+      True
+      True
+     False
+     False
+      True
+     False
+      True
+      True
+    False
+     True
+Name: count, dtype: bool
 </code>
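The breakdown above calls max_odd, whose definition falls outside the lines shown in this diff. A minimal sketch of one possible definition, assuming it returns the largest odd value in a group and NaN when the group has none (the function body and the small DataFrame below are illustrative assumptions, not the page's original code):

```python
import numpy as np
import pandas as pd


def max_odd(counts):
    """Hypothetical helper: largest odd value in a Series, NaN if there is none."""
    odd = counts[counts % 2 == 1]
    return odd.max() if len(odd) else np.nan


df = pd.DataFrame({'Sp': ['MM1', 'MM1', 'MM2', 'MM6'],
                   'Mt': ['S1', 'S1', 'S2', 'S6'],
                   'count': [1, 3, 2, 2]})

# Broadcast each group's highest odd count back onto every row of the group.
best = df.groupby(['Sp', 'Mt'])['count'].transform(max_odd)

# NaN never compares equal to anything, so all-even groups drop out of the mask.
result = df[best == df['count']]
```

Because the comparison against NaN is always False, groups with only even counts are discarded without any extra filtering step.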
 ==== level ====
@@ Line 188: / Line 235: @@
 Ref: https://stackoverflow.com/questions/49859182/understanding-level-0-and-group-keys
+==== filter elements from groups that don't satisfy a criterion ====
+tags | pandas groupby filter groups
+<code>
+In [2]:
+import pandas as pd
+df = pd.DataFrame({
+    'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
+    'B' : [1, 2, 3, 4, 5, 6],
+    'C' : [2.0, 5., 8., 1., 2., 9.]})
+df
+Out[2]:
+     A  B    C
+  foo  1  2.0
+  bar  2  5.0
+  foo  3  8.0
+  bar  4  1.0
+  foo  5  2.0
+  bar  6  9.0
+In [3]:
+grouped = df.groupby('A')
+In [4]:
+grouped.filter(lambda x: x['B'].mean() > 3.)
+Out[4]:
+     A  B    C
+  bar  2  5.0
+  bar  4  1.0
+  bar  6  9.0
+</code>
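filter keeps or drops whole groups: the lambda receives each group's sub-DataFrame and must return a single boolean. As a sketch, the same rows can also be selected with a boolean mask built from transform; the variable names by_filter, mask and by_mask are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4, 5, 6],
    'C': [2.0, 5., 8., 1., 2., 9.]})

# filter: keep every row of each group whose mean of 'B' exceeds 3.
by_filter = df.groupby('A').filter(lambda g: g['B'].mean() > 3.)

# Alternative: broadcast each group's mean of 'B' onto its rows, then mask.
mask = df.groupby('A')['B'].transform('mean') > 3.
by_mask = df[mask]
```

The mask form can be handy when the per-group statistic is needed again later, since it is computed once and stays aligned with the original index.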
 ==== extract groupby object by key ====
+tags | pandas groupby filter a group
   * groups.get_group(key_value) if grouping on a single column
   * groups.get_group(key_value_tuple) if grouping on multiple columns.
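A small sketch of the two cases listed above; the DataFrame and the names single and multi are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['x', 'x', 'y', 'y'],
                   'C': [1, 2, 3, 4]})

# Single grouping column: the key is a scalar.
single = df.groupby('A').get_group('foo')

# Multiple grouping columns: the key is a tuple, one value per column.
multi = df.groupby(['A', 'B']).get_group(('bar', 'y'))
```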
@@ Line 270: / Line 350: @@
   bar  0  6
 </code>
+==== groupby slicing ====
+Consider
+<code>
+In [1]:
+import pandas as pd
+import numpy as np
+rand = np.random.RandomState(1)
+df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
+                   'B': rand.randn(6),
+                   'C': rand.randint(0, 20, 6)})
+In [2]:
+df
+Out[2]:
+     A         B   C
+  foo  1.624345   5
+  bar -0.611756  18
+  foo -0.528172  11
+  bar -1.072969  10
+  foo  0.865408  14
+  bar -2.301539  18
+</code>
+Group by column 'A':
+<code>
+In [3]:
+gb = df.groupby('A')
+</code>
+You can use get_group() to get a single group:
+<code>
+In [4]:
+gb.get_group('foo')
+Out[4]:
+     A         B   C
+  foo  1.624345   5
+  foo -0.528172  11
+  foo  0.865408  14
+</code>
+You can select different columns using groupby slicing:
+<code>
+In [5]:
+gb[['A', 'B']].get_group('foo')
+Out[5]:
+     A         B
+  foo  1.624345
+  foo -0.528172
+  foo  0.865408
+In [6]:
+gb[['C']].get_group('foo')
+Out[6]:
+    C
+   5
+  11
+  14
+</code>
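The bracket style used for slicing decides what comes back. A minimal sketch, reusing the 'A' and 'C' values from the example above (as_frame and as_series are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'] * 3,
                   'C': [5, 18, 11, 10, 14, 18]})
gb = df.groupby('A')

# Double brackets select a list of columns: each group is a DataFrame.
as_frame = gb[['C']].get_group('foo')

# Single brackets select one column: each group is a Series.
as_series = gb['C'].get_group('foo')
```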
+Ref:
+  * https://stackoverflow.com/questions/14734533/how-to-access-subdataframes-of-pandas-groupby-by-key
 ==== apply a function on each group ====
@@ Line 316: / Line 458: @@
 tags | reset_index remove level_1 column, apply function to multiple columns and rename result, groupby apply name the result, groupby apply remove level_1