Differences

This shows you the differences between two versions of the page.

--- matches_to_coplays [2023/02/24 16:37] – created admin
+++ matches_to_coplays [2023/02/24 17:03] (current) – [Solution] admin
@@ Line 1: / Line 1: @@
 ===== Matches to Coplays =====
-==== problem description ====
+==== Problem description ====
 Consider the dataframe
 <code>
@@ Line 29: / Line 29: @@
   * the number of games each player played with another player when player_id1 $ \neq $ player_id2 (e.g. a and b played in 2 games, a and c played in 1 game)
   * the number of games a player played if player_id1 $ == $ player_id2
+==== Solution ====
+Inner merge the initial ''df'' with itself on ''match_id''. Then group by ''player_id1'' and ''player_id2'' and aggregate with ''size()'' to get the weighted-edges dataframe.
+<code>
+$ ipython
+Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
+IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
+In [1]:
+import pandas as pd
+a, b, c = 'a', 'b', 'c'
+df = pd.DataFrame(
+{
+    'match_id':  [0, 0, 0, 1, 1, 2],
+    'player_id': [a, b, c, a, b, c],
+})
+print(df)
+   match_id player_id
+0         0         a
+1         0         b
+2         0         c
+3         1         a
+4         1         b
+5         2         c
+</code>
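+Inner merge the dataframe with itself on ''match_id''. Every player in a match gets paired with every player in that same match, including themselves.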
+<code>
+In [2]:
+df.merge(df, on='match_id', how='inner')
+Out[2]:
+    match_id player_id_x player_id_y
+0          0           a           a
+1          0           a           b
+2          0           a           c
+3          0           b           a
+4          0           b           b
+5          0           b           c
+6          0           c           a
+7          0           c           b
+8          0           c           c
+9          1           a           a
+10         1           a           b
+11         1           b           a
+12         1           b           b
+13         2           c           c
+</code>
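As a quick sanity check on the merged row count, here is a minimal sketch using the same toy ''df''. A match with n players contributes n * n rows to the self-merge, so the total should equal the sum of squared match sizes. The variable names ''players_per_match'' and ''expected_rows'' are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Number of players in each match: 3, 2 and 1 for matches 0, 1 and 2.
players_per_match = df.groupby('match_id').size()

# The self-merge pairs every player in a match with every player
# in the same match, so each match contributes n * n rows.
expected_rows = int((players_per_match ** 2).sum())

merged = df.merge(df, on='match_id', how='inner')
print(expected_rows, len(merged))
```

Both numbers should agree (9 + 4 + 1 rows for this data), which is a cheap way to confirm the merge did what was intended before grouping.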
+We want the columns to be named ''player_id1'' and ''player_id2'' instead of ''player_id_x'' and ''player_id_y''. Pass the ''suffixes'' argument to ''merge''.
+<code>
+In [3]:
+df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))
+Out[3]:
+    match_id player_id1 player_id2
+0          0          a          a
+1          0          a          b
+2          0          a          c
+3          0          b          a
+4          0          b          b
+5          0          b          c
+6          0          c          a
+7          0          c          b
+8          0          c          c
+9          1          a          a
+10         1          a          b
+11         1          b          a
+12         1          b          b
+13         2          c          c
+</code>
+Group by ''player_id1'' and ''player_id2'' and take the size of each group. The size is the number of matches the two players share.
+<code>
+In [9]:
+df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
+.groupby(['player_id1', 'player_id2'])\
+.size()
+Out[9]:
+player_id1  player_id2
+a           a             2
+            b             2
+            c             1
+b           a             2
+            b             2
+            c             1
+c           a             1
+            b             1
+            c             2
+dtype: int64
+</code>
+We want ''player_id1'' and ''player_id2'' as columns instead of as the index, so pass ''as_index=False'' to ''groupby''.
+<code>
+In [11]:
+df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\
+.groupby(['player_id1', 'player_id2'], as_index=False)\
+.size()
+Out[11]:
+  player_id1 player_id2  size
+0          a          a     2
+1          a          b     2
+2          a          c     1
+3          b          a     2
+4          b          b     2
+5          b          c     1
+6          c          a     1
+7          c          b     1
+8          c          c     2
+</code>
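The steps above can be wrapped into one small helper. This is only a sketch: the function name ''coplay_counts'', its column-name parameters and the output column name ''n_games'' are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

def coplay_counts(df, match_col='match_id', player_col='player_id'):
    """Return one row per (player, player) pair with the number of shared matches."""
    # Self-merge on the match column; the overlapping player column
    # gets the suffixes '1' and '2' (player_id1, player_id2 by default).
    pairs = df.merge(df, on=match_col, how='inner', suffixes=('1', '2'))
    return (
        pairs
        .groupby([player_col + '1', player_col + '2'], as_index=False)
        .size()                               # adds a column named 'size'
        .rename(columns={'size': 'n_games'})  # hypothetical, more descriptive name
    )

df = pd.DataFrame({
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': ['a', 'b', 'c', 'a', 'b', 'c'],
})
edges = coplay_counts(df)
print(edges)
```

Rows where the two player columns are equal hold the total number of games that player took part in; the remaining rows are the weighted edges between distinct players.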
+See also:
+  * https://stackoverflow.com/questions/75537816/transform-a-dataframe-for-network-analysis-using-pandas
+    * It is worth going through this page in its entirety.
+    * I got the answer from this page; I only added some intermediate steps to show what happens at each stage.
+    * It shows some alternative solutions which are worth exploring.
+    * It shows how to get the adjacency matrix.
+    * It also shows how to visualize the result as graphs drawn with the networkx package.
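For the adjacency matrix mentioned in the list above, one possible approach (not necessarily the one used in the linked answer) is to pivot the weighted-edges dataframe. The sketch below assumes the same toy ''df'' and the default ''size'' column name produced by ''groupby(...).size()''.

```python
import pandas as pd

df = pd.DataFrame({
    'match_id':  [0, 0, 0, 1, 1, 2],
    'player_id': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# Weighted edges, exactly as in the solution steps.
edges = (
    df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))
    .groupby(['player_id1', 'player_id2'], as_index=False)
    .size()
)

# Pivot to a square player-by-player table. Pairs that never shared
# a match would come out as NaN, so fill those with 0.
adjacency = (
    edges
    .pivot(index='player_id1', columns='player_id2', values='size')
    .fillna(0)
    .astype(int)
)
print(adjacency)
```

The matrix is symmetric, and its diagonal holds the number of games each player took part in.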