===== Matches to Coplays ===== ==== Problem description ==== Consider the dataframe match_id player_id 0 0 a 1 0 b 2 0 c 3 1 a 4 1 b 5 2 c which shows the players played in a given match. For example, it shows that [a, b, c] played in match 0, [a, b] played in match 1, and c alone played in match 2. We want to get a coplay count player_id1 player_id2 size 0 a a 2 1 a b 2 2 a c 1 3 b a 2 4 b b 2 5 b c 1 6 c a 1 7 c b 1 8 c c 2 which shows * the number of games each player played with another player when player_id1 $ \neq $ player_id2 (ex:- a and b played in 2 games, a and c played in 1 game) * the number of games a player played if player_id1 $ == $ player_id2 ==== Solution ==== Inner merge the initial ''df'' with itself on ''match_id''. Then group by on ''player_1'' and ''player_2''. Aggregate using size() to get the weighted-edges dataframe. $ ipython Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)] IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import pandas as pd a, b, c = 'a', 'b', 'c' df = pd.DataFrame( { 'match_id': [0, 0, 0, 1, 1, 2], 'player_id': [a, b, c, a, b, c], }) print(df) match_id player_id 0 0 a 1 0 b 2 0 c 3 1 a 4 1 b 5 2 c Do an inner merge on itself In [2]: df.merge(df, on='match_id', how='inner') Out[2]: match_id player_id_x player_id_y 0 0 a a 1 0 a b 2 0 a c 3 0 b a 4 0 b b 5 0 b c 6 0 c a 7 0 c b 8 0 c c 9 1 a a 10 1 a b 11 1 b a 12 1 b b 13 2 c c We want the columns to be player_id1, player_id2 instead of player_id_x, player_id_y In [3]: df.merge(df, on='match_id', how='inner', suffixes=('1', '2')) Out[3]: match_id player_id1 player_id2 0 0 a a 1 0 a b 2 0 a c 3 0 b a 4 0 b b 5 0 b c 6 0 c a 7 0 c b 8 0 c c 9 1 a a 10 1 a b 11 1 b a 12 1 b b 13 2 c c Groupby on player_id1, player_id2 and get the size of each group In [9]: df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\ .groupby(['player_id1', 'player_id2'])\ .size() Out[9]: player_id1 player_id2 a a 2 b 2 c 1 b a 2 b 2 c 1 c a 1 b 1 c 2 dtype: int64 We want player_id1 and player_id2 as columns instead of as index. In [11]: df.merge(df, on='match_id', how='inner', suffixes=('1', '2'))\ .groupby(['player_id1', 'player_id2'], as_index=False)\ .size() Out[11]: player_id1 player_id2 size 0 a a 2 1 a b 2 2 a c 1 3 b a 2 4 b b 2 5 b c 1 6 c a 1 7 c b 1 8 c c 2 See also: * https://stackoverflow.com/questions/75537816/transform-a-dataframe-for-network-analysis-using-pandas * It is worth going through this page in its entirety. * I got the answer from here. I just added some intermediate steps to understand what is going on behind the scenes. * It shows some alternative solutions which are worth exploring. * It shows how to get the adjacency matrix * It also shows how to visualize the result with some cool graphs produced by the networkx package.