matches_to_coplays
Differences
This shows you the differences between two versions of the page.
matches_to_coplays [2023/02/24 16:37] – created admin | matches_to_coplays [2023/02/24 17:03] (current) – [Solution] admin | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Matches to Coplays ===== | ===== Matches to Coplays ===== | ||
- | ==== problem | + | ==== Problem |
Consider the dataframe | Consider the dataframe | ||
< | < | ||
Line 29: | Line 29: | ||
* the number of games each player played with another player when player_id1 $ \neq $ player_id2 (e.g., a and b played in 2 games, a and c played in 1 game) | * the number of games each player played with another player when player_id1 $ \neq $ player_id2 (e.g., a and b played in 2 games, a and c played in 1 game) | ||
* the number of games a player played if player_id1 $ == $ player_id2 | * the number of games a player played if player_id1 $ == $ player_id2 | ||
+ | |||
+ | ==== Solution ==== | ||
+ | Inner merge the initial '' | ||
+ | |||
+ | < | ||
+ | $ ipython | ||
+ | Python 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)] | ||
+ | IPython 8.8.0 -- An enhanced Interactive Python. Type '?' | ||
+ | |||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | |||
+ | a, b, c = ' | ||
+ | |||
+ | df = pd.DataFrame( | ||
+ | { | ||
+ | ' | ||
+ | ' | ||
+ | }) | ||
+ | print(df) | ||
+ | | ||
+ | 0 | ||
+ | 1 | ||
+ | 2 | ||
+ | 3 | ||
+ | 4 | ||
+ | 5 | ||
+ | </ | ||
+ | |||
+ | Inner-merge the dataframe with itself on match_id, so that every pair of players who shared a match appears as one row | ||
+ | < | ||
+ | In [2]: | ||
+ | df.merge(df, | ||
+ | Out[2]: | ||
+ | match_id player_id_x player_id_y | ||
+ | 0 0 | ||
+ | 1 0 | ||
+ | 2 0 | ||
+ | 3 0 | ||
+ | 4 0 | ||
+ | 5 0 | ||
+ | 6 0 | ||
+ | 7 0 | ||
+ | 8 0 | ||
+ | 9 1 | ||
+ | 10 | ||
+ | 11 | ||
+ | 12 | ||
+ | 13 | ||
+ | </ | ||
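The self-merge step can be sketched as below. The sample data is an assumption: three matches chosen so that the pair counts agree with the results shown on this page, since the original values are not fully visible here.

```python
import pandas as pd

# Assumed sample data: match 0 has players a, b, c; match 1 has a, b; match 2 has c.
df = pd.DataFrame({
    "match_id": [0, 0, 0, 1, 1, 2],
    "player_id": ["a", "b", "c", "a", "b", "c"],
})

# Inner self-merge on match_id: a match with n players yields n * n rows,
# one for each ordered pair of players (including a player paired with itself).
pairs = df.merge(df, on="match_id", how="inner")

print(pairs.shape)          # 9 + 4 + 1 = 14 rows, 3 columns
print(list(pairs.columns))  # pandas adds the default _x / _y suffixes
```

Because both sides of the merge have a `player_id` column, pandas disambiguates them with its default `_x` and `_y` suffixes.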
+ | |||
+ | We want the columns to be named player_id1, player_id2 instead of player_id_x, player_id_y | ||
+ | < | ||
+ | In [3]: | ||
+ | df.merge(df, | ||
+ | Out[3]: | ||
+ | match_id player_id1 player_id2 | ||
+ | 0 0 a a | ||
+ | 1 0 a b | ||
+ | 2 0 a c | ||
+ | 3 0 b a | ||
+ | 4 0 b b | ||
+ | 5 0 b c | ||
+ | 6 0 c a | ||
+ | 7 0 c b | ||
+ | 8 0 c c | ||
+ | 9 1 a a | ||
+ | 10 | ||
+ | 11 | ||
+ | 12 | ||
+ | 13 | ||
+ | </ | ||
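One way to get those column names is the `suffixes` parameter of `DataFrame.merge`. This is a minimal sketch on assumed sample data; the exact arguments used in the original call are not visible here.

```python
import pandas as pd

# Assumed sample data, consistent with the counts shown on this page.
df = pd.DataFrame({
    "match_id": [0, 0, 0, 1, 1, 2],
    "player_id": ["a", "b", "c", "a", "b", "c"],
})

# suffixes replaces the default ("_x", "_y"); the strings are appended
# to the overlapping column name, giving player_id1 and player_id2.
pairs = df.merge(df, on="match_id", how="inner", suffixes=("1", "2"))

print(list(pairs.columns))
```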
+ | |||
+ | Group by player_id1 and player_id2, then take the size of each group | ||
+ | < | ||
+ | In [9]: | ||
+ | df.merge(df, | ||
+ | .groupby([' | ||
+ | .size() | ||
+ | Out[9]: | ||
+ | player_id1 | ||
+ | a | ||
+ | b 2 | ||
+ | c 1 | ||
+ | b | ||
+ | b 2 | ||
+ | c 1 | ||
+ | c | ||
+ | b 1 | ||
+ | c 2 | ||
+ | dtype: int64 | ||
+ | </ | ||
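As a sketch of the grouping step on assumed sample data: `groupby(...).size()` returns a Series whose index is a two-level MultiIndex of (player_id1, player_id2), which is why the players show up as index levels rather than columns.

```python
import pandas as pd

# Assumed sample data, consistent with the counts shown on this page.
df = pd.DataFrame({
    "match_id": [0, 0, 0, 1, 1, 2],
    "player_id": ["a", "b", "c", "a", "b", "c"],
})

sizes = (
    df.merge(df, on="match_id", how="inner", suffixes=("1", "2"))
    .groupby(["player_id1", "player_id2"])
    .size()  # number of shared matches per ordered pair
)

# Individual pairs can be looked up by tuple on the MultiIndex.
print(sizes.loc[("a", "b")])
print(sizes.loc[("a", "c")])
```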
+ | |||
+ | We want player_id1 and player_id2 as regular columns rather than as index levels. | ||
+ | < | ||
+ | In [11]: | ||
+ | df.merge(df, | ||
+ | .groupby([' | ||
+ | .size() | ||
+ | Out[11]: | ||
+ | player_id1 player_id2 | ||
+ | 0 a a 2 | ||
+ | 1 a b 2 | ||
+ | 2 a c 1 | ||
+ | 3 b a 2 | ||
+ | 4 b b 2 | ||
+ | 5 b c 1 | ||
+ | 6 c a 1 | ||
+ | 7 c b 1 | ||
+ | 8 c c 2 | ||
+ | </ | ||
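A possible end-to-end version, again on assumed sample data. The column name `n_games` is a hypothetical label introduced here for readability; the page's own output leaves the count column unnamed. The last two lines split the result into the two quantities asked for in the problem statement.

```python
import pandas as pd

# Assumed sample data, consistent with the counts shown on this page.
df = pd.DataFrame({
    "match_id": [0, 0, 0, 1, 1, 2],
    "player_id": ["a", "b", "c", "a", "b", "c"],
})

counts = (
    df.merge(df, on="match_id", how="inner", suffixes=("1", "2"))
    .groupby(["player_id1", "player_id2"])
    .size()
    .reset_index(name="n_games")  # move index levels into columns; n_games is a hypothetical name
)

# Rows with different players: games played together.
coplays = counts[counts["player_id1"] != counts["player_id2"]]
# Rows with the same player twice: total games that player appeared in.
games_played = counts[counts["player_id1"] == counts["player_id2"]]

print(counts)
```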
+ | |||
+ | See also: | ||
+ | * https:// | ||
+ | * It is worth going through this page in its entirety. | ||
+ | * I got the answer from here. I just added some intermediate steps to understand what is going on behind the scenes. | ||
+ | * It shows some alternative solutions which are worth exploring. | ||
+ | * It shows how to get the adjacency matrix | ||
+ | * It also shows how to visualize the result with some cool graphs produced by the networkx package. | ||
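For the adjacency matrix mentioned above, one possible approach (not necessarily the one used on the linked page) is `pd.crosstab` over the merged pairs. The sample data is assumed, as elsewhere on this page.

```python
import pandas as pd

# Assumed sample data, consistent with the counts shown on this page.
df = pd.DataFrame({
    "match_id": [0, 0, 0, 1, 1, 2],
    "player_id": ["a", "b", "c", "a", "b", "c"],
})

pairs = df.merge(df, on="match_id", how="inner", suffixes=("1", "2"))

# Cross-tabulate the pairs: entry (i, j) is the number of matches shared by
# players i and j; the diagonal holds each player's total number of matches.
adjacency = pd.crosstab(pairs["player_id1"], pairs["player_id2"])

print(adjacency)
```

The matrix is symmetric because the self-merge produces both (a, b) and (b, a) for every shared match.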
matches_to_coplays.1677256670.txt.gz · Last modified: 2023/02/24 16:37 by admin