pandas_series
Next revision | Previous revision | ||
pandas_series [2021/02/04 19:55] – created raju | pandas_series [2024/02/06 05:18] (current) – [return a random element] raju | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ===== creating a series ===== | ||
+ | ==== create a series from a list ==== | ||
+ | < | ||
+ | >>> | ||
+ | >>> | ||
+ | 0 sun | ||
+ | 1 mon | ||
+ | 2 tue | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | To assign an index | ||
+ | < | ||
+ | >>> | ||
+ | >>> | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | dtype: object | ||
+ | </ | ||
+ | |||
+ | To assign a name to the series (it is used as the column name when the series is converted to a dataframe) | ||
+ | < | ||
+ | >>> | ||
+ | >>> | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | Name: day, dtype: object | ||
+ | </ | ||
+ | |||
+ | To assign a name to the index | ||
+ | < | ||
+ | >>> | ||
+ | >>> | ||
+ | >>> | ||
+ | letter | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | Name: day, dtype: object | ||
+ | </ | ||
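The statements in the snippets above are truncated in this revision view, so the following is only a minimal sketch of one way to arrive at the same outputs. The variable name `s` and the choice of constructor arguments are assumptions, not necessarily what the page originally used.

```python
import pandas as pd

# Build a series from a list; pandas assigns a default 0..n-1 index.
plain = pd.Series(["sun", "mon", "tue"])

# Same data with an explicit index and a series name.
s = pd.Series(["sun", "mon", "tue"], index=["s", "m", "t"], name="day")

# The index carries its own, separate name.
s.index.name = "letter"

print(plain)
print(s)
```

The series name and the index name are independent attributes: setting one does not affect the other.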
+ | |||
+ | The series name is useful when converting the series to a dataframe, where it becomes the column name. | ||
+ | < | ||
+ | >>> | ||
+ | 0 | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | |||
+ | >>> | ||
+ | day | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | </ | ||
+ | |||
+ | If the series did not have a name to begin with, one can be supplied while converting it to a dataframe: | ||
+ | < | ||
+ | >>> | ||
+ | days | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | </ | ||
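As a hedged illustration of the three conversions shown above (unnamed series, named series, and a name supplied at conversion time), the sketch below uses `to_frame` and its `name` parameter. The variable names are made up for the example.

```python
import pandas as pd

unnamed = pd.Series(["sun", "mon", "tue"], index=["s", "m", "t"])
named = unnamed.rename("day")  # passing a scalar to rename() sets the series name

default_df = unnamed.to_frame()              # no series name: column label is 0
named_df = named.to_frame()                  # column label taken from the series name
override_df = unnamed.to_frame(name="days")  # column label supplied at conversion

print(default_df.columns.tolist())
print(named_df.columns.tolist())
print(override_df.columns.tolist())
```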
+ | |||
+ | The index name comes in handy when resetting the index: it becomes the label of the new column, which is otherwise called index. | ||
+ | < | ||
+ | >>> | ||
+ | index day | ||
+ | 0 | ||
+ | 1 | ||
+ | 2 | ||
+ | >>> | ||
+ | letter | ||
+ | 0 s sun | ||
+ | 1 m mon | ||
+ | 2 t tue | ||
+ | </ | ||
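A small sketch of the difference, assuming the same three-element series as in the earlier examples. `rename_axis` is used here to name the index; the original snippet may have set `index.name` directly instead.

```python
import pandas as pd

s = pd.Series(["sun", "mon", "tue"], index=["s", "m", "t"], name="day")

# Without an index name, the old index lands in a column called "index".
no_name = s.reset_index()

# With an index name, that name is reused as the column label.
with_name = s.rename_axis("letter").reset_index()

print(no_name.columns.tolist())
print(with_name.columns.tolist())
```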
+ | |||
+ | ===== dummy ===== | ||
==== append element to series ==== | ==== append element to series ==== | ||
< | < | ||
Line 19: | Line 101: | ||
' | ' | ||
</ | </ | ||
+ | |||
+ | ==== return a random element ==== | ||
+ | Use pandas.Series.sample | ||
+ | |||
+ | Ref:- | ||
+ | * https:// | ||
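A minimal sketch of picking a random element with `Series.sample`. The example series and the `random_state` value are assumptions; `random_state` is only there to make the draw repeatable and can be dropped.

```python
import pandas as pd

s = pd.Series(["sun", "mon", "tue"], index=["s", "m", "t"])

# sample() draws n=1 element by default and returns a one-element Series.
picked = s.sample(random_state=0)

# Pull out the bare value if a scalar is wanted rather than a Series.
value = picked.iloc[0]

print(picked)
print(value)
```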
+ | |||
+ | ==== expand a series ==== | ||
+ | tags | using reindex, change index | ||
+ | |||
+ | Given two series S and I of length n, and an integer N that is >= n and larger than every value in I, the idea here is to expand S into an N-element vector E so that E[I[:]] = S[:], with every other element of E set to 0. | ||
+ | |||
+ | For example if S is [3.4, 1.8], I is [3, 5] and N is 10, we want E to be [0, 0, 0, 3.4, 0, 1.8, 0, 0, 0, 0] | ||
+ | |||
+ | < | ||
+ | import pandas as pd | ||
+ | import numpy as np | ||
+ | |||
+ | def expand_series(S, | ||
+ | E = pd.Series(S.values, | ||
+ | return E | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | df = pd.DataFrame({' | ||
+ | print(df) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | | ||
+ | 0 | ||
+ | 1 | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | unravelled_series = expand_series(df[' | ||
+ | print(unravelled_series) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | id | ||
+ | 0 0.0 | ||
+ | 1 0.0 | ||
+ | 2 0.0 | ||
+ | 3 3.4 | ||
+ | 4 0.0 | ||
+ | 5 1.8 | ||
+ | 6 0.0 | ||
+ | 7 0.0 | ||
+ | 8 0.0 | ||
+ | 9 0.0 | ||
+ | Name: val, dtype: float64 | ||
+ | </ | ||
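The body of expand_series is truncated in this revision view, so the following is a reconstruction under assumptions rather than the page's original code: it assumes the parameters are (S, I, N) as in the description, that the values in I are unique and smaller than N, and that missing positions are filled with 0 through `reindex`.

```python
import pandas as pd

def expand_series(S, I, N):
    # Index the values of S by the positions listed in I ...
    E = pd.Series(S.to_numpy(), index=I.to_numpy(), name=S.name)
    # ... then reindex onto 0..N-1; positions absent from I are filled with 0.
    return E.reindex(range(N), fill_value=0.0)

S = pd.Series([3.4, 1.8], name="val")
I = pd.Series([3, 5], name="id")
E = expand_series(S, I, 10)
print(E)
```

Note that `reindex` raises an error if I contains duplicate values, and any value in I that is >= N is silently dropped, so both conditions are worth checking before calling it.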
+ | |||
+ | Sample code: https:// | ||
+ | |||
+ | Ref: | ||
+ | |||
+ | * https:// | ||
+ | * https:// | ||
+ | * https:// | ||
+ | |||
+ | ==== Convert series to a dataframe ==== | ||
+ | |||
+ | Use to_frame(). By default, the series name becomes the column name in the dataframe. To override it, pass name= when calling to_frame(). | ||
+ | |||
+ | < | ||
+ | >>> | ||
+ | >>> | ||
+ | >>> | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | dtype: object | ||
+ | >>> | ||
+ | 0 | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | >>> | ||
+ | >>> | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | Name: day, dtype: object | ||
+ | >>> | ||
+ | day | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | >>> | ||
+ | days | ||
+ | s sun | ||
+ | m mon | ||
+ | t tue | ||
+ | </ | ||
+ | ===== check if ===== | ||
+ | ==== check if a series is empty ==== | ||
+ | Use pandas.Series.empty. | ||
+ | |||
+ | < | ||
+ | $ ipython | ||
+ | |||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | import numpy as np | ||
+ | df1 = pd.DataFrame({' | ||
+ | df1 | ||
+ | Out[1]: | ||
+ | Empty DataFrame | ||
+ | Columns: [A] | ||
+ | Index: [] | ||
+ | |||
+ | In [2]: | ||
+ | df1[' | ||
+ | Out[2]: | ||
+ | True | ||
+ | </ | ||
+ | |||
+ | A series with just NaNs is not considered empty. Drop the NaNs first if they should not count. | ||
+ | < | ||
+ | In [3]: | ||
+ | df2 = pd.DataFrame({' | ||
+ | df2 | ||
+ | Out[3]: | ||
+ | A | ||
+ | 0 NaN | ||
+ | |||
+ | In [4]: | ||
+ | df2[' | ||
+ | Out[4]: | ||
+ | False | ||
+ | |||
+ | In [5]: | ||
+ | df2[' | ||
+ | Out[5]: | ||
+ | True | ||
+ | </ | ||
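The same checks can be sketched directly on series, without going through a dataframe. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

no_rows = pd.Series([], dtype="float64")  # zero elements
only_nan = pd.Series([np.nan])            # one element, which happens to be NaN

print(no_rows.empty)            # True: there are no elements at all
print(only_nan.empty)           # False: a NaN still counts as an element
print(only_nan.dropna().empty)  # True once the NaN is dropped
```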
+ | |||
+ | Used Python 3.9.4 and IPython 7.22.0 | ||
+ | |||
+ | tags | check if a series has at least one element | ||
+ | |||
+ | ==== check if all elements in a series are unique ==== | ||
+ | Use pandas.Series.is_unique | ||
+ | |||
+ | < | ||
+ | In [1]: | ||
+ | import pandas as pd | ||
+ | |||
+ | In [2]: | ||
+ | pd.Series([1, | ||
+ | Out[2]: | ||
+ | True | ||
+ | |||
+ | In [3]: | ||
+ | pd.Series([1, | ||
+ | Out[3]: | ||
+ | False | ||
+ | </ | ||
+ | |||
+ | Missing values are treated like any other value, so if there are multiple NaNs, is_unique returns False. If this is not desired, drop the NaNs first. | ||
+ | < | ||
+ | In [4]: | ||
+ | import numpy as np | ||
+ | pd.Series([1, | ||
+ | Out[4]: | ||
+ | False | ||
+ | |||
+ | In [5]: | ||
+ | pd.Series([1, | ||
+ | Out[5]: | ||
+ | True | ||
+ | </ | ||
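The inputs in the snippets above are truncated, so here is a minimal sketch with made-up values that illustrates the NaN handling of `is_unique`.

```python
import numpy as np
import pandas as pd

distinct = pd.Series([1, 2, 3])
repeated = pd.Series([1, 2, 2])
with_nans = pd.Series([1, np.nan, np.nan])

print(distinct.is_unique)            # True
print(repeated.is_unique)            # False
print(with_nans.is_unique)           # False: the two NaNs count as duplicates
print(with_nans.dropna().is_unique)  # True once the NaNs are dropped
```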
+ | |||
+ | For completeness | ||
+ | < | ||
+ | In [6]: | ||
+ | pd.Series([1, | ||
+ | Out[6]: | ||
+ | False | ||
+ | |||
+ | In [7]: | ||
+ | pd.Series([1, | ||
+ | Out[7]: | ||
+ | False | ||
+ | </ | ||
+ | |||
+ | Using | pandas 1.5.3, python 3.11.4, ipython 8.12.0 | ||
+ | |||
+ | Ref:- | ||
+ | * https:// | ||
+ | * https:// | ||
+ |
pandas_series.1612468510.txt.gz · Last modified: 2021/02/04 19:55 by raju