===== creating a series =====
==== create a series from a list ====
>>> a = pd.Series(['sun', 'mon', 'tue'])
>>> a
0 sun
1 mon
2 tue
dtype: object
To assign an index
>>> b = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])
>>> b
s sun
m mon
t tue
dtype: object
To assign a name to the series (it becomes the column name when the series is converted to a dataframe)
>>> c = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> c
s sun
m mon
t tue
Name: day, dtype: object
To assign a name to the index
>>> d = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> d.index.name = 'letter'
>>> d
letter
s sun
m mon
t tue
Name: day, dtype: object
The series name is useful when converting the series to a dataframe, since it becomes the column name. Without a name, the column is labelled 0.
>>> b.to_frame()
0
s sun
m mon
t tue
>>> c.to_frame()
day
s sun
m mon
t tue
If the series did not have a name to begin with, but we want one in the resulting dataframe, pass it through the name argument of to_frame.
>>> b.to_frame(name='days')
days
s sun
m mon
t tue
The index name comes in handy while resetting the index: it becomes the name of the new column. If the index has no name, the column is called index.
>>> c.reset_index()
index day
0 s sun
1 m mon
2 t tue
>>> d.reset_index()
letter day
0 s sun
1 m mon
2 t tue
===== dummy =====
==== append element to series ====
In [1]:
import pandas as pd
s = pd.Series(dtype='int')
N = 4
for i in range(N):
    s.at[i**2] = i
print(s)
0 0
1 1
4 2
9 3
dtype: int64
In [2]:
pd.__version__
Out[2]:
'1.2.1'
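When all the elements are known up front, the series can also be built in one step instead of being grown inside a loop. The following is a minimal sketch of that alternative, assuming the same labels and values as above; it is not a replacement for .at when elements really do arrive one at a time.

```python
import pandas as pd

N = 4

# Dict keys become the index labels, dict values become the data.
s = pd.Series({i**2: i for i in range(N)})
print(s)
```

This avoids enlarging the series on every iteration, which tends to matter once N gets large.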
==== return a random element ====
Use pandas.Series.sample
Ref:-
* https://pandas.pydata.org/docs/reference/api/pandas.Series.sample.html
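A short sketch of how this might look in practice. The series contents and the random_state value are only illustrative assumptions; sample returns a series even for a single draw, so the element itself is pulled out with .iloc[0].

```python
import pandas as pd

days = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])

# sample() returns a Series, even when only one element is drawn.
# random_state is optional; it makes the draw reproducible.
picked = days.sample(n=1, random_state=0)

# .iloc[0] gives the value itself, .index[0] gives its label.
value = picked.iloc[0]
label = picked.index[0]
print(label, value)
```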
==== expand a series ====
tags | using reindex, change index
Given two series S and I of length n, and an integer N >= n, the idea here is to expand S into an N-element series E so that E[I[k]] = S[k] for every k, with all the other entries set to 0. The values in I are assumed to be distinct integers between 0 and N-1.
For example, if S is [3.4, 1.8], I is [3, 5] and N is 10, we want E to be [0, 0, 0, 3.4, 0, 1.8, 0, 0, 0, 0].
import pandas as pd
import numpy as np
def expand_series(S, I, N, id='val'):
    # Index S by I, stretch the index to 0..N-1, and zero-fill the new slots.
    E = pd.Series(S.values, index=I, name=id).reindex(np.arange(0, N)).fillna(0)
    return E
df = pd.DataFrame({'id': [3,5], 'val': [3.4, 1.8]})
print(df)
id val
0 3 3.4
1 5 1.8
unravelled_series = expand_series(df['val'], df['id'], 10)
print(unravelled_series)
id
0 0.0
1 0.0
2 0.0
3 3.4
4 0.0
5 1.8
6 0.0
7 0.0
8 0.0
9 0.0
Name: val, dtype: float64
Sample code: https://github.com/KamarajuKusumanchi/notebooks/blob/master/pandas/expand%20a%20series.ipynb
Ref:
* https://stackoverflow.com/questions/40029071/setting-series-as-index
* https://chrisalbon.com/python/data_wrangling/pandas_dataframe_reindexing/ - contains some examples on using pandas.Series.reindex
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html - API
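As a variation on the approach above, reindex also accepts a fill_value argument, which makes the separate fillna step unnecessary. The sketch below uses the same example numbers; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

S = pd.Series([3.4, 1.8])
I = pd.Series([3, 5])
N = 10

# fill_value is used only for labels that are absent from the original index.
E = pd.Series(S.values, index=I.values).reindex(np.arange(N), fill_value=0.0)
print(E.tolist())
```

One difference worth keeping in mind: fillna(0) would also overwrite NaNs that were already present in S, whereas fill_value only fills the newly introduced positions.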
==== Convert series to a dataframe ====
Use to_frame(). By default, the series name becomes the column name in the dataframe (0 if the series has no name). A name can also be assigned while calling to_frame, through its name argument.
>>> import pandas as pd
>>> b = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])
>>> b
s sun
m mon
t tue
dtype: object
>>> b.to_frame()
0
s sun
m mon
t tue
>>> c = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> c
s sun
m mon
t tue
Name: day, dtype: object
>>> c.to_frame()
day
s sun
m mon
t tue
>>> b.to_frame(name='days')
days
s sun
m mon
t tue
===== check if =====
==== check if a series is empty ====
Use pandas.Series.empty.
$ ipython
In [1]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': []})
df1
Out[1]:
Empty DataFrame
Columns: [A]
Index: []
In [2]:
df1['A'].empty
Out[2]:
True
A series with just NaNs is considered "non-empty". Drop the NaNs to make it "empty".
In [3]:
df2 = pd.DataFrame({'A': [np.nan]})
df2
Out[3]:
A
0 NaN
In [4]:
df2['A'].empty
Out[4]:
False
In [5]:
df2['A'].dropna().empty
Out[5]:
True
Used Python 3.9.4 and IPython 7.22.0
tags | check if a series has at least one element
==== check if all elements in a series are unique ====
Use pandas.Series.is_unique
In [1]:
import pandas as pd
In [2]:
pd.Series([1, 2, 3]).is_unique
Out[2]:
True
In [3]:
pd.Series([1, 2, 2]).is_unique
Out[3]:
False
Missing values are treated as any other value. So if there are multiple NaNs, it will return False. If this is not desired, drop the NaNs first.
In [4]:
import numpy as np
pd.Series([1, 2, 3, np.nan, np.nan]).is_unique
Out[4]:
False
In [5]:
pd.Series([1, 2, 3, np.nan, np.nan]).dropna().is_unique
Out[5]:
True
For completeness
In [6]:
pd.Series([1, 2, 2, np.nan, np.nan]).is_unique
Out[6]:
False
In [7]:
pd.Series([1, 2, 2, np.nan, np.nan]).dropna().is_unique
Out[7]:
False
Using | pandas 1.5.3, python 3.11.4, ipython 8.12.0
Ref:-
* https://pandas.pydata.org/docs/reference/api/pandas.Series.is_unique.html
* https://stackoverflow.com/questions/48838247/how-to-check-every-pandas-series-value-is-unique
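When is_unique comes back False, it can help to see which elements are responsible. The sketch below uses pandas.Series.duplicated for that. It is an add-on to the check above rather than part of it, and the sample values are reused from the earlier cells.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan, np.nan])

# keep=False marks every member of a repeated group, not only the later copies.
# As with is_unique, NaNs count as equal to each other here.
mask = s.duplicated(keep=False)
repeated = s[mask]
print(repeated)
```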