pandas_series

creating a series

create a series from a list

>>> import pandas as pd
>>> a = pd.Series(['sun', 'mon', 'tue'])
>>> a
0    sun
1    mon
2    tue
dtype: object

To assign an index

>>> b = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])
>>> b
s    sun
m    mon
t    tue
dtype: object

To assign a name to the series (it becomes the column name when the series is converted to a dataframe)

>>> c = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> c
s    sun
m    mon
t    tue
Name: day, dtype: object

To assign a name to the index

>>> d = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> d.index.name = 'letter'
>>> d
letter
s    sun
m    mon
t    tue
Name: day, dtype: object
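
An alternative to assigning d.index.name directly is rename_axis, which returns a new series with the index name set and leaves the original untouched. A minimal sketch, reusing the values from the example above (the variable names d2 and named are only for illustration):

```python
import pandas as pd

# Same data as the series d above, but without an index name.
d2 = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')

# rename_axis returns a new series; d2 itself is left unchanged.
named = d2.rename_axis('letter')

print(named.index.name)  # letter
print(d2.index.name)     # None
```

This form is convenient in method chains, where an in-place attribute assignment would break the chain.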

The series name is used as the column name when converting the series to a dataframe. A series without a name gets the default column label 0.

>>> b.to_frame()
     0
s  sun
m  mon
t  tue

>>> c.to_frame()
   day
s  sun
m  mon
t  tue

If the series has no name, pass one to to_frame() to set the column name during the conversion.

>>> b.to_frame(name='days')
  days
s  sun
m  mon
t  tue

The index name comes in handy when resetting the index: it becomes the name of the new column. An unnamed index gets the default column name index.

>>> c.reset_index()
  index  day
0     s  sun
1     m  mon
2     t  tue
>>> d.reset_index()
  letter  day
0      s  sun
1      m  mon
2      t  tue
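
reset_index also accepts a few options that are relevant here. The sketch below, which reuses the values of the unnamed series b from earlier, shows name= (to label the values column) and drop=True (to discard the old index). The variable names df and s are illustrative.

```python
import pandas as pd

b = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])

# name= sets the label of the values column in the resulting dataframe.
df = b.reset_index(name='day')
print(df.columns.tolist())  # ['index', 'day']

# drop=True discards the old index and returns a series, not a dataframe.
s = b.reset_index(drop=True)
print(s.index.tolist())     # [0, 1, 2]
```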

dummy

append element to series

In [1]:
import pandas as pd
s = pd.Series(dtype='int')
N = 4
for i in range(N):
    s.at[i**2] = i
print(s)
0    0
1    1
4    2
9    3
dtype: int64

In [2]:
pd.__version__
Out[2]:
'1.2.1'
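
Setting a new label with .at enlarges the series on every assignment, which can get slow for many elements. When all the values can be computed up front, a common alternative is to collect them in a dict and construct the series once. A minimal sketch that reproduces the result above (the name data is illustrative):

```python
import pandas as pd

N = 4
# Collect label -> value pairs first, then build the series in one step.
data = {i**2: i for i in range(N)}
s = pd.Series(data, dtype='int64')
print(s)
```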

return a random element
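
One possible approach, assuming Series.sample fits the use case: sample(n=1) returns a one-element series, and .iloc[0] extracts the scalar from it. random_state is passed here only to make the example repeatable, and the name picked is illustrative.

```python
import pandas as pd

a = pd.Series(['sun', 'mon', 'tue'])

# sample(n=1) returns a one-element series; .iloc[0] extracts the value.
picked = a.sample(n=1, random_state=0).iloc[0]
print(picked)
```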

expand a series

tags | using reindex, change index

Given two series S and I of length n, and an integer N that is >= n and larger than every value in I, the idea is to expand S into an N-element vector E so that E[I[:]] = S[:] and every other element of E is 0.

For example, if S is [3.4, 1.8], I is [3, 5] and N is 10, we want E to be [0, 0, 0, 3.4, 0, 1.8, 0, 0, 0, 0].

import pandas as pd
import numpy as np

def expand_series(S, I, N, name='val'):
    # Index S by I, then reindex onto 0..N-1; positions absent from I become 0.
    E = pd.Series(S.values, index=I, name=name).reindex(np.arange(0, N)).fillna(0)
    return E
df = pd.DataFrame({'id': [3,5], 'val': [3.4, 1.8]})
print(df)
   id  val
0   3  3.4
1   5  1.8
unravelled_series = expand_series(df['val'], df['id'], 10)
print(unravelled_series)
id
0    0.0
1    0.0
2    0.0
3    3.4
4    0.0
5    1.8
6    0.0
7    0.0
8    0.0
9    0.0
Name: val, dtype: float64

Sample code: https://github.com/KamarajuKusumanchi/notebooks/blob/master/pandas/expand%20a%20series.ipynb
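
The relation E[I[:]] = S[:] can also be written almost literally with NumPy indexing. This is a sketch of an alternative, not the method used in the notebook above. It assumes that I holds distinct integer positions in the range 0..N-1 and that S is numeric. expand_with_numpy is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def expand_with_numpy(S, I, N):
    # Start from N zeros, then write S into the positions listed in I.
    E = np.zeros(N)
    E[np.asarray(I)] = np.asarray(S)
    return pd.Series(E)

E = expand_with_numpy([3.4, 1.8], [3, 5], 10)
print(E.tolist())
# [0.0, 0.0, 0.0, 3.4, 0.0, 1.8, 0.0, 0.0, 0.0, 0.0]
```

Unlike the reindex version, this raises an IndexError when a position in I is outside 0..N-1 instead of silently dropping it.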

Convert series to a dataframe

Use to_frame(). By default, it will use the series name to set the column name in the dataframe. But you can also assign one while calling the to_frame function.

>>> import pandas as pd
>>> b = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'])
>>> b
s    sun
m    mon
t    tue
dtype: object
>>> b.to_frame()
     0
s  sun
m  mon
t  tue
>>> c = pd.Series(['sun', 'mon', 'tue'], index=['s', 'm', 't'], name='day')
>>> c
s    sun
m    mon
t    tue
Name: day, dtype: object
>>> c.to_frame()
   day
s  sun
m  mon
t  tue
>>> b.to_frame(name='days')
  days
s  sun
m  mon
t  tue

check if

check if a series is empty

Use pandas.Series.empty.

$ ipython

In [1]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A':  []})
df1
Out[1]:
Empty DataFrame
Columns: [A]
Index: []

In [2]:
df1['A'].empty
Out[2]:
True

A series with just NaNs is considered “non-empty”. Drop the NaNs to make it “empty”.

In [3]:
df2 = pd.DataFrame({'A':  [np.nan]})
df2
Out[3]:
    A
0 NaN

In [4]:
df2['A'].empty
Out[4]:
False

In [5]:
df2['A'].dropna().empty
Out[5]:
True

Used Python 3.9.4 and IPython 7.22.0
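
The two checks above can be wrapped in a small helper that answers "does this series hold at least one real value?". A minimal sketch; has_values is a hypothetical name, not part of pandas.

```python
import numpy as np
import pandas as pd

def has_values(s):
    # True if the series has at least one non-missing element.
    # (has_values is a hypothetical helper, not part of pandas.)
    return not s.dropna().empty

print(has_values(pd.Series([], dtype='float64')))  # False
print(has_values(pd.Series([np.nan])))             # False
print(has_values(pd.Series([np.nan, 1.0])))        # True
```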

tags | check if a series has at least one element

check if all elements in a series are unique

Use pandas.Series.is_unique

In [1]: 
import pandas as pd

In [2]: 
pd.Series([1, 2, 3]).is_unique
Out[2]: 
True

In [3]: 
pd.Series([1, 2, 2]).is_unique
Out[3]: 
False

Missing values are treated like any other value. So if there are multiple NaNs, is_unique returns False. If this is not desired, drop the NaNs first.

In [4]: 
import numpy as np
pd.Series([1, 2, 3, np.nan, np.nan]).is_unique
Out[4]: 
False

In [5]: 
pd.Series([1, 2, 3, np.nan, np.nan]).dropna().is_unique
Out[5]: 
True

For completeness

In [6]: 
pd.Series([1, 2, 2, np.nan, np.nan]).is_unique
Out[6]: 
False

In [7]: 
pd.Series([1, 2, 2, np.nan, np.nan]).dropna().is_unique
Out[7]: 
False

Using | pandas 1.5.3, python 3.11.4, ipython 8.12.0
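
When is_unique returns False, it can help to see which elements are repeated. One way, sketched below, is Series.duplicated with keep=False, which, like is_unique, counts repeated NaNs as duplicates. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan, np.nan])

# keep=False marks every occurrence of a repeated value, NaN included.
dups = s[s.duplicated(keep=False)]
print(dups.index.tolist())  # [1, 2, 3, 4]

# Ignoring missing values, only the repeated 2s remain.
non_null = s.dropna()
dups_non_null = non_null[non_null.duplicated(keep=False)]
print(dups_non_null.index.tolist())  # [1, 2]
```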

pandas_series.txt · Last modified: 2024/02/06 05:18 by raju