Employing Human Order in pandas DataFrame Sorting: Risk Factors and Tenors

There are various Python projects which require sorting but not the ones that employ a default alphanumeric functionality. We talk about manually specified order or human-order, in short. One of such examples is the case study presented below.

Imagine that your risk system provides you with a list of various risk factors and the corresponding risk tenors. The latter may differ among risk factors and your task is to quickly convert an input data row into a matrix of tenor-series assuming that all your tenors follow a human-order defined as:


1W, 2W, 3W, 1M, 2M, 3M, …, 11M, 1Y, 2Y, 3Y, …, 49Y, 50Y




Let’s analyse an exemplary input dataset:

# Employing Human-Order in pandas DataFrame Sorting: Risk Factors and Tenors
# (c) 2019 by QuantAtRisk.com

import numpy as np
import pandas as pd

data = pd.read_csv('Libo3.csv', header=0)
display(data)

human order sorting pandas pd dataframe

The trouble with standard sorting order is that usually it will end up with 10Y placed before 1Y or week-tenors (W) will follow monthly tenors since M appears before W, and… we don’t want the outcome to be that way. We can see that if we attempt to perform:

for risk_factor in data.Risk_Factor.unique():
    print(risk_factor)
    lst = data[data.Risk_Factor == risk_factor].Tenor.tolist()
    print(sorted(lst))
    print()
Libo3.GBP.ON.SWAP
['10M', '1M', '1W', '2Y', '30Y', '3W', '5M', '7M']

Libo3.EUR.ON.SWAP
['10M', '1M', '1W', '21Y', '30Y', '3W', '4M', '7M']

Solution

First we need to specify a human-order sorting key:

ho = ['ON']  # overnight
ho.extend([str(i) + 'W' for i in range(1,4)])  # weekly tenors
ho.extend([str(i) + 'M' for i in range(1,12)])  # monthly tenors
ho.extend([str(i) + 'Y' for i in range(1,51)])  # yearly tenors

what will store:

print(ho)
['ON', '1W', '2W', '3W', '1M', '2M', '3M', '4M', '5M', '6M', '7M', '8M', '9M', '10M', 
'11M', '1Y', '2Y', '3Y', '4Y', '5Y', '6Y', '7Y', '8Y', '9Y', '10Y', '11Y', '12Y', 
'13Y', '14Y', '15Y', '16Y', '17Y', '18Y', '19Y', '20Y', '21Y', '22Y', '23Y', '24Y', 
'25Y', '26Y', '27Y', '28Y', '29Y', '30Y', '31Y', '32Y', '33Y', '34Y', '35Y', '36Y', 
'37Y', '38Y', '39Y', '40Y', '41Y', '42Y', '43Y', '44Y', '45Y', '46Y', '47Y', '48Y', 
'49Y', '50Y']

Next, we need to set Tenors as “category” type (line #23-24) and sort values according to requested order (line #25). Have a look:

data.Tenor = data.Tenor.astype('category')
data.Tenor.cat.set_categories(ho, inplace=True)
data = data.sort_values(["Risk_Factor", "Tenor"])
data.reset_index(inplace=True, drop=True)

display(data)  # let's inspect the result

human order sorting pandas pd dataframe
That accomplishes our task. From now on, you can manipulate the data in any way, for example:

fi = True  # first iteration

for risk_factor in data.Risk_Factor.unique():
    tmp = data[data.Risk_Factor == risk_factor]
    tmp.set_index('Tenor', drop=True, inplace=True)
    tmp = tmp.rename(columns={tmp.columns[1] : risk_factor})
    del tmp[tmp.columns[0]]
    if fi:
        df = tmp
        fi = False
    else:
        df = df.join(tmp, how='outer', sort=True)  # applying 'sort' keyword
                                                   # here preserves a correct 
                                                   # human-order
display(df)

will allow you for every risk factor to join the data using all available Tenors set as a common index:
human order sorting pandas pd dataframe

Disclaimer: The input data were randomly generated.




Leave a Reply

Your email address will not be published. Required fields are marked *