There are various Python projects which require sorting but not the ones that employ a default alphanumeric functionality. We talk about manually specified order or human-order, in short. One of such examples is the case study presented below.
Imagine that your risk system provides you with a list of various risk factors and the corresponding risk tenors. The latter may differ among risk factors and your task is to quickly convert an input data row into a matrix of tenor-series assuming that all your tenors follow a human-order defined as:
1W, 2W, 3W, 1M, 2M, 3M, …, 11M, 1Y, 2Y, 3Y, …, 49Y, 50Y
Let’s analyse an exemplary input dataset:
# Employing Human-Order in pandas DataFrame Sorting: Risk Factors and Tenors # (c) 2019 by QuantAtRisk.com import numpy as np import pandas as pd data = pd.read_csv('Libo3.csv', header=0) display(data)
The trouble with standard sorting order is that usually it will end up with 10Y placed before 1Y or week-tenors (W) will follow monthly tenors since M appears before W, and… we don’t want the outcome to be that way. We can see that if we attempt to perform:
for risk_factor in data.Risk_Factor.unique(): print(risk_factor) lst = data[data.Risk_Factor == risk_factor].Tenor.tolist() print(sorted(lst)) print()
Libo3.GBP.ON.SWAP ['10M', '1M', '1W', '2Y', '30Y', '3W', '5M', '7M'] Libo3.EUR.ON.SWAP ['10M', '1M', '1W', '21Y', '30Y', '3W', '4M', '7M']
Solution
First we need to specify a human-order sorting key:
ho = ['ON'] # overnight ho.extend([str(i) + 'W' for i in range(1,4)]) # weekly tenors ho.extend([str(i) + 'M' for i in range(1,12)]) # monthly tenors ho.extend([str(i) + 'Y' for i in range(1,51)]) # yearly tenors
what will store:
print(ho)
['ON', '1W', '2W', '3W', '1M', '2M', '3M', '4M', '5M', '6M', '7M', '8M', '9M', '10M', '11M', '1Y', '2Y', '3Y', '4Y', '5Y', '6Y', '7Y', '8Y', '9Y', '10Y', '11Y', '12Y', '13Y', '14Y', '15Y', '16Y', '17Y', '18Y', '19Y', '20Y', '21Y', '22Y', '23Y', '24Y', '25Y', '26Y', '27Y', '28Y', '29Y', '30Y', '31Y', '32Y', '33Y', '34Y', '35Y', '36Y', '37Y', '38Y', '39Y', '40Y', '41Y', '42Y', '43Y', '44Y', '45Y', '46Y', '47Y', '48Y', '49Y', '50Y']
Next, we need to set Tenors as “category” type (line #23-24) and sort values according to requested order (line #25). Have a look:
data.Tenor = data.Tenor.astype('category') data.Tenor.cat.set_categories(ho, inplace=True) data = data.sort_values(["Risk_Factor", "Tenor"]) data.reset_index(inplace=True, drop=True) display(data) # let's inspect the result
That accomplishes our task. From now on, you can manipulate the data in any way, for example:
fi = True # first iteration for risk_factor in data.Risk_Factor.unique(): tmp = data[data.Risk_Factor == risk_factor] tmp.set_index('Tenor', drop=True, inplace=True) tmp = tmp.rename(columns={tmp.columns[1] : risk_factor}) del tmp[tmp.columns[0]] if fi: df = tmp fi = False else: df = df.join(tmp, how='outer', sort=True) # applying 'sort' keyword # here preserves a correct # human-order display(df)
will allow you for every risk factor to join the data using all available Tenors set as a common index:
Disclaimer: The input data were randomly generated.