Computation of the Loss Distribution in Python

In operational risk management, given a number of risk types and business lines, the quest is all about providing the risk management board with an estimate of the losses the bank (or any other financial institution, hedge fund, etc.) can suffer. These estimates take the form of a loss distribution. If you think for a second, the spectrum of things that might go wrong is wide, e.g. the failure of a computer system, internal or external fraud, problems with clients, products, and business practices, damage to physical assets, and so on. Blended with business lines, e.g. corporate finance, trading and sales, retail banking, commercial banking, payment and settlement, agency services, asset management, or retail brokerage, they give over 50 combinations of operational risk factors one needs to consider. Separately and carefully. And it’s a tough one.

Why? For two main reasons. First, for an operational risk manager the sample of data describing the risk is usually insufficient (statistically speaking: the sample is small over the lifetime of the financial organization). Second, when something goes wrong, the next event of the same kind may take place in the not-too-distant future or in the far-distant future. The biggest problem the operational risk manager meets in his/her daily work is predicting losses due to operational failures. Therefore, the timing of the next event enters the problem as the first ingredient: the loss frequency distribution. The second player in the game is the loss severity distribution, i.e. if the worst strikes, how much might the bank, financial body, investor, or trader lose?




From a trader's perspective, we know well that Value-at-Risk (VaR) and Expected Shortfall are two quantitative risk measures that address similar questions. But from the viewpoint of operational risk, the estimation of losses requires a different approach.

In this post, following Hull (2015), we present an algorithm in Python for computing the loss distribution given the best estimates of the loss frequency and loss severity distributions. Though designed with operational risk analysts in mind, in the end we argue for its usefulness to any algo-trader and/or portfolio risk manager.

1. Operational Losses: Case Study of the Vanderloo Bank

Access to operational loss data is much harder than in the case of stocks traded on an exchange. The data usually stay within the walls of the bank, with internal access only. A recommended practice for operational risk managers around the world is to share those unique data despite confidentiality. Only then can we build a broader knowledge and understanding of risks and losses incurred due to operational activities.

Let’s consider a case study of a hypothetical Vanderloo Bank. The bank was founded in 1988 in the Netherlands and its main line of business was concentrated around building unique customer relationships and loans for small local businesses. Despite a vivid vision and firmly set goals for the future, Vanderloo Bank could not avoid a number of operational roadblocks that led to substantial operational losses:

Year Month Day Business Line Risk Category Loss ($M)
0 1989.0 1.0 13.0 Trading and Sales Internal Fraud 0.530597
1 1989.0 2.0 9.0 Retail Brokerage Process Failure 0.726702
2 1989.0 4.0 14.0 Trading and Sales System Failure 1.261619
3 1989.0 6.0 11.0 Asset Managment Process Failure 1.642279
4 1989.0 7.0 23.0 Corporate Finance Process Failure 1.094545
5 1990.0 10.0 21.0 Trading and Sales Employment Practices 0.562122
6 1990.0 12.0 24.0 Payment and Settlement Process Failure 4.009160
7 1991.0 8.0 23.0 Asset Managment Business Practices 0.495025
8 1992.0 1.0 28.0 Asset Managment Business Practices 0.857785
9 1992.0 3.0 14.0 Commercial Banking Damage to Assets 1.257536
10 1992.0 5.0 26.0 Retail Banking Internal Fraud 1.591007
11 1992.0 8.0 9.0 Corporate Finance Employment Practices 0.847832
12 1993.0 1.0 11.0 Corporate Finance System Failure 1.314225
13 1993.0 1.0 19.0 Retail Banking Internal Fraud 0.882371
14 1993.0 2.0 24.0 Retail Banking Internal Fraud 1.213686
15 1993.0 6.0 12.0 Commercial Banking System Failure 1.231784
16 1993.0 6.0 16.0 Agency Services Damage to Assets 1.316528
17 1993.0 7.0 11.0 Retail Banking Process Failure 0.834648
18 1993.0 9.0 21.0 Retail Brokerage Process Failure 0.541243
19 1993.0 11.0 11.0 Asset Managment Internal Fraud 1.380636
20 1994.0 11.0 22.0 Retail Banking External Fraud 1.426433
21 1995.0 2.0 14.0 Commercial Banking Process Failure 1.051281
22 1995.0 11.0 21.0 Commercial Banking External Fraud 2.654861
23 1996.0 8.0 17.0 Agency Services Process Failure 0.837237
24 1997.0 7.0 13.0 Retail Brokerage Internal Fraud 1.107019
25 1997.0 7.0 24.0 Agency Services External Fraud 1.513146
26 1997.0 8.0 8.0 Retail Banking Process Failure 1.002040
27 1997.0 9.0 2.0 Agency Services Damage to Assets 0.646596
28 1997.0 9.0 12.0 Retail Banking Employment Practices 0.966086
29 1998.0 1.0 8.0 Retail Banking Internal Fraud 0.938803
30 1998.0 1.0 12.0 Retail Banking System Failure 0.922069
31 1998.0 2.0 5.0 Asset Managment Process Failure 1.042259
32 1998.0 4.0 18.0 Commercial Banking External Fraud 0.969562
33 1998.0 5.0 12.0 Retail Banking External Fraud 0.683715
34 1999.0 1.0 3.0 Trading and Sales Internal Fraud 2.035785
35 1999.0 4.0 27.0 Retail Brokerage Business Practices 1.074277
36 1999.0 5.0 8.0 Retail Banking Employment Practices 0.667655
37 1999.0 7.0 10.0 Agency Services System Failure 0.499982
38 1999.0 7.0 17.0 Retail Brokerage Process Failure 0.803826
39 2000.0 1.0 26.0 Commercial Banking Business Practices 0.714091
40 2000.0 7.0 23.0 Trading and Sales System Failure 1.479367
41 2001.0 6.0 16.0 Retail Brokerage System Failure 1.233686
42 2001.0 11.0 5.0 Agency Services Process Failure 0.926593
43 2002.0 5.0 14.0 Payment and Settlement Damage to Assets 1.321291
44 2002.0 11.0 11.0 Retail Banking External Fraud 1.830254
45 2003.0 1.0 14.0 Corporate Finance System Failure 1.056228
46 2003.0 1.0 28.0 Asset Managment System Failure 1.684986
47 2003.0 2.0 28.0 Commercial Banking Damage to Assets 0.680675
48 2004.0 1.0 11.0 Asset Managment Process Failure 0.559822
49 2004.0 6.0 19.0 Commercial Banking Internal Fraud 1.388681
50 2004.0 7.0 3.0 Retail Banking Internal Fraud 0.886769
51 2004.0 7.0 21.0 Retail Brokerage Employment Practices 0.606049
52 2004.0 7.0 27.0 Asset Managment Employment Practices 1.634348
53 2004.0 11.0 26.0 Asset Managment Damage to Assets 0.983355
54 2005.0 1.0 9.0 Corporate Finance Damage to Assets 0.969710
55 2005.0 9.0 17.0 Commercial Banking System Failure 0.634609
56 2006.0 2.0 24.0 Agency Services Business Practices 0.637760
57 2006.0 3.0 21.0 Retail Banking Employment Practices 1.072489
58 2006.0 6.0 25.0 Payment and Settlement System Failure 0.896459
59 2006.0 12.0 25.0 Trading and Sales Process Failure 0.731953
60 2007.0 6.0 9.0 Commercial Banking System Failure 0.918233
61 2008.0 1.0 5.0 Corporate Finance External Fraud 0.929702
62 2008.0 2.0 14.0 Retail Brokerage System Failure 0.640201
63 2008.0 2.0 14.0 Commercial Banking Internal Fraud 1.580574
64 2008.0 3.0 18.0 Corporate Finance Process Failure 0.731046
65 2009.0 2.0 1.0 Agency Services System Failure 0.630870
66 2009.0 2.0 6.0 Retail Banking External Fraud 0.639761
67 2009.0 4.0 14.0 Payment and Settlement Internal Fraud 1.022987
68 2009.0 5.0 25.0 Retail Banking Business Practices 1.415880
69 2009.0 7.0 8.0 Retail Banking Business Practices 0.906526
70 2009.0 12.0 26.0 Agency Services System Failure 1.463529
71 2010.0 2.0 13.0 Asset Managment Damage to Assets 0.664935
72 2010.0 3.0 24.0 Payment and Settlement Process Failure 1.848318
73 2010.0 10.0 16.0 Commercial Banking External Fraud 1.020736
74 2010.0 12.0 27.0 Retail Banking Employment Practices 1.126265
75 2011.0 2.0 5.0 Retail Brokerage Process Failure 1.549890
76 2011.0 6.0 24.0 Corporate Finance Damage to Assets 2.153238
77 2011.0 11.0 6.0 Asset Managment System Failure 0.601332
78 2011.0 12.0 1.0 Payment and Settlement External Fraud 0.551183
79 2012.0 2.0 21.0 Corporate Finance External Fraud 1.866740
80 2013.0 4.0 22.0 Retail Brokerage External Fraud 0.672756
81 2013.0 6.0 27.0 Payment and Settlement Employment Practices 1.119233
82 2013.0 8.0 17.0 Commercial Banking System Failure 1.034078
83 2014.0 3.0 1.0 Asset Managment Employment Practices 2.099957
84 2014.0 4.0 4.0 Retail Brokerage External Fraud 0.929928
85 2014.0 6.0 5.0 Retail Banking System Failure 1.399936
86 2014.0 11.0 17.0 Asset Managment Process Failure 1.299063
87 2014.0 12.0 3.0 Agency Services System Failure 1.787205
88 2015.0 2.0 2.0 Payment and Settlement System Failure 0.742544
89 2015.0 6.0 23.0 Commercial Banking Employment Practices 2.139426
90 2015.0 7.0 18.0 Trading and Sales System Failure 0.499308
91 2015.0 9.0 9.0 Retail Banking Employment Practices 1.320201
92 2015.0 9.0 18.0 Corporate Finance Business Practices 2.901466
93 2015.0 10.0 21.0 Commercial Banking Internal Fraud 0.808329
94 2016.0 1.0 9.0 Retail Banking Internal Fraud 1.314893
95 2016.0 3.0 28.0 Asset Managment Business Practices 0.702811
96 2016.0 3.0 25.0 Payment and Settlement Internal Fraud 0.840262
97 2016.0 4.0 6.0 Retail Banking Process Failure 0.465896

 

Having a record of 98 events (rows 0 to 97), we can now begin building a statistical picture of the loss frequency and loss severity distributions.

2. Loss Frequency Distribution

For loss frequency, the natural probability distribution to use is a Poisson distribution. It assumes that losses happen randomly through time so that in any short period of time $\Delta t$ there is a probability of $\lambda \Delta t$ of a loss occurring. The probability of $n$ losses in time $T$ [years] is:
$$
\mbox{Pr}(n) = \exp{(-\lambda T)} \frac{(\lambda T)^n}{n!}
$$
where the parameter $\lambda$ can be estimated as the average number of losses per year (Hull 2015). Given our table in the Python pandas’ DataFrame format, df, we code:

# Computation of the Loss Distribution not only for Operational Risk Managers
# (c) 2016 QuantAtRisk.com, Pawel Lachowicz

from scipy.stats import lognorm, norm, poisson
from matplotlib  import pyplot as plt
import numpy as np
import pandas as pd

# reading Vanderloo Bank operational loss data
df = pd.read_hdf('vanderloo.h5', 'df')

# count the number of loss events in each year
fre = df.groupby("Year").size()
print(fre)

where the last operation groups and displays the number of losses in each year:

Year
1989.0    5
1990.0    2
1991.0    1
1992.0    4
1993.0    8
1994.0    1
1995.0    2
1996.0    1
1997.0    5
1998.0    5
1999.0    5
2000.0    2
2001.0    2
2002.0    2
2003.0    3
2004.0    6
2005.0    2
2006.0    4
2007.0    1
2008.0    4
2009.0    6
2010.0    4
2011.0    4
2012.0    1
2013.0    3
2014.0    5
2015.0    6
2016.0    4
dtype: int64
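
A Poisson variable has equal mean and variance, so one quick sanity check on the choice of distribution is to compare the two for the yearly counts printed above. This is only a sketch and not part of Hull's recipe: it assumes the calendar years can be treated as independent observations, and the names `counts`, `mean`, `var` and `dispersion` are ours. The counts are hard-coded so that the snippet runs without vanderloo.h5.

```python
import numpy as np

# yearly loss counts for 1989-2016, copied from the printout above
counts = np.array([5, 2, 1, 4, 8, 1, 2, 1, 5, 5, 5, 2, 2, 2,
                   3, 6, 2, 4, 1, 4, 6, 4, 4, 1, 3, 5, 6, 4])

mean = counts.mean()         # average number of losses per calendar year
var = counts.var(ddof=1)     # unbiased sample variance of the yearly counts
dispersion = var / mean      # close to 1 for Poisson-like counts

print(counts.sum(), mean, var, dispersion)
```

The dispersion index comes out close to 1, which does not contradict the Poisson assumption. Note that the mean here is 98/28 = 3.5 because it averages over the 28 calendar years present in the table.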

The estimation of Poisson’s $\lambda$ requires solely the computation of:

# estimate lambda parameter
lam = np.sum(fre.values) / (df.Year[df.shape[0]-1] - df.Year[0])
print(lam)
3.62962962963

which tells us that during the 1989–2016 period, i.e. over the past 27 years, there were on average $\lambda \approx 3.6$ losses per year. Assuming the Poisson distribution to be the best descriptor of the loss frequency distribution, we model the probability of operational losses of the Vanderloo Bank in the following way:

# draw random variables from a Poisson distribution with lambda=lam
prvs = poisson.rvs(lam, size=10000)

# estimate the pmf (loss frequency distribution) for n = 0, 1, ..., 10;
# bin edges are centred on the integers so that no two counts share a bin
x = np.arange(0, 11)
y = np.histogram(prvs, bins=np.arange(-0.5, 11.5))[0] / prvs.size

plt.figure(figsize=(10, 6))
plt.bar(x, y, width=0.7, align='center', color="#2c97f1")
plt.xlim([-1, 11])
plt.ylim([0, 0.25])
plt.xlabel("Number of Losses per Year", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Frequency Distribution", fontsize=14)
plt.savefig("f01.png")

revealing:
[Figure f01.png: Loss Frequency Distribution]
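
The simulated frequencies can be cross-checked against the analytical Poisson probabilities. The sketch below evaluates the formula quoted earlier for $T = 1$ year and compares it with `scipy.stats.poisson.pmf`. It assumes $\lambda = 98/27$, i.e. the value estimated in this post; the variable names `pmf_scipy` and `pmf_formula` are ours.

```python
from math import exp, factorial

import numpy as np
from scipy.stats import poisson

lam = 98 / 27              # lambda as estimated in this post (losses per year)
n = np.arange(0, 11)       # number of losses within one year (T = 1)

# probabilities returned by scipy
pmf_scipy = poisson.pmf(n, lam)

# the same probabilities from exp(-lam*T) * (lam*T)**n / n! with T = 1
pmf_formula = np.array([exp(-lam) * lam**int(k) / factorial(int(k)) for k in n])

for k, p in zip(n, pmf_scipy):
    print(int(k), round(float(p), 4))
```

The most probable outcome is three losses in a year, and the eleven probabilities for $n = 0, 1, \ldots, 10$ add up to slightly less than one because the far tail ($n > 10$) is left out.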

3. Loss Severity Distribution

The data collected in the last column of df allow us to plot the loss severity distribution and estimate its best fit. In the practice of operational risk managers, the lognormal distribution is a common choice:

c = .7, .7, .7  # define grey color

plt.figure(figsize=(10, 6))
plt.hist(df["Loss ($M)"], bins=25, color=c, density=True)  # normalised histogram
plt.xlabel("Incurred Loss ($M)", fontsize=12)
plt.ylabel("Probability Density", fontsize=12)
plt.title("Loss Severity Distribution", fontsize=14)

x = np.arange(0, 5, 0.01)
sig, loc, scale = lognorm.fit(df["Loss ($M)"])
pdf = lognorm.pdf(x, sig, loc=loc, scale=scale)
plt.plot(x, pdf, 'r')
plt.savefig("f02.png")

print(sig, loc, scale)  # lognormal pdf's parameters
0.661153638163 0.328566816132 0.647817560825

where the lognormal probability density function (pdf) we use is given by:
$$
p(y; \sigma, loc, scale) = \frac{1}{scale} \, \frac{1}{x\sigma\sqrt{2\pi}} \exp{ \left[ -\frac{1}{2} \left(\frac{\log{x}}{\sigma} \right)^2 \right] }
$$
where $y$ is the loss and $x = (y - loc)/scale$. The fit of the pdf to the data returns:
[Figure f02.png: Loss Severity Distribution, histogram of incurred losses with the fitted lognormal pdf]
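
To make the role of loc and scale concrete, here is a small check that writes the density out by hand and compares it with `scipy.stats.lognorm.pdf`. scipy evaluates the standard lognormal density at $x = (y - loc)/scale$ and divides the result by scale. The parameter values are the ones printed above; the helper `lognorm_pdf` and the names `by_hand` and `by_scipy` are ours and serve only as an illustration.

```python
import numpy as np
from scipy.stats import lognorm

# parameters as printed by lognorm.fit() in this post
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825

def lognorm_pdf(y, sig, loc, scale):
    """Shifted and scaled lognormal pdf, written out by hand."""
    x = (y - loc) / scale
    core = np.exp(-0.5 * (np.log(x) / sig) ** 2) / (x * sig * np.sqrt(2 * np.pi))
    return core / scale

y = np.linspace(0.5, 5.0, 10)          # losses in $M, all above loc
by_hand = lognorm_pdf(y, sig, loc, scale)
by_scipy = lognorm.pdf(y, sig, loc=loc, scale=scale)
print(np.allclose(by_hand, by_scipy))

# summary statistics implied by the fit
mean_loss = lognorm.mean(sig, loc=loc, scale=scale)      # average severity ($M)
q99_loss = lognorm.ppf(0.99, sig, loc=loc, scale=scale)  # 99th percentile ($M)
print(mean_loss, q99_loss)
```

The mean severity follows in closed form as $loc + scale \cdot \exp(\sigma^2/2)$, which is a handy check on the fitted parameters.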

4. Loss Distribution

The loss frequency distribution must be combined with the loss severity distribution for each risk type/business line combination in order to determine a loss distribution. The most common assumption here is that loss severity is independent of loss frequency. Hull (2015) suggests the following steps to be taken in building the Monte Carlo simulation leading to modelling of the loss distribution:

1. Sample from the frequency distribution to determine the number of loss events ($n$)
2. Sample $n$ times from the loss severity distribution to determine the loss experienced for each loss event ($L_1, L_2, \ldots, L_n$)
3. Determine the total loss experienced ($=L_1 + L_2 + \ldots + L_n$)



When many simulation trials are used, we obtain a total distribution for losses of the type being considered. In Python we code those steps in the following way:

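def loss(r, loc, sig, scale, lam):
    # r is a uniform random number; comparing it with the Poisson cdf
    # turns it into a random number of loss events (inverse transform)
    X = []
    for x in range(11):  # loss numbers x = 0, 1, ..., 10
        if r < poisson.cdf(x, lam):  # r is below the cdf at x: no further loss
            out = 0
        else:  # cdf at x does not exceed r: one more loss, draw its severity
            out = lognorm.rvs(s=sig, loc=loc, scale=scale)
        X.append(out)
    return np.sum(X)  # = L_1 + L_2 + ... + L_n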
    

# run 1e5 Monte Carlo simulations
losses = []
for _ in range(100000):
    r = np.random.random()
    losses.append(loss(r, loc, sig, scale, lam))
    

# bin the simulated total losses into $1M-wide bins between $0 and $15M
counts, edges = np.histogram(losses, bins=range(0, 16))
y = counts / len(losses)            # fraction of all trials falling in each bin
x = (edges[:-1] + edges[1:]) / 2    # bin centres

plt.figure(figsize=(10, 6))
plt.bar(x, y, width=0.7, align='center', color="#ff5a19")
plt.xlim([-1, 16])
plt.ylim([0, 0.20])
plt.title("Modelled Loss Distribution", fontsize=14)
plt.xlabel("Loss ($M)", fontsize=12)
plt.ylabel("Probability of Loss", fontsize=12)
plt.savefig("f03.png")

revealing:
[Figure f03.png: Modelled Loss Distribution]

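The loss function has been designed so that it loops over the loss numbers $x = 0, 1, \ldots, 10$. We run $10^5$ simulations. In each trial, first, we draw a random number r from a uniform distribution. For each x, if r is less than the value of the Poisson cumulative distribution function (with $\lambda = 3.6$) at x, we assume a zero loss incurred. Otherwise, we draw a random variable from the lognormal distribution (given by its parameters found via the fitting procedure a few lines earlier). Because the cdf increases with x, the number of lognormal draws in a trial equals the number of x values whose cdf does not exceed r, which is a Poisson-distributed count obtained by inverse transform sampling. The length of the loop caps that count at 11 events, a limit reached in well under 1% of trials. Simple as that.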
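
The same three steps can also be written in vectorised form by sampling the number of events directly. The sketch below is an alternative formulation, not the code used to produce the chart: the function `simulate_total_losses` and its `seed` argument are our own names, and, unlike the loop above, it puts no cap on the number of events per trial. It assumes $\lambda = 98/27$ and the lognormal parameters printed earlier.

```python
import numpy as np
from scipy.stats import lognorm, poisson

def simulate_total_losses(lam, sig, loc, scale, trials, seed=None):
    """Total loss per trial: the sum of n lognormal severities, n ~ Poisson(lam)."""
    rng = np.random.default_rng(seed)
    # step 1: number of loss events in every trial
    n = poisson.rvs(lam, size=trials, random_state=rng)
    # step 2: one severity for every loss event across all trials
    sev = lognorm.rvs(s=sig, loc=loc, scale=scale, size=n.sum(), random_state=rng)
    # step 3: add up the severities belonging to the same trial
    trial_id = np.repeat(np.arange(trials), n)
    return np.bincount(trial_id, weights=sev, minlength=trials)

lam = 98 / 27
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825
totals = simulate_total_losses(lam, sig, loc, scale, trials=100000, seed=1)

# for a compound Poisson sum the mean total loss equals lam * E[severity]
expected_mean = lam * lognorm.mean(sig, loc=loc, scale=scale)
print(totals.mean(), expected_mean)
```

Comparing the simulated mean with $\lambda \cdot E[L]$ is a cheap way to confirm that frequency and severity have been combined correctly.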

The resultant loss distribution, as shown in the chart above, describes the total loss Vanderloo Bank may incur over one year due to operational failures, together with the corresponding probabilities.
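
Once the simulated losses are available, tail figures can be read off directly. The helper below is a sketch under our own assumptions: the name `tail_measures` and the 99% level are illustrative choices, not taken from Hull (2015). It returns the empirical quantile of the total loss (a VaR-type number) and the average loss beyond it (an Expected Shortfall-type number). For a self-contained run it is applied to a stand-in sample generated with NumPy from the parameters of this post.

```python
import numpy as np

def tail_measures(losses, level=0.99):
    """Empirical quantile of the losses and the mean loss at or beyond it."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, level)
    es = losses[losses >= var].mean()
    return var, es

# stand-in sample of yearly total losses (compound Poisson-lognormal)
rng = np.random.default_rng(7)
lam = 98 / 27
sig, loc, scale = 0.661153638163, 0.328566816132, 0.647817560825
n = rng.poisson(lam, size=100000)
sev = loc + scale * rng.lognormal(mean=0.0, sigma=sig, size=n.sum())
sample = np.bincount(np.repeat(np.arange(n.size), n), weights=sev, minlength=n.size)

var99, es99 = tail_measures(sample, 0.99)
print(var99, es99)
```

The function also accepts a plain Python list, so the list of simulated losses built in the Monte Carlo loop of this post can be passed to it unchanged.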

5. Beyond Operational Risk Management

A natural extension of the numerical procedure applied here is the modelling of, e.g., the anticipated (predicted) loss distribution for any portfolio of N assets. One can estimate it based on the track record of losses incurred in trading to date. By doing so, we gain an additional tool in our arsenal of quantitative risk measures and modelling. Stay tuned, as a new post will illustrate that case.

 

Download

     vanderloo.h5

References

    Hull, J., 2015, Risk Management and Financial Institutions, 4th Ed.
