Fitting loss distributions
In the previous sections we discussed the most common distributions for modelling insurance losses and claim frequency, and how to estimate the parameters of each distribution. Different estimation methods can produce different parameter values, so we would like to examine whether the data sample is consistent with a given probability distribution and estimation method. This is the problem of model fitting.
In this section we discuss three ways to assess a fitted model: the Kolmogorov-Smirnov (K-S) test, the Chi-square goodness-of-fit test and the Akaike Information Criterion (AIC).
Kolmogorov-Smirnov (K-S) test
The K-S test is a procedure for testing whether a sample comes from a specified population distribution. The test is based on the empirical distribution function (e.d.f.).
The primary idea of the K-S test is to reject the hypothesised distribution if there is a significant difference between the e.d.f. from the given sample and the hypothesised c.d.f. from a particular population distribution, that is, if the maximum absolute difference between the e.d.f. $F_n(x)$ and the hypothesised c.d.f. $F_0(x)$ is large.
Test Procedures
To set up the test, we start from the hypothesised c.d.f. $F_0$. We set the null hypothesis and the alternative hypothesis for the population distribution $F$, with given parameters, as
\[ H_0 : F(x) = F_0(x) \ \text{for all } x \qquad \text{against} \qquad H_1 : F(x) \neq F_0(x) \ \text{for some } x. \]
Let $X_1, X_2, \dots, X_n$ be a random sample. The e.d.f. is
\[ F_n(x) = \frac{\#\{i : X_i \le x\}}{n}, \]
where $\#\{i : X_i \le x\}$ is the number of observations $X_i$ satisfying $X_i \le x$.
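As a quick illustration (not part of the worked example later in this section), the e.d.f. can be computed in R with the built-in ecdf() function; the simulated sample below is an assumption made purely for illustration.
# Illustrative sketch: the empirical distribution function in R
set.seed(1)
x = rexp(20, rate = 0.5)   # simulated sample, for illustration only
Fn = ecdf(x)               # step function F_n(x) = #{i : X_i <= x} / n
Fn(1)                      # proportion of observations not exceeding 1
plot(Fn, main = "Empirical distribution function")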
The Kolmogorov-Smirnov test statistic is given by
\[ D_n = \sup_{x} \left| F_n(x) - F_0(x) \right|. \]
The statistic $D_n$ captures the largest distance between the e.d.f. and the hypothesised c.d.f.; the larger $D_n$ is, the larger the numerical discrepancy between the estimated and hypothesised c.d.f.
If the null hypothesis is true, the observed value $d_n$ should be smaller than the critical value at significance level $\alpha$; the critical value can be found in the K-S table. Equivalently, the null hypothesis is not rejected when the p-value is larger than $\alpha$.
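To make the definition concrete, the following sketch computes the observed statistic $d_n$ directly from the order statistics for a fully specified exponential null distribution; the simulated sample and the rate 0.2 are assumptions made for the illustration, and the result should agree with ks.test().
# Illustrative sketch: computing the K-S statistic d_n by hand
set.seed(1)
x = rexp(50, rate = 0.2)           # assumed sample; replace with real data
n = length(x)
F0 = pexp(sort(x), rate = 0.2)     # hypothesised c.d.f. at the order statistics
# The e.d.f. jumps from (i-1)/n to i/n at each order statistic,
# so the supremum is attained at one of these two one-sided gaps
d.n = max(pmax((1:n)/n - F0, F0 - (0:(n-1))/n))
d.n
ks.test(x, "pexp", 0.2)$statistic  # should match d.n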
Implementation
We use R to apply the K-S test in practice. The following R code demonstrates the inputs and outputs of each step. We test the theft-claims data under the assumption that the sample is drawn from an exponential distribution and from a lognormal distribution.
# Input data
theft=read.table("theft.txt")
# Collect test data
X=theft[,1]
# Apply K-S test for the exponential distribution with rate 1/mean(X)
ks.test(X, "pexp", 1/mean(X))
# Apply K-S test for the lognormal distribution with fitted meanlog and sdlog
ks.test(X, "plnorm", 6.654601, sqrt(2.291516))
Asymptotic one-sample Kolmogorov-Smirnov test
data: X
D = 0.2005, p-value = 0.0001291
alternative hypothesis: two-sided
Asymptotic one-sample Kolmogorov-Smirnov test
data: X
D = 0.087018, p-value = 0.3236
alternative hypothesis: two-sided
From the above results, the p-value for the exponential distribution is 0.0001291, which is less than 0.05; therefore we reject the null hypothesis that the sample is drawn from the exponential distribution. On the other hand, the p-value for the lognormal distribution is 0.3236 > 0.05, so we fail to reject the null hypothesis that the sample is drawn from the lognormal distribution.
The lognormal distribution is therefore a better fit for the data than the exponential distribution.
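For completeness, the lognormal parameters passed to plnorm() above are assumed to be maximum likelihood estimates obtained from the log-claims in an earlier estimation section; a sketch of that calculation, reusing the vector X defined above, is:
# Assumed sketch: lognormal MLEs as the meanlog and sdlog arguments of plnorm()
logX = log(X)
mu.hat = mean(logX)                     # MLE of meanlog
sigma2.hat = mean((logX - mu.hat)^2)    # MLE of sdlog^2
ks.test(X, "plnorm", mu.hat, sqrt(sigma2.hat))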
Limitations
The standard K-S critical values assume that the hypothesised distribution is fully specified before seeing the data; when the parameters are estimated from the same sample, as above, the quoted p-values are only approximate and the test tends to be conservative. The K-S test is also designed for continuous distributions and is more sensitive to discrepancies near the centre of the distribution than in the tails.
Chi-square goodness-of-fit test
The Chi-square goodness-of-fit test is another statistical test to determine whether the sample data is consistent with a hypothesised distribution. The test is based on the difference between the observed and expected frequencies.
When we have a random sample of size $n$ from a population, we can divide the range of the data into $k$ intervals of the form $(c_{j-1}, c_j]$, $j = 1, 2, \dots, k$.
Let $O_j$ be the observed frequency and $E_j = n\{F_0(c_j) - F_0(c_{j-1})\}$ be the expected frequency for the $j$-th interval based on the hypothesised distribution. Then, we compare the observed and expected frequencies for each interval.
The Chi-square test statistic is given by
\[ \chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}. \]
Under the null hypothesis, the test statistic approximately follows a Chi-square distribution with $k - 1 - p$ degrees of freedom, where $p$ is the number of parameters estimated from the data. We can use the Chi-square distribution table to find the critical value at significance level $\alpha$, and we reject the hypothesised distribution if the observed statistic exceeds that value.
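As an illustrative sketch, the test can be carried out in R for an exponential model as follows; the simulated sample and the choice of break points (quartiles of the fitted distribution) are assumptions made for the illustration.
# Illustrative sketch: Chi-square goodness-of-fit test for an exponential model
set.seed(1)
x = rexp(200, rate = 0.01)                  # assumed sample; replace with real data
n = length(x)
rate.hat = 1/mean(x)                        # fitted exponential rate
# Break points at the quartiles of the fitted distribution, so each E_j = n/4
breaks = c(0, qexp(c(0.25, 0.5, 0.75), rate.hat), Inf)
O = table(cut(x, breaks))                   # observed frequencies O_j
E = n * diff(pexp(breaks, rate.hat))        # expected frequencies E_j
chi.sq = sum((O - E)^2 / E)                 # test statistic
df = length(O) - 1 - 1                      # k - 1, minus 1 estimated parameter
p.value = pchisq(chi.sq, df, lower.tail = FALSE)
c(statistic = chi.sq, df = df, p.value = p.value)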
Akaike Information Criterion (AIC)
The last method for assessing the model is the Akaike Information Criterion (AIC). The AIC is a measure of the goodness of fit of a statistical model that also accounts for model complexity. It is based on the likelihood function and the number of parameters in the model.
The AIC is given by
\[ \mathrm{AIC} = -2 \ln \hat{L} + c\,p, \]
where $\hat{L}$ is the maximised likelihood function, $c$ is the penalty constant (usually $c = 2$), and $p$ is the number of parameters in the model. When comparing candidate models fitted to the same data, the model with the smaller AIC is preferred.
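A sketch of how the AIC could be computed and compared in R for the exponential and lognormal fits, reusing the claims vector X from the K-S example; the maximum likelihood estimates below are assumptions of the sketch rather than values quoted in the text.
# Illustrative sketch: AIC comparison of the exponential and lognormal fits
rate.hat = 1/mean(X)                               # exponential MLE
loglik.exp = sum(dexp(X, rate.hat, log = TRUE))
aic.exp = -2 * loglik.exp + 2 * 1                  # p = 1 parameter

mu.hat = mean(log(X))                              # lognormal MLEs
sigma.hat = sqrt(mean((log(X) - mu.hat)^2))
loglik.lnorm = sum(dlnorm(X, mu.hat, sigma.hat, log = TRUE))
aic.lnorm = -2 * loglik.lnorm + 2 * 2              # p = 2 parameters

c(exponential = aic.exp, lognormal = aic.lnorm)    # smaller AIC is preferred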