Loss Curve Modelling

Mick Cooney mickcooney@gmail.com

2018-07-17

Introduction

NAIC Schedule P Dataset

## Observations: 77,900
## Variables: 14
## $ grcode          <chr> "266", "266", "266", "266", "266", "266", "266", "2...
## $ grname          <chr> "Public Underwriters Grp", "Public Underwriters Grp...
## $ accidentyear    <int> 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 198...
## $ developmentyear <int> 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 199...
## $ developmentlag  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7,...
## $ incurloss       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 24, 21, 24, 25, 2...
## $ cumpaidloss     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 20, 21, 23, 24, 24...
## $ bulkloss        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 1, 0, 0, 0, 0, 0, ...
## $ earnedpremdir   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 2...
## $ earnedpremceded <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ earnedpremnet   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 2...
## $ single          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ postedreserve97 <int> 932, 932, 932, 932, 932, 932, 932, 932, 932, 932, 9...
## $ lob             <chr> "comauto", "comauto", "comauto", "comauto", "comaut...
## Observations: 77,900
## Variables: 10
## $ grcode     <chr> "266", "266", "266", "266", "266", "266", "266", "266", ...
## $ grname     <chr> "Public Underwriters Grp", "Public Underwriters Grp", "P...
## $ lob        <chr> "comauto", "comauto", "comauto", "comauto", "comauto", "...
## $ acc_year   <chr> "1988", "1988", "1988", "1988", "1988", "1988", "1988", ...
## $ dev_year   <int> 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 19...
## $ dev_lag    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9...
## $ premium    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25...
## $ cum_loss   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 20, 21, 23, 24, 24, 24,...
## $ loss_ratio <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.2400...
## $ dev_factor <dbl> NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.2500...

Grid Approximation

2D Grid

Specifying the Model

Core Concept


\[ \text{Loss}(t) = \text{Premium} \times \text{ULR} \times \text{GF}(t) \]

Full Specification


\[ \text{Loss}(Y, t) \sim \text{Normal}(\mu(Y, t), \, \sigma_Y) \]

where

\[\begin{eqnarray*} \mu(Y, t) &=& \text{Premium}(Y) \times \text{LR}(Y) \times \text{GF}(t) \\ \text{GF}(t) &=& \text{growth function of } t \\ \sigma_Y &=& \text{Premium}(Y) \times \sigma \\ \text{LR}_Y &\sim& \text{Lognormal}(\mu_{\text{LR}}, \sigma_{\text{LR}}) \\ \mu_{\text{LR}} &\sim& \text{Normal}(0, 0.5) \end{eqnarray*}\]

Importance of the Functional Form


Does the choice of function matter?

Not much difference…

(probably)

Building the Stan Model

Getting Started

functions {
    real growth_factor_weibull(real t, real omega, real theta) {
        return 1 - exp(-(t/theta)^omega);
    }

    real growth_factor_loglogistic(real t, real omega, real theta) {
        real pow_t_omega = t^omega;
        return pow_t_omega / (pow_t_omega + theta^omega);
    }
}
data {
    int<lower=0,upper=1> growthmodel_id;

    int n_data;
    int n_time;
    int n_cohort;

    int cohort_id[n_data];
    int t_idx[n_data];

    int cohort_maxtime[n_cohort];

    vector<lower=0>[n_time] t_value;

    vector[n_cohort] premium;
    vector[n_data]   loss;
}
parameters {
    real<lower=0> omega;
    real<lower=0> theta;

    vector<lower=0>[n_cohort] LR;

    real mu_LR;
    real<lower=0> sd_LR;

    real<lower=0> loss_sd;
}
transformed parameters {
    vector[n_time] gf;
    vector[n_data] lm;

    for(i in 1:n_time) {
        gf[i] = growthmodel_id == 1 ?
            growth_factor_weibull    (t_value[i], omega, theta) :
            growth_factor_loglogistic(t_value[i], omega, theta);
    }

    for (i in 1:n_data) {
        lm[i] = LR[cohort_id[i]] * premium[cohort_id[i]] * gf[t_idx[i]];
    }
}
model {
    mu_LR ~ normal(0, 0.5);
    sd_LR ~ lognormal(0, 0.5);

    LR ~ lognormal(mu_LR, sd_LR);

    loss_sd ~ lognormal(0, 0.7);

    omega ~ lognormal(0, 0.5);
    theta ~ lognormal(0, 0.5);

    loss ~ normal(lm, (loss_sd * premium)[cohort_id]);
}

Models Fits and Diagnostics

Model Convergence