Re-presentation of Equity Solvency Capital Requirements (from my master’s thesis): From bilinear interpolation to probabilistic machine learning | R-Bloggers

In the world of insurance and financial risk management, calculating the Solvency Capital Requirement (SCR) can be a computation-intensive task that can make or break real-time decision-making. Traditional approaches rely on expensive Monte Carlo simulations that can take hours to complete, forcing practitioners to develop approximation schemes. Developing such an approximation scheme is precisely the project I tackled in 2007-2009 for my master’s thesis in actuarial science (see references below).

What I did then
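The thesis approach approximated the SCR at an off-grid point from the four surrounding nodes of a precomputed grid, via bilinear interpolation. A minimal sketch in Python, with made-up grid values (not the thesis data):

```python
import numpy as np

def bilinear(x, y, x_grid, y_grid, z_grid):
    """Bilinear interpolation of z_grid at (x, y);
    z_grid[i, j] sits at (x_grid[i], y_grid[j])."""
    # locate the grid cell containing (x, y), clamped to the grid
    i = min(max(np.searchsorted(x_grid, x) - 1, 0), len(x_grid) - 2)
    j = min(max(np.searchsorted(y_grid, y) - 1, 0), len(y_grid) - 2)
    # normalized coordinates inside the cell
    tx = (x - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
    ty = (y - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
    # weighted average of the four corner values
    return ((1 - tx) * (1 - ty) * z_grid[i, j]
            + tx * (1 - ty) * z_grid[i + 1, j]
            + (1 - tx) * ty * z_grid[i, j + 1]
            + tx * ty * z_grid[i + 1, j + 1])

# made-up coarse grid of SCR values (in m€) over two risk drivers
x_grid = np.array([1.75, 5.00, 6.00])
y_grid = np.array([3.50, 9.00, 9.75])
z_grid = np.array([[56.5, 48.5, 48.6],
                   [65.1, 54.4, 54.5],
                   [65.8, 54.0, 54.1]])

print(bilinear(5.50, 9.50, x_grid, y_grid, z_grid))
```

Cheap to evaluate, but it needs a dense precomputed grid and gives no uncertainty estimate, which is where the probabilistic approach below comes in.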

A modern probabilistic approach

Today I revisit the same challenge through the lens of probabilistic machine learning, using tools readily available in R and Python. It is fascinating how simple it can look now!

This probabilistic approach offers several benefits:

  • Built-in uncertainty quantification: we obtain not only a prediction, but also a measure of how certain we should be about it
  • Automatic feature learning: let the model discover optimal representations
  • Fast: the models below fit in a few milliseconds

Of course, with a working probabilistic machine learning model, we can devise many ways to stress-test these results (i.e., sensitivity analyses), based on changes in one (or more) of the explanatory variables.
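Such a stress test could look like the following sketch, using scikit-learn's KernelRidge as a stand-in model and entirely synthetic data (not the fits shown later in this post): shock one explanatory variable and compare predictions before and after.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(13)
# synthetic stand-in for the (tmg, pct_actions, pvl_actions, ppe) design matrix
X = rng.uniform(1.75, 6.25, size=(96, 4))
y = 50.0 + 2.0 * X[:, 0] - 5.0 * X[:, 3] + rng.normal(0.0, 0.1, size=96)

model = KernelRidge(alpha=0.1, kernel="rbf").fit(X, y)

# stress test: shock the 4th driver by +0.25 and measure the predicted impact
X_stressed = X.copy()
X_stressed[:, 3] += 0.25
impact = model.predict(X_stressed) - model.predict(X)
print(impact.mean())  # average effect of the shock on the predicted response
```

The same recipe works for joint shocks on several variables at once: modify the relevant columns of `X_stressed` together.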


References:

(scr_equity <- read.csv("ALIM4D.txt"))
A data.frame: 96 × 5

 tmg  pct_actions pvl_actions  ppe SRC_Equity
1.75         2.00        2.00 3.50   56471378
1.75         2.00        2.00 9.00   48531931
1.75         2.00        2.00 9.50   48558178
1.75         2.00        2.00 9.75   48570523
5.00         2.00        2.00 3.50   65111083
5.00         2.00        2.00 9.00   54433115
5.00         2.00        2.00 9.50   54436348
5.00         2.00        2.00 9.75   54526734
5.25         2.00        2.00 3.50   65244870
5.25         2.00        2.00 9.00   54325632
5.25         2.00        2.00 9.50   54387565
5.25         2.00        2.00 9.75   54418533
5.50         2.00        2.00 3.50   65396012
5.50         2.00        2.00 9.00   54239282
5.50         2.00        2.00 9.50   54302132
5.50         2.00        2.00 9.75   54333018
5.75         2.00        2.00 3.50   65581289
5.75         2.00        2.00 9.00   54168174
5.75         2.00        2.00 9.50   54209587
5.75         2.00        2.00 9.75   54210481
6.00         2.00        2.00 3.50   65785420
6.00         2.00        2.00 9.00   54042241
6.00         2.00        2.00 9.50   54103639
6.00         2.00        2.00 9.75   54134241
1.75         2.75        2.75 9.00   48435808
1.75         2.75        2.75 9.25   48446558
1.75         2.75        2.75 9.50   48459074
1.75         2.75        2.75 9.75   48473874
5.00         2.75        2.75 9.00   54501129
5.00         2.75        2.75 9.25   54531852
   ⋮            ⋮           ⋮    ⋮          ⋮
5.75         6.00        6.00 9.50   53901737
5.75         6.00        6.00 9.75   53968463
6.00         6.00        6.00 3.50   62378886
6.00         6.00        6.00 9.25   53780562
6.00         6.00        6.00 9.50   53730182
6.00         6.00        6.00 9.75   53950814
1.75         6.25        6.25 3.50   51709654
1.75         6.25        6.25 9.25   47537722
1.75         6.25        6.25 9.50   47543381
1.75         6.25        6.25 9.75   47555017
5.00         6.25        6.25 3.50   61268505
5.00         6.25        6.25 9.25   54207189
5.00         6.25        6.25 9.50   54234608
5.00         6.25        6.25 9.75   54268573
5.25         6.25        6.25 3.50   61467297
5.25         6.25        6.25 9.25   54070788
5.25         6.25        6.25 9.50   54096598
5.25         6.25        6.25 9.75   54154003
5.50         6.25        6.25 3.50   61671008
5.50         6.25        6.25 9.25   53964700
5.50         6.25        6.25 9.50   53994107
5.50         6.25        6.25 9.75   54052459
5.75         6.25        6.25 3.50   61868864
5.75         6.25        6.25 9.25   53862132
5.75         6.25        6.25 9.50   53881640
5.75         6.25        6.25 9.75   53941964
6.00         6.25        6.25 3.50   62112237
6.00         6.25        6.25 9.25   53734035
6.00         6.25        6.25 9.50   53764618
6.00         6.25        6.25 9.75   53825660
scr_equity$SRC_Equity <- scr_equity$SRC_Equity/1e6

options(repos = c(techtonique = "https://r-packages.techtonique.net",
                    CRAN = "https://cloud.r-project.org"))

install.packages(c("rvfl", "learningmachine"))

set.seed(13)
train_idx <- sample(nrow(scr_equity), 0.8 * nrow(scr_equity))
X_train <- as.matrix(scr_equity[train_idx, -ncol(scr_equity)])
X_test <- as.matrix(scr_equity[-train_idx, -ncol(scr_equity)])
y_train <- scr_equity$SRC_Equity[train_idx]
y_test <- scr_equity$SRC_Equity[-train_idx]

obj <- learningmachine::Regressor$new(method = "krr", pi_method = "none")
obj$get_type()
t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.1)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")

‘Regression’

Elapsed:  0.005 s 
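Under the hood, kernel ridge regression ("krr") has a closed-form solution: solve (K + λI)α = y on the training kernel matrix, then predict with the test-vs-train kernel. A minimal NumPy sketch of that idea, with an RBF kernel (learningmachine's internals may of course differ):

```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, reg_lambda=0.1, gamma=1.0):
    """Closed-form kernel ridge regression with an RBF kernel:
    alpha = (K + lambda * I)^{-1} y, predictions = K(X_test, X_train) @ alpha."""
    def rbf(A, B):
        # pairwise squared Euclidean distances, then Gaussian kernel
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)
    K = rbf(X_train, X_train)
    alpha = np.linalg.solve(K + reg_lambda * np.eye(K.shape[0]), y_train)
    return rbf(X_test, X_train) @ alpha
```

With `reg_lambda = 0` the fit interpolates the training points exactly; the regularization trades that exactness off against smoothness and numerical stability.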

print(sqrt(mean((obj$predict(X_test) - y_test)^2)))

[1] 0.7250047

obj$summary(X_test, y=y_test, show_progress=TRUE)

  |======================================================================| 100%



$R_squared
[1] 0.9306298

$R_squared_adj
[1] 0.9121311

$Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-1.097222 -0.590318 -0.051308 -0.006375  0.447859  1.660139 

$citests
              estimate      lower       upper      p-value signif
tmg          0.8311760 -0.8484270  2.51077903 3.133161e-01       
pct_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02      *
pvl_actions -0.4845265 -0.9327082 -0.03634475 3.555821e-02      *
ppe         -2.2492137 -2.4397536 -2.05867385 6.622214e-16    ***

$signif_codes
[1] "Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"

$effects
── Data Summary ────────────────────────
                           Values 
Name                       effects
Number of rows             20     
Number of columns          4      
_______________________           
Column type frequency:            
  numeric                  4      
________________________          
Group variables            None   

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable   mean    sd    p0    p25    p50    p75  p100 hist 

obj <- learningmachine::Regressor$new(method = "rvfl",
                                      nb_hidden = 3L,
                                      pi_method = "kdesplitconformal")

t0 <- proc.time()[3]
obj$fit(X_train, y_train, reg_lambda = 0.01)
cat("Elapsed: ", proc.time()[3] - t0, "s \n")

Elapsed:  0.006 s 
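The `pi_method = "kdesplitconformal"` intervals are built on split conformal prediction (the "kde" variant additionally simulates calibration residuals via kernel density estimation). The basic split conformal recipe, sketched in Python on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def split_conformal_predict(model, X, y, X_new, level=95, seed=312):
    """Basic split conformal: fit on one half, use absolute residuals on the
    held-out half to calibrate a symmetric prediction interval."""
    X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5,
                                                  random_state=seed)
    model.fit(X_fit, y_fit)
    resid = np.abs(y_cal - model.predict(X_cal))
    n = len(resid)
    # finite-sample-corrected quantile of the calibration residuals
    q = np.quantile(resid, min(1.0, np.ceil((n + 1) * level / 100) / n))
    preds = model.predict(X_new)
    return preds - q, preds, preds + q

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.3, size=200)
lower, preds, upper = split_conformal_predict(Ridge(), X[:160], y[:160], X[160:])
print(np.mean((lower <= y[160:]) & (y[160:] <= upper)))  # empirical coverage
```

The guarantee is distribution-free: under exchangeability, the interval covers the true response with probability at least `level`%, whatever the underlying model.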

obj$summary(X_test, y=y_test, show_progress=FALSE)

$R_squared
[1] 0.8556358

$R_squared_adj
[1] 0.8171387

$Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.1720 -1.2977 -0.8132 -0.8003 -0.3254  0.5877 

$Coverage_rate
[1] 100

$citests
              estimate      lower      upper      p-value signif
tmg          179.13631  162.48868  195.78394 3.639163e-15    ***
pct_actions  -73.14222  -89.12337  -57.16108 1.046939e-08    ***
pvl_actions   62.46782   46.48668   78.44896 1.199526e-07    ***
ppe         -125.26721 -144.19952 -106.33490 2.223349e-11    ***

$signif_codes
[1] "Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1"

$effects
── Data Summary ────────────────────────
                           Values 
Name                       effects
Number of rows             20     
Number of columns          4      
_______________________           
Column type frequency:            
  numeric                  4      
________________________          
Group variables            None   

── Variable type: numeric ──────────────────────────────────────────────────────
  skim_variable   mean   sd     p0    p25    p50   p75  p100 hist     

obj$set_level(95)
res <- obj$predict(X = X_test)

plot(c(y_train, res$preds), type="l",
     main="(Probabilistic) Out-of-sample \n Equity Capital Requirement in m€",
     xlab="Observation Index",
     ylab="Equity Capital Requirement (m€)",
     ylim = c(min(c(res$upper, res$lower, y_test, y_train)),
              max(c(res$upper, res$lower, y_test, y_train))))
lines(c(y_train, res$upper), col="gray70")
lines(c(y_train, res$lower), col="gray70")
lines(c(y_train, res$preds), col = "red")
lines(c(y_train, y_test), col = "blue", lwd=2)
abline(v = length(y_train), lty=2, col="black", lwd=2)

100*mean((y_test >= as.numeric(res$lower)) * (y_test <= as.numeric(res$upper)))

100

!pip install skimpy

!pip install ydata-profiling

!pip install nnetsauce

import pandas as pd
from skimpy import skim
from ydata_profiling import ProfileReport

scr_equity = pd.read_csv("ALIM4D.csv")
scr_equity['SRC_Equity'] = scr_equity['SRC_Equity']/1e6

skim(scr_equity)

ProfileReport(scr_equity)

import nnetsauce as ns
import numpy as np

X, y = scr_equity.drop('SRC_Equity', axis=1), scr_equity['SRC_Equity'].values


from sklearn.utils import all_estimators
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from time import time

# Get all scikit-learn regressors
estimators = all_estimators(type_filter="regressor")

results_regressors = []

seeds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for name, RegressorClass in tqdm(estimators):

    if name in ['MultiOutputRegressor', 'MultiOutputClassifier', 'StackingRegressor', 'StackingClassifier',
                'VotingRegressor', 'VotingClassifier', 'TransformedTargetRegressor', 'RegressorChain',
                'GradientBoostingRegressor', 'HistGradientBoostingRegressor', 'RandomForestRegressor',
                'ExtraTreesRegressor', 'MLPRegressor']:

        continue

    for seed in seeds:

        try:

          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                              random_state=42+seed*1000)
          regr = ns.PredictionInterval(obj=ns.CustomRegressor(RegressorClass()),
                                       method="splitconformal",
                                       level=95,
                                       seed=312)
          start = time()
          regr.fit(X_train, y_train)
          print(f"Elapsed: {time() - start}s")
          preds = regr.predict(X_test, return_pi=True)
          coverage_rate = np.mean((preds.lower <= y_test)*(preds.upper >= y_test))
          # point forecasts are in preds.mean when return_pi=True
          rmse = np.sqrt(np.mean((preds.mean - y_test)**2))
          results_regressors.append([name, seed, coverage_rate, rmse])

        except Exception:

          continue

results_df = pd.DataFrame(results_regressors, columns=['Regressor', 'Seed', 'Coverage Rate', 'RMSE'])
results_df.sort_values(by='Coverage Rate', ascending=False)

results_df.dropna(inplace=True)

results_df['logRMSE'] = np.log(results_df['RMSE'])
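A natural next step is to summarize median coverage and log-RMSE per regressor across seeds. Sketched here on a small synthetic stand-in for `results_df` (with the same columns as above), since the actual benchmark values depend on the run:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for results_df, with the same columns as above
results_df = pd.DataFrame({
    "Regressor": ["Ridge", "Ridge", "Lasso", "Lasso"],
    "Seed": [1, 2, 1, 2],
    "Coverage Rate": [0.95, 0.90, 1.00, 0.95],
    "RMSE": [0.8, 0.9, 1.2, 1.1],
})
results_df["logRMSE"] = np.log(results_df["RMSE"])

# median performance per regressor, best-covered models first
summary = (results_df.groupby("Regressor")[["Coverage Rate", "logRMSE"]]
           .median()
           .sort_values("Coverage Rate", ascending=False))
print(summary)
```

Medians across seeds are more robust than means here, since a single degenerate fit can inflate the RMSE of an otherwise good regressor.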

