Task8
(a) cost-complexity pruning algorithm
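Pruning is a technique used to reduce the complexity of a decision tree and protect against overfitting. Cost-complexity pruning first grows a large tree, then removes the split that contributes the least reduction in error relative to a penalty on the number of terminal nodes. This process is repeated for each remaining split until further pruning would result in decreased model accuracy.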
https://www.youtube.com/watch?v=D0efHEJsfHo
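As a rough sketch of the trade-off behind cost-complexity pruning, the snippet below assumes the usual penalized form (error plus alpha times the number of leaves). The function names and the candidate subtrees are made up for illustration and are not from any exam solution.

```python
def cost_complexity(error, n_leaves, alpha):
    """Penalized cost: training error plus alpha for each terminal node."""
    return error + alpha * n_leaves

def best_subtree(subtrees, alpha):
    """Return the (error, n_leaves) pair with the lowest penalized cost."""
    return min(subtrees, key=lambda t: cost_complexity(t[0], t[1], alpha))

# Hypothetical nested subtrees: (training error, number of leaves).
subtrees = [(0.10, 8), (0.12, 5), (0.18, 3), (0.30, 1)]

# A small penalty keeps the full tree; a larger penalty favors a pruned tree.
small_alpha_choice = best_subtree(subtrees, alpha=0.001)
large_alpha_choice = best_subtree(subtrees, alpha=0.05)
print(small_alpha_choice, large_alpha_choice)
```

Raising alpha makes each extra leaf more expensive, so the selected subtree shrinks as alpha grows.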
(b) Choosing a complexity parameter based on cross-validation results.
Choosing the value that results in the minimum cross-validation error.
Employing the one standard-error rule. This approach proposes using the complexity parameter for the smallest model within one standard error of the minimum cross-validation error.
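The two selection rules above can be sketched as follows. The table layout (complexity parameter, number of splits, cross-validation error, standard error) and all of its numbers are assumptions made for illustration; `choose_cp` is a hypothetical helper, not a library function.

```python
def choose_cp(cv_results):
    """cv_results: list of (cp, n_splits, cv_error, cv_std_error) rows."""
    # Rule 1: the row with the minimum cross-validation error.
    best = min(cv_results, key=lambda r: r[2])
    # Rule 2: the smallest tree whose error is within one SE of that minimum.
    threshold = best[2] + best[3]
    eligible = [r for r in cv_results if r[2] <= threshold]
    one_se = min(eligible, key=lambda r: r[1])
    return best[0], one_se[0]

# Hypothetical cross-validation table.
cv_results = [
    (0.100, 0, 1.00, 0.05),
    (0.050, 1, 0.80, 0.05),
    (0.020, 2, 0.66, 0.04),
    (0.010, 4, 0.63, 0.04),
    (0.005, 7, 0.65, 0.04),
]

cp_min, cp_1se = choose_cp(cv_results)
print(cp_min, cp_1se)
```

In this made-up table the minimum-error rule picks the 4-split tree, while the one standard-error rule accepts the simpler 2-split tree because its error is within one standard error of the minimum.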
Task9
(a) Explain the difference between accuracy and AUC in terms of overall model assessment.
Accuracy is the ratio of the number of correct predictions to the total number of predictions made.
AUC measures the area under the ROC curve. It assesses the overall model performance by measuring how true positive rate and false positive rate trade off across a range of possible classification thresholds.
AUC measures performance across the full range of thresholds while accuracy measures performance only at the selected threshold.
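To make the contrast concrete, here is a small sketch on made-up labels and scores. It computes AUC through its rank interpretation (the share of positive/negative pairs in which the positive receives the higher score), which is one of several equivalent ways to get the area under the ROC curve.

```python
def accuracy(y_true, scores, threshold):
    """Share of correct predictions at one classification threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

print(accuracy(y_true, scores, 0.5))   # depends on the chosen threshold
print(accuracy(y_true, scores, 0.33))  # a different threshold, a different accuracy
print(auc(y_true, scores))             # one number, no threshold involved
```

Accuracy changes when the threshold moves, while AUC is computed from the ranking of the scores alone.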
(b) Explain why the ROC curve always goes through (0,0) and (1,1)
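(0,0) : When the threshold is so high that everything is classified as negative, the true positive rate (sensitivity) is 0 and the true negative rate (specificity) is 1, so the false positive rate (1 - specificity) is 0.
(1,1) : When the threshold is so low that everything is classified as positive, the true positive rate (sensitivity) is 1 and the true negative rate (specificity) is 0, so the false positive rate is 1.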
https://www.youtube.com/watch?v=4jRBRDbJemM
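The endpoint argument can be checked with a short sketch that sweeps the threshold over made-up scores. `roc_points` is a hypothetical helper, and the data are invented.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs as the threshold falls from above the highest score
    down to the lowest score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        # Everything scoring at or above the threshold is called positive.
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical labels and scores.
points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(points[0], points[-1])
```

The first point comes from a threshold nothing can reach (nothing is classified as positive), and the last from a threshold every observation reaches (everything is classified as positive), which is why the curve starts at (0,0) and ends at (1,1) regardless of the model.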
(c) Gradient boosting machine (GBM) tree model: explain why model performance deteriorates as the number of trees increases.

A GBM iteratively builds trees fit to the residuals of prior trees. Depending on the hyperparameters, this process can produce a very complex model, which is susceptible to overfitting to patterns in the training data.
As more trees are added, AUC on the testing data starts to drop, which indicates the model is overfit to the training data.
https://www.youtube.com/watch?v=LsK-xG1cLYA
https://www.youtube.com/watch?v=3CC4N4z3GJc
https://www.youtube.com/watch?v=2xudPOBz-vs
I don't fully understand this yet. Revisit these videos from time to time.
(d) Describe two hyperparameters to improve model performance.
Early stopping : Early stopping criteria, such as improvement of the performance metrics in each subsequent tree, can stop training when it detects the improvement is marginal. This avoids overfitting.
Controlling learning rate : The learning rate scales the contribution of each subsequent tree to the overall model outcome. A lower learning rate reduces the extent to which a single tree is able to influence the model fitting process.
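Both hyperparameters can be seen together in a toy boosting loop. This is only a sketch under simplifying assumptions: one-split stumps on a single predictor, squared-error loss instead of AUC, simulated data, and a hypothetical `patience` rule that stops once the validation loss has not improved for that many trees.

```python
import random

def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals (squared error)."""
    best = None
    for split in sorted(set(x))[1:]:
        left = [r for xi, r in zip(x, residuals) if xi < split]
        right = [r for xi, r in zip(x, residuals) if xi >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda xi: left_mean if xi < split else right_mean

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

def boost(x_tr, y_tr, x_val, y_val, learning_rate, max_trees, patience):
    pred_tr = [0.0] * len(x_tr)
    pred_val = [0.0] * len(x_val)
    best_loss, best_n = float("inf"), 0
    for n in range(1, max_trees + 1):
        # Each new tree is fit to the residuals of the current ensemble.
        residuals = [y - p for y, p in zip(y_tr, pred_tr)]
        stump = fit_stump(x_tr, residuals)
        # The learning rate shrinks the contribution of every tree.
        pred_tr = [p + learning_rate * stump(xi) for p, xi in zip(pred_tr, x_tr)]
        pred_val = [p + learning_rate * stump(xi) for p, xi in zip(pred_val, x_val)]
        loss = mse(y_val, pred_val)
        if loss < best_loss:
            best_loss, best_n = loss, n
        elif n - best_n >= patience:
            break  # early stopping: no validation improvement for `patience` trees
    return best_n, best_loss

# Simulated data: a step function plus noise, split into training and validation.
rng = random.Random(0)
x_all = [i / 40 for i in range(40)]
y_all = [(1.0 if xi > 0.5 else 0.0) + rng.gauss(0, 0.3) for xi in x_all]
x_tr, y_tr = x_all[0::2], y_all[0::2]
x_val, y_val = x_all[1::2], y_all[1::2]

n_trees, val_loss = boost(x_tr, y_tr, x_val, y_val,
                          learning_rate=0.1, max_trees=200, patience=10)
print(n_trees, val_loss)
```

The returned `n_trees` is the number of trees at which the validation loss was lowest, so trees added after that point would only fit noise in the training data.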
(e) How to tune a hyperparameter.
Tuning a hyperparameter requires first varying the hyperparameter across a range of possible values and performing cross-validation at each value. Performance is then determined based on a cross-validation performance metric, for example AUC, and the hyperparameter value with the best performance based on this metric is selected.
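The tuning loop described above can be sketched as a grid search with k-fold cross-validation. To keep the example short it swaps in a one-predictor ridge regression (penalty `lam`) and mean squared error as the metric, so the best value is the minimum rather than the maximum as it would be with AUC. The model, grid, and data are assumptions for illustration.

```python
import random

def kfold_indices(n, k):
    """Split row indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def fit_ridge(x, y, lam):
    """Slope of a no-intercept ridge regression with penalty lam."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def cv_mse(x, y, lam, k=5):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for fold in kfold_indices(len(x), k):
        held_out = set(fold)
        x_tr = [x[i] for i in range(len(x)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        beta = fit_ridge(x_tr, y_tr, lam)
        fold_errors.append(sum((y[i] - beta * x[i]) ** 2 for i in fold) / len(fold))
    return sum(fold_errors) / k

# Simulated data.
rng = random.Random(0)
x = [i / 10 for i in range(1, 31)]
y = [2.0 * xi + rng.gauss(0, 1.0) for xi in x]

# Vary the hyperparameter over a grid and cross-validate at each value.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(x, y, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```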
Task11
(d) How changing the link function in the GLM impacts the model fitting and how this can impact predictor significance.
The link function specifies a functional relationship between the linear predictor and the mean of the distribution of the outcome conditional on the predictor variables. Different link functions have different shapes and can therefore fit to different nonlinear relationships between the predictors and the target variable.
When the link function matches the relationship between a predictor variable and the target, the mean of the outcome distribution (the prediction) will generally be close to the actual values for the target variable, resulting in smaller residuals and smaller (more significant) p-values.
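A small sketch of how the link shapes the prediction: the same linear predictor is mapped to a different mean under each inverse link. The three links shown are common choices, and the coefficients are made up.

```python
import math

# Inverse link functions: map the linear predictor eta to the mean.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # mean = eta
    "log": lambda eta: math.exp(eta),                   # mean = exp(eta), always positive
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # mean between 0 and 1
}

def predicted_mean(link, intercept, slope, x):
    """Mean of the outcome for one predictor value under the given link."""
    eta = intercept + slope * x
    return INVERSE_LINKS[link](eta)

# Hypothetical coefficients: the same linear predictor under three links.
for link in INVERSE_LINKS:
    print(link, [round(predicted_mean(link, -1.0, 0.5, x), 3) for x in (0, 2, 4)])
```

With the identity link the mean changes by a constant amount per unit of x, with the log link by a constant factor, and with the logit link along an S-shaped curve bounded by 0 and 1.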
Task12
(a) Proxy variable
Proxy variables are variables that are used in place of other information, usually because the desired information is either impossible or impractical to measure. For a variable to be a good proxy it must have a close relationship with the variable of interest.