Task6

(a) Explain what the variance and bias values indicate about the relative quality of predictions when comparing predictive models.

 The variance figures indicate how much the predictions vary depending on the training data used. As more predictors are used, the variance increases because the model fits the training data for each trial more precisely and becomes less generalized. The bias figures indicate how close the expected predictions are to the actual results on unseen data. Generally, as more predictors are used, the bias decreases because the expected predictions move closer to the actual values.
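
 One standard way to write this tradeoff down, assuming squared-error loss and a target of the form y = f(x) + noise with noise variance sigma^2 (this notation is introduced here for illustration and is not part of the original task), is the decomposition of the expected test error at a point x_0:

```latex
\mathbb{E}\!\left[\big(y_0 - \hat{f}(x_0)\big)^2\right]
  = \underbrace{\operatorname{Var}\!\big(\hat{f}(x_0)\big)}_{\text{variance}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

 Adding predictors tends to shrink the bias term and grow the variance term, so the preferred model is the one where their sum is smallest.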

 

 

Task7

(a) Explain, for a general audience, what cost complexity pruning does.

 Cost complexity pruning is part of a two-step approach to building a tree model. The first step is to build a large, complex decision tree, which is essentially a flow chart for deciding whether to try to transfer an animal.

 A second step called pruning is then taken. Pruning reduces the size and complexity of the initial flow chart to a more useful one. The name cost complexity pruning refers to the technical tradeoff being made between how simple the flow chart is and how well it distinguishes whether animals can be transferred or not.

 

 

Task8

(a) Boosting - Setting eta as high as possible?

 In boosting algorithms, which work by iteratively fitting a model to the residuals of a prior learner, eta, also called the learning rate or shrinkage parameter, slows down the model fitting process so that the residuals from the prior learner do not have too large an influence on the final model. With eta at its maximum of 1, each model iteration is the prior learner plus the model fitted to its residuals. While this will run quickly, it will be prone to high variance, overfitting the training data and not generalizing well to unseen data. Setting eta to less than 1 slows down the fitting process by adding only eta times the model fitted to the residuals to form the next learner, and will substantially reduce the variance.
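
 The role of eta can be sketched with a toy example. The code below is a minimal illustration, not a production GBM: the data, the one-split "stump" learner, and the names `fit_stump`, `boost`, and `mse` are all hypothetical choices made here for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    for split in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, eta, n_rounds):
    """Return training-set predictions after n_rounds of boosting."""
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        split, left_mean, right_mean = fit_stump(xs, residuals)
        # Only a fraction eta of the new learner is added to the running model.
        preds = [p + eta * (left_mean if x < split else right_mean)
                 for x, p in zip(xs, preds)]
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 0.8, 1.1, 3.9, 4.2, 3.8, 7.1, 6.9]

fast_mse = mse(ys, boost(xs, ys, eta=1.0, n_rounds=5))
slow_mse = mse(ys, boost(xs, ys, eta=0.1, n_rounds=5))
print(fast_mse, slow_mse)
```

 With eta = 1 the training error falls much faster over the same number of rounds. That speed is exactly what makes it easy to chase noise in the training data, so a low training error here says nothing about performance on unseen data.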

 

(b) Explain cross validation and how it can be used to set the eta hyperparameter

 Cross validation divides the available data into multiple folds for a series of model fitting runs. Each fold is used as test data exactly once. The average test metric across the runs is the result of the cross validation.

 To use cross validation to set the eta hyperparameter, a series of reasonable values for eta would be chosen beforehand. Then, for each value of eta, cross validation would be performed with each model fitting run using the same eta. The result is one average test metric for each value of eta. The value of eta with the superior test metric, some measure of predictive power on unseen data, would be chosen for subsequent predictive modeling work.

 

 

 

Task8

(a) cost-complexity pruning algorithm

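 Pruning is a technique used to reduce the complexity of a decision tree and protect against overfitting. Cost-complexity pruning removes the split that contributes the least to model fit relative to the complexity it adds. This process is repeated for each remaining split until further pruning would result in decreased model accuracy.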

https://www.youtube.com/watch?v=D0efHEJsfHo

StatQuest

 

(b) Choosing a complexity parameter based on cross-validation results.

 Choosing the value that results in the minimum cross-validation error.

 Employing the one standard-error rule. This approach proposes using the complexity parameter for the smallest model within one standard-error of the minimum cross-validation error.
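
 The two selection rules can be compared on a small table of results. This is a sketch under stated assumptions: the numbers in `cv_results` are made up, the helper names are hypothetical, and a larger complexity parameter is assumed to correspond to a smaller tree.

```python
# Hypothetical cross-validation results:
# (complexity parameter, mean CV error, standard error of the CV error)
cv_results = [
    (0.001, 0.310, 0.020),
    (0.005, 0.295, 0.018),
    (0.010, 0.290, 0.019),
    (0.050, 0.300, 0.021),
    (0.100, 0.340, 0.025),
]


def min_error_cp(results):
    """Rule 1: pick the complexity parameter with the lowest CV error."""
    return min(results, key=lambda r: r[1])[0]


def one_se_cp(results):
    """Rule 2: pick the simplest model within one SE of the minimum CV error."""
    best = min(results, key=lambda r: r[1])
    threshold = best[1] + best[2]
    # Assumption: a larger complexity parameter means a smaller tree.
    return max(cp for cp, err, _ in results if err <= threshold)


print(min_error_cp(cv_results))  # 0.01
print(one_se_cp(cv_results))     # 0.05
```

 In this made-up table the one standard-error rule moves from 0.01 to 0.05, trading a small increase in estimated error for a simpler tree.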

 

Task9

(a) Explain the difference between accuracy and AUC in terms of overall model assessment.

 Accuracy is the ratio of the number of correct predictions to the total number of predictions made.

 AUC measures the area under the ROC curve. It assesses the overall model performance by measuring how true positive rate and false positive rate trade off across a range of possible classification thresholds.

 AUC measures performance across the full range of thresholds while accuracy measures performance only at the selected threshold.
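
 A small sketch makes the contrast concrete. The labels and scores below are hypothetical, and AUC is computed in its pairwise form (the share of positive/negative pairs in which the positive gets the higher score), which equals the area under the ROC curve.

```python
def accuracy(labels, scores, threshold):
    """Share of correct predictions at one chosen threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

print(accuracy(labels, scores, threshold=0.5))   # 0.75
print(accuracy(labels, scores, threshold=0.33))  # 0.875
print(auc(labels, scores))                       # 0.9375
```

 Accuracy changes when the threshold moves, while AUC is a single number for the model because it never refers to a threshold.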

 

(b) Explain why the ROC curve always goes through (0,0) and (1,1)

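The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at each classification threshold.

(0,0) : With the threshold at its highest, everything is classified as negative. The true positive rate (sensitivity) is zero, and the true negative rate (specificity) is 1, so the false positive rate is zero.

(1,1) : With the threshold at its lowest, everything is classified as positive. The true positive rate (sensitivity) is 1, and the true negative rate (specificity) is zero, so the false positive rate is 1.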

https://www.youtube.com/watch?v=4jRBRDbJemM
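
The two endpoints can be checked numerically. This is a minimal sketch with hypothetical data; `roc_point` is a helper name introduced here, and a score at or above the threshold is assumed to be classified as positive.

```python
def roc_point(labels, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (fp / n_neg, tp / n_pos)


# Hypothetical labels and predicted probabilities.
labels = [0, 1, 0, 1]
scores = [0.2, 0.4, 0.6, 0.9]

# Threshold above every score: nothing is classified as positive.
print(roc_point(labels, scores, threshold=1.1))  # (0.0, 0.0)
# Threshold at or below every score: everything is classified as positive.
print(roc_point(labels, scores, threshold=0.0))  # (1.0, 1.0)
```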

(c) Gradient boosting machine tree model, Explain why model performance deteriorates as the number of trees increases.

A GBM iteratively builds trees fit to the residuals of prior trees. Depending on the hyperparameters, this process can produce a very complex model, which is susceptible to overfitting to patterns in the training data.

 AUC on the testing data starts to drop, which indicates the model is overfit to the training data.

 

 

https://www.youtube.com/watch?v=LsK-xG1cLYA

https://www.youtube.com/watch?v=3CC4N4z3GJc

https://www.youtube.com/watch?v=2xudPOBz-vs

 I don't fully understand this yet. Rewatch these whenever there is time.

 

(d)  Describe two hyperparameters to improve model performance.

 Early stopping : An early stopping criterion, such as the improvement in a performance metric from each subsequent tree, halts training when the improvement becomes marginal. This avoids overfitting.

 Controlling learning rate : The learning rate controls the impact of each subsequent tree on the overall model outcome. This reduces the extent to which a single tree is able to influence the model fitting process.

 

(e) How to tune a hyperparameter.

 Tuning a hyperparameter requires first varying the hyperparameter across a range of possible values and performing cross validation at each value. Performance is then determined based on a cross-validation performance metric, for example AUC, and the hyperparameter value with the best performance based on this metric is selected.

 

 

Task11

(d) How changing the link function in the GLM impacts the model fitting and how this can impact predictor significance.

 The link function specifies a functional relationship between the linear predictor and the mean of the distribution of the outcome conditional on the predictor variables. Different link functions have different shapes and can therefore fit different nonlinear relationships between the predictors and the target variable.

 When the link function matches the relationship of a predictor variable, the mean of the outcome distribution (the prediction) will generally be close to the actual values for the target variable, resulting in smaller residuals and more significant p-values.

 

 

Task12

(a) Proxy variable

 Proxy variables are variables that are used in place of other information, usually because the desired information is either impossible or impractical to measure. For a variable to be a good proxy it must have a close relationship with the variable of interest.

 

 

Task8

(a) Compare and contrast stepwise selection with shrinkage methods.

Similarities

 - both avoid overfitting to the data, especially when the number of observations is small compared to the number of predictors.

 - both can be used for variable selection to reduce model complexity.

Differences

 - Stepwise selection takes iterative steps until there is no improvement as measured by AIC.

 - Shrinkage methods can reduce the size of coefficients without entirely eliminating variables.

 

(b) Explain why variables are standardized as part of the lasso model fitting procedure.

 Variables that are on a larger scale typically have smaller coefficients and vice-versa. Because the lasso penalty depends on the size of the coefficients, without standardizing, the regularization will focus on shrinking the coefficients of variables on a smaller scale over those on a larger scale.

 

(c) Describe the process of searching for the optimal value of the hyperparameter lambda in a lasso regression.

 The optimal value for lambda can be found using cross-validation. First, a grid of lambda values is chosen for the search. Then for each lambda value, a cross-validation error is calculated.

 The first step in calculating a cross-validation error is to partition the data into k folds. A single fold is removed for testing, and the remaining folds are used to train a lasso model with the current lambda value. This process is repeated for each of the k partitions, and a cross-validation error is calculated as the average of an error measure (e.g. RMSE or AUC) across all k testing partitions.

 The optimal lambda value is the one with the lowest cross-validation error.
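
 The search described above can be sketched end to end. To keep it self-contained, the example assumes a lasso with a single predictor and no intercept, for which the fit has a closed form (soft-thresholding); a real lasso with many predictors needs an iterative solver. The data, the grid, the fold assignment, and all function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, setting it to zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_1d(xs, ys, lam):
    """Minimize sum((y - b*x)^2) + lam*|b| for one predictor, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam / 2) / sxx


def cv_error(xs, ys, lam, k):
    """Average test RMSE over k folds; observation i goes to fold i % k."""
    fold_errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        b = fit_lasso_1d([x for x, _ in train], [y for _, y in train], lam)
        rmse = (sum((y - b * x) ** 2 for x, y in test) / len(test)) ** 0.5
        fold_errors.append(rmse)
    return sum(fold_errors) / k


# Hypothetical toy data and lambda grid.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [-4.3, -2.6, -2.2, -0.7, 1.2, 1.8, 3.3, 3.9, 5.2]
grid = [0.0, 0.5, 1.0, 5.0, 20.0]

errors = {lam: cv_error(xs, ys, lam, k=3) for lam in grid}
best_lam = min(errors, key=errors.get)
print(errors)
print(best_lam)
```

 In practice the folds would usually be assigned at random rather than by index, and the grid would be much finer.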

 

https://www.youtube.com/watch?v=fSytzGwwBVw

Cross validation is also covered by StatQuest, but for this topic the ISLR book seems easier to understand.

 

(f) confusion matrix

pred\ref    negative    positive
negative    TN          FN
positive    FP          TP

sensitivity = TP/(TP+FN)

specificity = TN/(TN+FP)

 

https://www.youtube.com/watch?v=vP06aMoz4v8

It did not click when I read it as text, but the video made it clear when to favor high sensitivity and when to favor high specificity. It also covers the 3-class and 4-class cases.

 

(g) lowering the cutoff threshold?

 Assess the consequences of this recommendation as it relates to the business problem.

 This will increase positive predictions (both TP and FP) while reducing negative predictions (both TN and FN), increasing sensitivity and decreasing specificity.
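
 The effect of lowering the cutoff can be checked with the confusion matrix counts and the sensitivity and specificity formulas from (f). The sketch below uses hypothetical labels and scores, and assumes a score at or above the cutoff is predicted positive.

```python
def confusion_counts(labels, scores, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are predicted positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        predicted_positive = s >= cutoff
        if predicted_positive and y == 1:
            counts["TP"] += 1
        elif predicted_positive and y == 0:
            counts["FP"] += 1
        elif not predicted_positive and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def sensitivity(c):
    return c["TP"] / (c["TP"] + c["FN"])


def specificity(c):
    return c["TN"] / (c["TN"] + c["FP"])


# Hypothetical labels and predicted probabilities.
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.45, 0.4, 0.6, 0.55, 0.7, 0.9]

high = confusion_counts(labels, scores, cutoff=0.5)
low = confusion_counts(labels, scores, cutoff=0.35)
print(high, sensitivity(high), specificity(high))
print(low, sensitivity(low), specificity(low))
```

 Moving the cutoff from 0.5 to 0.35 turns one FN into a TP and one TN into an FP in this toy data, so sensitivity rises while specificity falls.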

 

 

 

Task9

(a) Describe how bagging is used in the random forest algorithm and the advantage it gives random forests over a single decision tree in terms of the bias/variance trade-off

 Random forests are created by applying bagging and taking random feature subsets to construct multiple trees, which are averaged to produce a prediction.

 Bagging is the process of training multiple models in parallel on different bootstrap samples of the data (random samples drawn with replacement), so each individual tree is trained on a different training dataset. Variance refers to the sensitivity of the model to changes in the training dataset. Bagging reduces variance because the trees are trained on different data and their predictions are averaged, so the quirks of any single training sample have less influence on the final prediction.
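
 The bagging step can be sketched as follows. This is a simplified illustration, not a full random forest: each "tree" is a one-split stump, there is only one predictor (so the random feature subsetting that random forests add is not shown), and the data and function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Best single split on x by squared error; constant model if x has one value."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < split]
        right = [y for x, y in zip(xs, ys) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    if best is None:  # bootstrap sample contained a single distinct x value
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def bagged_predict(xs, ys, x_new, n_trees, seed=0):
    """Average the predictions of stumps fit to bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        split, left_mean, right_mean = fit_stump([xs[i] for i in idx],
                                                 [ys[i] for i in idx])
        preds.append(left_mean if x_new < split else right_mean)
    return sum(preds) / len(preds), preds


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 0.8, 1.1, 3.9, 4.2, 3.8, 7.1, 6.9]

avg, individual = bagged_predict(xs, ys, x_new=4.5, n_trees=25)
print(avg)
print(min(individual), max(individual))
```

 The individual stumps disagree with one another because each saw a different bootstrap sample; the averaged prediction is more stable than any single one of them.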
