Task8

(a) Compare and contrast stepwise selection with shrinkage methods.

Similarities

 - Both aim to reduce overfitting to the data, especially when the number of observations is small compared to the number of predictors.

 - Both can be used for variable selection to reduce model complexity (among shrinkage methods, the lasso can set coefficients exactly to zero).

Differences

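 - Stepwise selection adds or removes one variable at a time and stops when no further step improves the selection criterion (e.g. AIC). Each variable is either fully in the model or fully out.

 - Shrinkage methods fit all variables at once with a penalty on the coefficients, so they can reduce the size of coefficients without entirely eliminating variables.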

 

(b) Explain why variables are standardized as part of the lasso model fitting procedure.

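 Variables that are on a larger scale typically have smaller coefficients, and vice versa. The lasso penalty is applied to the size of the coefficients, so without standardizing, the regularization would shrink variables on a smaller scale (which have larger coefficients) more heavily than those on a larger scale. Standardizing puts all variables on the same scale so that every coefficient is penalized on an equal footing.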
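 The scale effect can be illustrated with a small sketch. The data below are made up for illustration: the same distance is recorded once in kilometers and once in meters, and `ols_slope` and `standardize` are hypothetical helper names, not part of any library.

```python
from statistics import mean, pstdev

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def standardize(x):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(x), pstdev(x)
    return [(xi - m) / s for xi in x]

# Hypothetical data: the same distance recorded in two units.
dist_km = [1.0, 2.0, 3.0, 4.0, 5.0]
dist_m = [1000.0 * d for d in dist_km]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_km = ols_slope(dist_km, y)  # small-scale variable, larger coefficient
slope_m = ols_slope(dist_m, y)    # large-scale variable, smaller coefficient
slope_km_std = ols_slope(standardize(dist_km), y)
slope_m_std = ols_slope(standardize(dist_m), y)

print(slope_km, slope_m)          # differ by a factor of 1000
print(slope_km_std, slope_m_std)  # the same after standardizing
```

 Both versions carry the same information, but a penalty on the raw coefficients would treat them very differently. After standardizing, the two coefficients agree.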

 

(c) Describe the process of searching for the optimal value of the hyperparameter lambda in a lasso regression.

 The optimal value for lambda can be found using cross-validation. First, a grid of lambda values is chosen for the search. Then for each lambda value, a cross-validation error is calculated.

 The first step in calculating a cross-validation error is to partition the data into k folds. A single fold is held out for testing, and the remaining folds are used to train a lasso model with the current lambda value. This process is repeated for each of the k folds, and the cross-validation error is calculated as the average of an error measure (e.g. RMSE) across all k held-out folds. If a performance measure such as AUC is used instead, higher values are better.

 The optimal lambda value is the one with the lowest cross-validation error.
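 The search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the lasso is fitted with a simple coordinate descent on the objective (1/2n)*RSS + lambda*sum|b_j|, the data are synthetic, the lambda grid is arbitrary, and all function names (`fit_lasso`, `cv_error`, and so on) are hypothetical. In practice a library routine would normally do this work.

```python
import random
from statistics import mean, pstdev

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*RSS + lam*sum|b_j| on standardized predictors."""
    n, p = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(p)]
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    y_bar = mean(y)
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]
    beta = [0.0] * p
    resid = [yi - y_bar for yi in y]  # residuals when all coefficients are 0
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam)  # standardized columns have variance 1
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new_b
    return beta, mu, sd, y_bar

def predict(model, X):
    """Apply the training-set standardization, then the fitted coefficients."""
    beta, mu, sd, y_bar = model
    return [y_bar + sum(b * (row[j] - mu[j]) / sd[j] for j, b in enumerate(beta))
            for row in X]

def cv_error(X, y, lam, k=5, seed=0):
    """Average held-out RMSE over k folds for a single lambda value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)       # same folds for every lambda
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit_lasso([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
        preds = predict(model, [X[i] for i in test_idx])
        mse = mean([(pr - y[i]) ** 2 for pr, i in zip(preds, test_idx)])
        errors.append(mse ** 0.5)
    return mean(errors)

# Synthetic data (an assumption for illustration): only the first predictor matters.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]

lambda_grid = [0.001, 0.01, 0.1, 0.5, 1.0, 5.0]
cv_errors = {lam: cv_error(X, y, lam) for lam in lambda_grid}
best_lambda = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("best lambda:", best_lambda)
```

 Note that the standardization is recomputed inside each training fold, so the held-out fold never influences the fitted model.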

 

https://www.youtube.com/watch?v=fSytzGwwBVw

StatQuest also covers cross-validation, but for this topic I find the ISLR book easier to understand.

 

(f) confusion matrix

(rows = predicted class, columns = reference class)

pred\ref    negative    positive
negative    TN          FN
positive    FP          TP

sensitivity = TP/(TP+FN)

specificity = TN/(TN+FP)
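
The two formulas can be checked on a small example. The labels below are invented for illustration (1 = positive, 0 = negative), and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels coded 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that were predicted negative."""
    return tn / (tn + fp)

# Made-up reference labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)       # 3 2 4 1
print(sensitivity(tp, fn))  # 0.75
print(specificity(tn, fp))  # 4/6
```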

 

https://www.youtube.com/watch?v=vP06aMoz4v8

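It didn't really click when I read it as text, but watching the video made it clear when a model with high sensitivity or high specificity should be preferred, including the cases with 3 or 4 classes.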

 

(g) lowering the cutoff threshold?

 Assess the consequences of this recommendation as it relates to the business problem.

 Lowering the cutoff will increase positive predictions (both TP and FP) while reducing negative predictions (both TN and FN). As a result, sensitivity increases and specificity decreases.
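 A small sketch of the effect, using invented predicted probabilities and two arbitrary cutoffs (0.5 and 0.3). The function name `rates_at_cutoff` is hypothetical.

```python
def rates_at_cutoff(y_true, scores, cutoff):
    """Return (sensitivity, specificity, number of positive predictions)
    when an observation is predicted positive if its score >= cutoff."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), tp + fp

# Made-up labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1]

sens_high, spec_high, n_pos_high = rates_at_cutoff(y_true, scores, 0.5)
sens_low, spec_low, n_pos_low = rates_at_cutoff(y_true, scores, 0.3)

print(sens_high, spec_high, n_pos_high)  # cutoff 0.5: sensitivity 0.5, 3 positive predictions
print(sens_low, spec_low, n_pos_low)     # cutoff 0.3: sensitivity 1.0, 7 positive predictions
```

 With the lower cutoff every actual positive is caught, but half of the actual negatives are now flagged as positive.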

 

 

 

Task9

(a) Describe how bagging is used in the random forest algorithm and the advantage it gives random forests over a single decision tree in terms of the bias/variance trade-off.

 Random forests are created by applying bagging and taking random feature subsets to construct multiple trees, whose predictions are averaged (or, for classification, combined by majority vote) to produce a prediction.

 Bagging (bootstrap aggregating) is the process of training multiple models in parallel on different bootstrap samples of the data, each drawn at random with replacement. Each individual tree is therefore trained on a different training dataset. Variance refers to the sensitivity of the model to changes in the training dataset. A single deep tree has low bias but high variance; averaging many trees trained on different samples cancels out much of that variance while leaving the bias roughly unchanged.
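
 The bagging step alone can be sketched as follows. This is a simplified illustration, not a random forest: it uses one-split regression trees (stumps) on a single made-up predictor, so the random feature subsetting is left out. All names and the synthetic data are assumptions.

```python
import random
from statistics import mean

def fit_stump(x, y):
    """One-split regression tree: choose the threshold with the lowest SSE."""
    best = None
    for t in sorted(set(x))[1:]:  # skipping the smallest value keeps both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((yi - l_mean) ** 2 for yi in left)
               + sum((yi - r_mean) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:              # all x values identical: predict the overall mean
        m = mean(y)
        return (x[0], m, m)
    return best[1:]

def predict_stump(stump, x_new):
    t, l_mean, r_mean = stump
    return l_mean if x_new < t else r_mean

def bagged_stumps(x, y, n_trees=50, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x_new):
    """Average the predictions of all stumps."""
    return mean([predict_stump(s, x_new) for s in stumps])

# Synthetic data: a noisy step from about 1 to about 3 at x = 1.5.
rng = random.Random(1)
x = [i / 10 for i in range(30)]
y = [(1.0 if xi < 1.5 else 3.0) + rng.gauss(0, 0.3) for xi in x]

stumps = bagged_stumps(x, y)
print(predict_bagged(stumps, 0.5), predict_bagged(stumps, 2.5))
```

 Each stump sees a slightly different dataset and may pick a slightly different split, and the averaged prediction is less sensitive to any one sample than a single stump would be.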
