Task2

(a) Explain how PCA is typically used.

 PCA is an unsupervised learning technique that creates new, uncorrelated variables (principal components) that maximize variance. Often, the first few principal components explain most of the variability in the original variables. These principal components can be used in place of the original variables to reduce dimensionality and create a simpler model.

(b) Characteristics of PCA

 PCA is effective when the data are high-dimensional (many variables), a situation in which univariate and bivariate data exploration and visualization techniques become less effective. PCA is used to summarize high-dimensional data into fewer composite variables while retaining as much information as possible.

 PCA attempts to maximize the variance, or spread, captured from the data by forming linear combinations of the original variables.
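
 To make the "linear combination that maximizes variance" idea concrete, here is a minimal sketch in plain Python. It is not how a statistics package would do it: it assumes made-up data and uses power iteration on the covariance matrix to find only the first principal component. The function name `first_principal_component` and the simulated data are illustrative assumptions.

```python
import math
import random

def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance) for the first PC."""
    n, p = len(rows), len(rows[0])
    # Center each column so variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiplying by the covariance matrix pulls
    # the vector toward the eigenvector with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data projected onto v (v has unit length).
    pc_variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return v, pc_variance / total_variance

random.seed(0)
# Made-up data: two strongly correlated variables plus one small noise variable.
data = []
for _ in range(200):
    t = random.gauss(0, 1)
    data.append([t + random.gauss(0, 0.1),
                 2 * t + random.gauss(0, 0.1),
                 random.gauss(0, 0.1)])

direction, ratio = first_principal_component(data)
print("PC1 loadings:", [round(x, 3) for x in direction])
print("Share of variance explained by PC1:", round(ratio, 3))
```

 Because the first two variables move together, a single component should capture nearly all of the variance here, which is the situation where replacing the original variables with a few components pays off.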

 

 

Task3

(a) Assumptions for OLS

 - The residuals have a normal distribution.

 - The mean of the residuals is zero.

 - The residual variance is constant (homoscedasticity).
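
 The last two assumptions can be checked roughly on a fitted model. The sketch below is only an illustration under stated assumptions: the data are simulated with constant noise, the helper names `fit_simple_ols` and `variance` are made up for this example, and comparing the residual variance in the low-x and high-x halves is a crude stand-in for a proper diagnostic plot or test. Note that when the model includes an intercept, the sample mean of the residuals is zero by construction, so that check mainly confirms the fit was done correctly.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def variance(values):
    """Sample variance with the n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

random.seed(1)
xs = [i / 10 for i in range(200)]
# Simulated response: true intercept 3, true slope 2, constant noise sd 1.
ys = [3.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

intercept, slope = fit_simple_ols(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_residual = sum(residuals) / len(residuals)
# xs is already sorted, so the two halves correspond to low and high x.
low, high = residuals[:100], residuals[100:]
variance_ratio = variance(high) / variance(low)

print("mean residual:", mean_residual)
print("variance ratio (high x / low x):", round(variance_ratio, 2))
```

 A variance ratio far from 1 would suggest the spread of the residuals changes with x, that is, heteroscedasticity.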

 

 

Task4 ridge/lasso/elastic net

(b) 

| | Model1 | Model2 | Model3 |
|---|---|---|---|
| Type | Ridge or Elastic Net | Elastic Net | Lasso or Elastic Net |
| alpha | 0 <= alpha < 1 | 0 < alpha < 1 | 0 < alpha <= 1 |
| Benefit | Reduces variance by shrinking coefficients. | Reduces variance by shrinking coefficients; can also be used to perform model selection and is helpful when there is high-dimensional data with few data points. | Reduces variance by shrinking coefficients; can also be used to perform model selection and remove nonpredictive variables. |
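
 The role of alpha can be seen in a one-predictor example. The sketch below assumes the glmnet-style penalty, lambda * ((1 - alpha) / 2 * b^2 + alpha * |b|), added to RSS / (2n), with no intercept and centered data; under those assumptions the coefficient has a closed form. The function names and the toy numbers are made up for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_1d(xs, ys, lam, alpha):
    """Closed-form elastic net coefficient for one centered predictor.

    alpha = 0 gives ridge, alpha = 1 gives lasso, values in between mix both.
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return soft_threshold(sxy, lam * alpha) / (sxx + lam * (1 - alpha))

# Toy centered data (both columns have mean zero).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.1, -0.4, 0.0, 0.6, 0.9]
lam = 1.5  # a deliberately strong penalty

ols = elastic_net_1d(xs, ys, 0.0, 0.0)      # no penalty at all
ridge = elastic_net_1d(xs, ys, lam, 0.0)    # alpha = 0
enet = elastic_net_1d(xs, ys, lam, 0.5)     # 0 < alpha < 1
lasso = elastic_net_1d(xs, ys, lam, 1.0)    # alpha = 1

for name, b in [("OLS", ols), ("ridge", ridge), ("elastic net", enet), ("lasso", lasso)]:
    print(f"{name:12s} coefficient = {b:.4f}")
```

 With this penalty strength, ridge shrinks the coefficient but keeps it nonzero, while lasso sets it to exactly zero. That is the sense in which models with alpha > 0 can perform variable selection and a pure ridge model cannot.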

 

> The ridge/lasso/elastic-net topic doesn't really click when I only read about it in text, but StatQuest explains it well with visualizations...

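These are methods for preventing overfitting. I'm not sure I could handle a question that goes beyond the difference between ridge and lasso (whether a coefficient can be shrunk to exactly 0).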

 

https://www.youtube.com/watch?v=Q81RR3yKn30

StatQuest - ridge

https://www.youtube.com/watch?v=NGf0voTMlcs

StatQuest - lasso

https://www.youtube.com/watch?v=Xm2C_gTAl8c

StatQuest - ridge vs lasso

https://www.youtube.com/watch?v=1dKRdX9bfIo

StatQuest - elastic-net

 
