Task2

(c) Discuss the benefits of stratified sampling

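 Stratified sampling produces train and test datasets that are similar with respect to the stratification variables (for example, the proportion of each class of the target variable). As a result, the model is trained and evaluated on data with the same composition, and small or rare strata are not accidentally missing from either set, which makes the test-set performance estimate more reliable.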
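A minimal stdlib-only sketch of a stratified train/test split, where each class of the target is treated as a stratum and sampled separately. `stratified_split` is a hypothetical helper written for illustration. In practice, scikit-learn's `train_test_split` does the same thing through its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac, seed=0):
    """Return (train_idx, test_idx), sampling test_frac of each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)          # stratum (class label) -> row indices
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = round(len(idx) * test_frac)   # number of test rows taken from this stratum
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)

# Imbalanced toy target: 10% "yes"
labels = ["no"] * 90 + ["yes"] * 10
train, test = stratified_split(labels, 0.2)
print(len(test), sum(labels[i] == "yes" for i in test))  # 20 2
```

Both sets keep the 10% "yes" rate. A purely random 20-row test set could easily end up with 0 or 4 "yes" rows instead.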

 

Wikipedia

In statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations.

In computational statistics, stratified sampling is a method of variance reduction when Monte Carlo methods are used to estimate population statistics from a known population.[1]

 

Source: Stratified sampling - Wikipedia (en.wikipedia.org)
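To illustrate the variance-reduction use mentioned in the quote, here is a small sketch. It estimates the mean of f(u) = u^2 on [0, 1] (true value 1/3) with ordinary Monte Carlo and with stratified Monte Carlo, using 10 equal-width strata and the same total number of draws. The integrand, the sample sizes and the function names are arbitrary choices made only for this illustration.

```python
import random
import statistics

def f(u):
    return u * u  # integrand on [0, 1]; its true mean is 1/3

def plain_mc(n, rng):
    """Ordinary Monte Carlo: n uniform draws over the whole interval."""
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_mc(n, strata, rng):
    """Split [0, 1] into equal-width strata and draw n // strata points in each."""
    per = n // strata
    total = 0.0
    for s in range(strata):
        lo = s / strata
        # rescale a uniform draw from [0, 1) into the stratum [lo, lo + 1/strata)
        draws = [f(lo + rng.random() / strata) for _ in range(per)]
        total += sum(draws) / per      # stratum mean
    return total / strata              # equal-width strata, so equal weights

rng = random.Random(0)
plain = [plain_mc(100, rng) for _ in range(500)]
strat = [stratified_mc(100, 10, rng) for _ in range(500)]
print(statistics.variance(plain), statistics.variance(strat))
```

Both estimators are unbiased. The stratified one has a much smaller spread across repetitions because each stratum is guaranteed its share of the draws.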

 

 

Task3

(c) best subset selection vs stepwise selection

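 best subset selection > fits all 2^p combinations of the p predictors, so it finds the model with the lowest training error for each model size (global minimum), but it becomes computationally expensive as p grows.

 stepwise selection > adds or drops one predictor at a time, so it may stop at a local minimum, but it is computationally much more efficient (forward stepwise fits only 1 + p(p+1)/2 models).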

> Not sure I'll be able to explain each technique in English at the exam, haha
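The toy sketch below shows the difference with three predictors. Instead of fitting real regressions, it uses a made-up table of training RSS values (`TOY_RSS`, hypothetical numbers). The values are chosen so that the single best predictor is not part of the best pair. The function names are also hypothetical.

```python
from itertools import combinations

# Hypothetical training RSS for every subset of 3 predictors (made-up numbers).
TOY_RSS = {
    (): 10.0,
    (0,): 5.0, (1,): 6.0, (2,): 7.0,
    (0, 1): 4.5, (0, 2): 4.8, (1, 2): 1.0,
    (0, 1, 2): 0.9,
}

def rss(subset):
    """Look up the (hypothetical) RSS of the model using these predictors."""
    return TOY_RSS[tuple(sorted(subset))]

def best_subset(p):
    """Exhaustive search: best model of each size over all 2**p subsets."""
    best, fitted = {}, 0
    for k in range(p + 1):
        candidates = list(combinations(range(p), k))
        fitted += len(candidates)
        best[k] = min(candidates, key=rss)
    return best, fitted

def forward_stepwise(p):
    """Greedy search: add the one predictor that lowers RSS most at each step."""
    current, best, fitted = (), {0: ()}, 1  # the null model is fitted once
    for k in range(1, p + 1):
        remaining = [j for j in range(p) if j not in current]
        candidates = [tuple(sorted(current + (j,))) for j in remaining]
        fitted += len(candidates)
        current = min(candidates, key=rss)
        best[k] = current
    return best, fitted

bs, n_bs = best_subset(3)
fs, n_fs = forward_stepwise(3)
print(bs[2], rss(bs[2]), n_bs)  # (1, 2) 1.0 8
print(fs[2], rss(fs[2]), n_fs)  # (0, 1) 4.5 7
```

Forward stepwise commits to predictor 0 in the first step, so it never reaches the much better pair (1, 2). In exchange, it fits 1 + p(p+1)/2 = 7 models instead of 2^p = 8. The gap in model count grows quickly with p.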

 

 

Task4

(a) Describe two ways impurity measures are used in a classification tree.

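 - To choose which split to make next while growing the tree: the split that gives the largest reduction in impurity (e.g. Gini index or entropy) is selected.

 - To decide which branches to prune back after the full tree has been built: branches whose splits reduce impurity too little to justify the added complexity are removed.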
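A minimal sketch of the first use, split selection with the Gini index (1 minus the sum of squared class proportions). It assumes a single numeric feature and candidate thresholds at the midpoints between distinct values. The function names are hypothetical, and real tree implementations also search over all features and apply stopping rules.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k**2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity of the two child nodes for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(xs, ys):
    """Try each midpoint between sorted distinct x values; keep the lowest impurity."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(thresholds, key=lambda t: split_impurity(xs, ys, t))

xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "a", "b", "b", "b"]
print(gini(ys), best_split(xs, ys))  # 0.5 3.5
```

The parent node has Gini 0.5. Splitting at 3.5 produces two pure children (weighted impurity 0), so that split gives the largest impurity reduction.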

 

https://www.youtube.com/watch?v=_L39rN6gz7Y&t=348s

StatQuest: Gini impurity is covered from around the 6-minute mark.

 

Task5

(a) Poisson regression vs Quasi-Poisson regression

 An underlying assumption of Poisson regression is that the mean and variance of the response are equal (equidispersion).

 Quasi-Poisson regression is designed to handle overdispersion (variance larger than the mean). It assumes the variance is proportional to the mean, Var(Y) = φμ, and estimates the dispersion parameter φ from the data. The coefficient estimates are the same as in the Poisson output. However, when φ > 1 the standard errors are all larger (scaled by √φ), so fewer coefficients tend to be statistically significant. If further analysis is conducted, such as deriving confidence intervals or conducting hypothesis tests, the quasi-Poisson model should be used.
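For an intercept-only log-link model the comparison can be done by hand. The fitted mean is the sample mean in both models, and the Poisson standard error of the log-mean coefficient is 1/sqrt(n·μ). The quasi-Poisson standard error multiplies this by √φ, where φ is estimated here with the Pearson statistic divided by the residual degrees of freedom. The function name and the counts are hypothetical; regression software would handle models with predictors.

```python
import math

def poisson_vs_quasi(y):
    """Intercept-only log-link fit: same estimate, quasi-Poisson SE scaled by sqrt(phi)."""
    n = len(y)
    mu = sum(y) / n                        # fitted mean (identical in both models)
    beta0 = math.log(mu)                   # coefficient on the log scale
    se_poisson = 1.0 / math.sqrt(n * mu)   # from the Poisson Fisher information n * mu
    # Pearson estimate of the dispersion parameter with n - 1 residual df
    phi = sum((yi - mu) ** 2 / mu for yi in y) / (n - 1)
    se_quasi = math.sqrt(phi) * se_poisson
    return beta0, se_poisson, phi, se_quasi

counts = [0, 1, 1, 2, 10]  # made-up overdispersed counts
beta0, se_p, phi, se_q = poisson_vs_quasi(counts)
print(beta0, se_p, phi, se_q)
```

The estimate `beta0` does not change between the two models. Because φ is well above 1 for these counts, the quasi-Poisson standard error is noticeably larger, so tests and confidence intervals become more conservative.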

 

 
