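PA turned out to be harder than the earlier exams, so I'm keeping these notes as an informal record while I study.
I skimmed the past exams back to December 2018, but the older ones have a different focus (you had to write the R code yourself, among other things),
so I plan to work through as many of the recent ones as I can before sitting the exam. These notes cover the concepts only, not the exam questions themselves.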
Task1
(a) Stratified sample
Stratified sampling works by independently drawing a set of random records from each stratum or group in your data.
1. Identify the strata
2. Draw a random sample from each stratum. Each stratum's sample should be the same proportion of that stratum's records, so that the combined sample preserves the relative sizes of the strata and stays representative.
3. Combine all these samples to create a stratified sample.
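
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the exam's method: the function name `stratified_sample`, the `stratum_of` callback, the `frac` parameter, and the toy `region` records are all made up for this note.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, frac, seed=0):
    """Draw the same fraction of records from every stratum (hypothetical helper)."""
    rng = random.Random(seed)

    # Step 1: identify the strata by grouping records on the stratum key.
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    # Step 2: draw a random sample of the same proportion from each stratum.
    # Step 3: combine the per-stratum samples into one stratified sample.
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))  # keep at least one record per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Toy data (assumed): 80 urban and 20 rural records.
records = [{"id": i, "region": "urban" if i < 80 else "rural"} for i in range(100)]
sample = stratified_sample(records, lambda r: r["region"], frac=0.1)
```

With `frac=0.1` the sample holds 8 urban and 2 rural records, so the 80/20 split of the full data carries over to the sample.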
(b) Advantages and disadvantages of using unstructured data in a predictive model
Unstructured data: data that does not follow a pre-defined format and cannot be displayed in tabular format.
Advantages
- Gives insights and qualitative information that cannot be included in a structured dataset.
Disadvantages
- Requires more complex methods to process it into input for a predictive model. Analyzing unstructured data can also be more time-consuming and resource-intensive.
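
To give a feel for the extra processing step, here is a minimal bag-of-words sketch that turns free text into a table of word counts. It is just one simple option among many; the helper name `bag_of_words` and the two claim notes are invented for illustration.

```python
import re
from collections import Counter

def bag_of_words(docs):
    """Turn free-text documents into a tabular word-count matrix (hypothetical helper)."""
    # Lowercase each document and split it into word tokens.
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]
    # The sorted vocabulary becomes the column headers of the table.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocab])
    return vocab, rows

# Made-up claim notes standing in for unstructured text.
notes = ["Water damage in kitchen", "Kitchen fire, smoke damage"]
vocab, rows = bag_of_words(notes)
```

Each row of `rows` is now a fixed-length numeric vector that a model can consume, at the cost of losing word order and context.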
Task2
(a) Similarities and differences between K-means clustering and hierarchical clustering
Similarities:
- K-means and hierarchical clustering can both be used to generate new features from multiple predictor variables.
- K-means and hierarchical clustering are both unsupervised learning techniques, which means that they both group observations to show structures and relationships in the data without reference to a target variable.
Differences:
- K-means clustering requires choosing the number of clusters as an input. Hierarchical clustering does not: the algorithm iteratively merges clusters (or, in the divisive variant, splits them), producing a nested sequence of clusterings that ranges from every observation being its own cluster to one single cluster. The result is presented graphically in a dendrogram, from which the modeler can then select K by making a cut at a certain height.
- K-means only considers dissimilarity among observations (using, for example, Euclidean distance) in creating clusters and does not have a notion of dissimilarity among clusters. Hierarchical clustering algorithms do consider dissimilarity among clusters through the use of a linkage function.
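
The two differences can be seen in a small pure-Python sketch. It is a toy written under assumptions, not a production implementation: the function names, the six sample points, and the choice to pass the initial centers explicitly (so the run is deterministic) are all mine. Real libraries typically use random or k-means++ initialization with several restarts.

```python
import math

def kmeans(points, init_centers, iters=20):
    """Lloyd's algorithm. The number of clusters is fixed up front by len(init_centers)."""
    centers = list(init_centers)
    k = len(centers)
    labels = []
    for _ in range(iters):
        # Assign every observation to its nearest center (observation-level distance only).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of the observations assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels

def hierarchical_clusters(points, cut_height, linkage=max):
    """Agglomerative clustering, stopped where a dendrogram cut at cut_height would fall.

    linkage=max gives complete linkage, linkage=min gives single linkage.
    """
    clusters = [[p] for p in points]  # start with every observation as its own cluster
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest cluster-level dissimilarity.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut_height:  # the next merge would sit above the cut, so stop here
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

# Two well-separated toy groups (assumed data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(points, init_centers=[(0.0, 0.0), (10.0, 10.0)])
two = hierarchical_clusters(points, cut_height=5.0)
```

K-means needs K before it starts (here K = 2 through the two initial centers), while the hierarchical version only needs a cut height after the fact: a cut at 5.0 leaves two clusters, and a very low cut leaves every point on its own. The resulting labels could then be used as a new feature in a predictive model.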
https://www.youtube.com/watch?v=4b5d3muPQmA
https://www.youtube.com/watch?v=7xHsRkOdVwo
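These are YouTube videos on K-means clustering and hierarchical clustering, respectively. They come from a channel I relied on while studying for SRM, and I found it helpful.
(If linking to someone else's channel is a problem, I'll take the links down.)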