Data Analysis process; Feature selection methods (in Feature Engineering)
Data-Science 2020. 11. 18. 00:38
Data Analysis process
- Project Scoping(Define Problem)
- Data Collection
- EDA
- Data processing
  - Cleaning (Drop, Drop duplicated rows, Handling missing values)
  - Transforming (Rename, Autoencoding)
  - Normalization (Min-Max, outlier detection)
- Feature Engineering
- Modeling
- Evaluation
- Project Delivery / Insights
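The data processing step above (cleaning, transforming, normalization) can be sketched with pandas as follows. This is only a minimal illustration: the toy table, the column names, and the 1.5 * IQR outlier rule are assumptions made for the example, not something the process list prescribes.

```python
import pandas as pd

# Hypothetical toy data; the values and column names are made up for illustration.
df = pd.DataFrame({
    "Height": [150.0, 160.0, 160.0, 170.0, 190.0],
    "Weight": [50.0, 60.0, 60.0, 65.0, 120.0],
})

# Cleaning: drop exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Transforming: rename columns.
df = df.rename(columns={"Height": "height_cm", "Weight": "weight_kg"})

# Normalization: min-max scaling maps each column to the [0, 1] range.
scaled = (df - df.min()) / (df.max() - df.min())

# Outlier detection with the 1.5 * IQR rule (one common convention, assumed here).
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

print(scaled)
print(is_outlier)
```

Min-max scaling is sensitive to extreme values, so it can make sense to inspect `is_outlier` before scaling.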
Handling Missing values
- Deletion
  - Pair-wise deletion
  - List-wise deletion
  - Dropping an entire column
- Imputation
  - Advanced (machine learning algorithms, e.g., KNN)
  - Time series (ffill, bfill, linear interpolation)
  - Non-time series (imputing with a constant, mean, median, or mode, i.e. the most common value)
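The deletion and simple imputation options listed above can be sketched in pandas roughly like this. The toy data, the column names, and the 50% missing-ratio threshold for dropping a column are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0],
    "b": [10.0, 20.0, np.nan, 40.0, 50.0],
    "c": [np.nan, np.nan, np.nan, np.nan, 1.0],
})

# List-wise deletion: drop every row that has any missing value.
listwise = df.dropna()

# Pair-wise deletion: each statistic uses the rows available for that pair.
# DataFrame.corr excludes missing values pair by pair.
pairwise_corr = df[["a", "b"]].corr()

# Drop entire columns that are mostly missing (threshold 0.5 is an assumption).
kept = df.loc[:, df.isna().mean() < 0.5]

# Time-series style imputation on one column.
s = df["a"]
ffilled = s.ffill()                      # carry the last seen value forward
bfilled = s.bfill()                      # carry the next seen value backward
linear = s.interpolate(method="linear")  # straight line between neighbours

# Non-time-series imputation.
const_filled = s.fillna(0.0)
mean_filled = s.fillna(s.mean())
median_filled = s.fillna(s.median())

# Mode (most common value) is the usual choice for categorical data.
colors = pd.Series(["red", None, "red", "blue"])
mode_filled = colors.fillna(colors.mode().iloc[0])
```

List-wise deletion keeps only one row here, which shows how quickly it can discard data when a single column is sparse.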
Feature Engineering
- Feature selection
  - Filter (using statistical measures)
  - Wrapper (choose a set of features, evaluate, repeat, and keep the better combination)
    - forward
    - backward
    - etc.
  - Embedded (depends on the model)
- Feature Extraction (PCA, ...)
- Feature Generation (using domain knowledge)
about feature engineering overview
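As an illustration of the wrapper idea above (pick features, evaluate, repeat, keep the better combination), here is a minimal forward-selection sketch. The helper names `fit_score` and `forward_selection` are hypothetical, the model is plain least squares, and the score is R^2 on the training data for brevity. A real wrapper would more likely score each candidate with cross-validation.

```python
import numpy as np

def fit_score(X, y):
    """R^2 of an ordinary least-squares fit (training data only, for brevity)."""
    A = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def forward_selection(X, y, n_features):
    """Greedily add the feature that improves the score the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        # Evaluate every candidate together with the features chosen so far.
        scores = {j: fit_score(X[:, selected + [j]], y) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data (an assumption): only columns 1 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

chosen = forward_selection(X, y, n_features=2)
print(chosen)
```

Backward selection follows the same loop in reverse: start from all features and repeatedly remove the one whose removal hurts the score the least.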
about many methods
- Filter method
  - low variance (near zero)
  - high correlation
    1. Compute and save the correlations for all possible pairs of variables (e.g., the correlation of A and B is high).
    2. Select one of the two variables by comparing each one's correlations with the other variables, excluding their mutual correlation (e.g., select A).
    3. Repeat steps 1 and 2.
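The two filter rules above can be sketched as follows. The function names and thresholds are hypothetical. For the choice within a highly correlated pair, this sketch drops the variable whose mean absolute correlation with the remaining variables is higher. That tie-breaking rule is an assumption, since the notes only say to compare the correlations with the other variables.

```python
import numpy as np
import pandas as pd

def drop_low_variance(df, threshold=1e-8):
    """Remove columns whose variance is (near) zero."""
    return df.loc[:, df.var() > threshold]

def drop_high_correlation(df, threshold=0.9):
    """Repeatedly remove one variable from the most correlated pair."""
    df = df.copy()
    while df.shape[1] > 1:
        cols = list(df.columns)
        # Step 1: absolute correlations for all pairs (self-correlation zeroed).
        corr = df.corr().abs().to_numpy().copy()
        np.fill_diagonal(corr, 0.0)
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] < threshold:
            break  # no remaining pair is highly correlated
        # Step 2: compare each variable's correlations with the others,
        # excluding the mutual correlation between i and j.
        others = [k for k in range(len(cols)) if k not in (i, j)]
        mean_i = corr[i, others].mean() if others else 0.0
        mean_j = corr[j, others].mean() if others else 0.0
        # Assumed rule: drop the one more correlated with the rest.
        df = df.drop(columns=[cols[i] if mean_i >= mean_j else cols[j]])
        # Step 3: the loop repeats steps 1 and 2 on the reduced table.
    return df

# Synthetic data (an assumption): "b" nearly duplicates "a", "const" never varies.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
data = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.05, size=300),
    "c": rng.normal(size=300),
    "const": np.ones(300),
})

# Run the variance filter first: a constant column yields NaN correlations.
filtered = drop_high_correlation(drop_low_variance(data))
print(list(filtered.columns))
```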