  Data Analysis process; Feature selection methods (in Feature Engineering)
    Data-Science 2020. 11. 18. 00:38

    Data Analysis process

    • Project Scoping(Define Problem)
    • Data Collection
    • EDA
    • Data processing
      • Cleaning (dropping rows or columns, dropping duplicated rows, handling missing values)
      • Transforming (Rename, Autoencoding)
      • Normalization (Min-Max scaling, outlier detection)
    • Feature Engineering
    • Modeling
    • Evaluation
    • Project Delivery / Insights
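The Cleaning and Normalization items under Data processing can be sketched in pandas as follows. This is only a minimal illustration: the toy DataFrame and its column names are made up, and the 1.5 * IQR rule is an assumed choice for outlier detection, since the list above does not name a specific rule.

```python
import pandas as pd

# Hypothetical toy data: one duplicated row and one obvious outlier in "income".
df = pd.DataFrame({
    "age": [20, 30, 30, 40, 50],
    "income": [100.0, 200.0, 200.0, 300.0, 10000.0],
})

# Cleaning: drop exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Normalization: Min-Max scaling maps every column to the range [0, 1].
scaled = (df - df.min()) / (df.max() - df.min())

# Outlier detection with the 1.5 * IQR rule (an assumed rule, not from the post).
q1 = df["income"].quantile(0.25)
q3 = df["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
```

Min-Max scaling is sensitive to outliers, because a single extreme value stretches the range, so it may make sense to check for outliers before scaling.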

    Handling Missing values

    • Deletion
      • Pair-wise deletion
      • List-wise deletion
      • Dropping the entire column
    • Imputation
      • Advanced (machine learning algorithms, e.g., KNN)
      • Time series (ffill, bfill, linear interpolation)
      • Non-time series (imputing with a constant, mean, median, or mode, i.e., the most common value)
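The deletion and imputation options above map fairly directly onto pandas calls. The sketch below uses made-up toy data and is only meant to show which call corresponds to which option; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric series with gaps.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Time series: carry the previous or next observation, or interpolate linearly.
forward = s.ffill()
backward = s.bfill()
linear = s.interpolate(method="linear")

# Non-time series: fill with a constant, the mean, or the median.
const_filled = s.fillna(0.0)
mean_filled = s.fillna(s.mean())
median_filled = s.fillna(s.median())

# Mode (most common value) is the usual choice for categorical data.
c = pd.Series(["a", "b", None, "a"])
mode_filled = c.fillna(c.mode().iloc[0])

# Deletion on a DataFrame.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})
listwise = df.dropna()          # list-wise: drop rows with any missing value
dropped_cols = df.dropna(axis=1)  # drop columns with any missing value
```

For the Advanced option, scikit-learn's `KNNImputer` is one commonly used implementation; it is not shown here to keep the example dependency-light.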

     

    Feature Engineering 

    • Feature selection
      • Filter (using statistical measures)
      • Wrapper (pick a subset of features, evaluate the model, repeat, and keep the better combination)
        • forward
        • backward
        • etc
      • Embedded (selection is built into the model, so it depends on the model)
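As a rough illustration of the wrapper idea in its forward variant, the sketch below greedily adds the feature that improves a score the most and stops when the gain becomes small. The helper names (`r2_score_ols`, `forward_select`), the `min_gain` threshold, and the synthetic data are all hypothetical, and the in-sample R² of a least-squares fit is used only to keep the example short; a real project would more likely evaluate with cross-validation.

```python
import numpy as np

def r2_score_ols(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: add the best feature until the gain is small."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Evaluate every candidate together with the features chosen so far.
        scores = {j: r2_score_ols(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break  # no candidate improves the score enough
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

# Synthetic data: only columns 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
chosen = forward_select(X, y)
```

Backward selection follows the same loop in reverse: start from all features and repeatedly remove the one whose removal hurts the score the least.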

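    Notes on each method

    • Filter method
      • Low variance (near zero): drop features whose values barely change.
      • High correlation: when two features are highly correlated, keep only one of them.
        1. Compute the correlations for all possible pairs of variables and find a highly correlated pair (e.g., A and B).
        2. Choose one of the two by comparing each one's correlations with the other variables, excluding their mutual correlation (e.g., select A).
        3. Repeat steps 1 and 2 for the remaining pairs.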
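The two filter steps above can be sketched with pandas and NumPy. The function names and thresholds are hypothetical, and one detail is an assumption: the notes do not say which variable of a correlated pair to discard, so this sketch drops the one with the higher mean absolute correlation with the other variables.

```python
import numpy as np
import pandas as pd

def drop_low_variance(df, threshold=1e-8):
    """Keep only columns whose variance exceeds the threshold."""
    return df.loc[:, df.var() > threshold]

def drop_high_correlation(df, threshold=0.9):
    """Resolve the most correlated pair repeatedly until none exceeds the threshold."""
    cols = list(df.columns)
    while len(cols) > 1:
        # Step 1: absolute correlations for all pairs.
        c = df[cols].corr().abs().to_numpy(copy=True)
        np.fill_diagonal(c, 0.0)  # ignore self-correlation
        i, j = np.unravel_index(np.argmax(c), c.shape)
        if c[i, j] <= threshold:
            break
        # Step 2: compare each variable's correlation with the others,
        # excluding the pair's mutual correlation.
        others = [k for k in range(len(cols)) if k not in (i, j)]
        mean_i = c[i, others].mean() if others else 0.0
        mean_j = c[j, others].mean() if others else 0.0
        # Assumed rule: drop the variable more correlated with the rest.
        cols.pop(int(i if mean_i >= mean_j else j))
        # Step 3: the loop repeats steps 1 and 2.
    return df[cols]

# Hypothetical data: "b" is nearly a copy of "a", "const" never changes.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": a + 0.05 * rng.normal(size=300),
    "c": rng.normal(size=300),
    "const": 1.0,
})
kept = drop_high_correlation(drop_low_variance(df))
```

Running the variance filter first also avoids undefined correlations, since a constant column has no meaningful correlation with anything.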