Activity | Resource |
---|---|
Data Ingestion | |
Initial exploration in Kaggle to choose a problem and gather the dataset | Balaji & Anantha |
Download dataset and push the files to GCS bucket | Balaji |
Explore different methods to flatten JSON and perform data type conversions | Balaji |
Ingest data into Notebook runtime using Pandas | Balaji |
Exploratory Data Analysis | |
Investigate basic statistics of data | Balaji |
Basic data exploration on the full dataset | Balaji |
Sweetviz exploratory analysis on the full dataset | Balaji |
Target variable distribution | Anantha |
Device groups distribution | Anantha |
Geo Network groups distribution | Anantha |
Channel grouping distribution | Anantha |
Visit Number distribution | Anantha |
Dates distribution | Anantha |
Total Bounces distribution | Anantha |
Total New visits distribution | Anantha |
Total Hits distribution | Anantha |
Total Page views distribution | Anantha |
Traffic sources distribution (Adcontent, Medium) | Anantha |
Data Pre-processing | |
Impute missing values (6 features) | Anantha |
Drop constant (zero-variance) columns | Anantha |
Drop columns that are mostly null and have only one non-null value | Anantha |
Prepare target variable | Anantha |
Build Dataframe Selector class | Anantha |
Build Attributes Pre-process class | Anantha |
Feature Engineering | |
Build Categorical Encoder Class | Anantha |
Extract date related features | Anantha |
Fix data format for a few feature variables | Anantha |
Combine feature variables for enhanced predictive power | Anantha |
Create mean, sum, max, min and variance for web statistics grouped by day | Anantha |
Create mean, sum, max, min and variance for web statistics grouped by network domain | Anantha |
Model Training and Evaluation | |
Building the full sklearn pipeline | Balaji |
Linear Regression model - Cross validation using Grid Search | Balaji |
Linear Regression model testing and evaluation | Balaji |
Lasso Regression model - Cross validation using Grid Search | Balaji |
Lasso Regression model testing and evaluation | Balaji |
Ridge Regression model - Cross validation using Grid Search | Balaji |
Ridge Regression model testing and evaluation | Balaji |
XGBoost Regressor model training | Anantha |
XGBoost Regressor model testing and evaluation | Anantha |
Benchmarking throughout the code | Balaji |
Documenting results and the format for displaying results | Anantha & Balaji |
Feature Importance Visualization and Additional Fine-Tuning | |
Feature importance XGBoost Regressor | Anantha |
Feature importance LightGBM Regressor | Balaji |
Hyperparameter tuning and additional experimentation (discussed over Zoom calls and explored different approaches) | Anantha & Balaji |
Project Report | |
Project Abstract | Anantha |
Introduction | Balaji |
Datasets & Metrics | Anantha |
Methodology - Load Data | Balaji |
Methodology - Data Exploration (all sub-sections) | Balaji |
Data Pre-processing | Balaji |
Feature Engineering | Balaji |
Feature Selection | Balaji |
Model Algo & Optimization methods - Linear Regression model description | Anantha |
Model Algo & Optimization methods - Linear Regression model results | Balaji |
Model Algo & Optimization methods - Lasso Regression model description | Anantha |
Model Algo & Optimization methods - Lasso Regression model results | Balaji |
Model Algo & Optimization methods - Ridge Regression model description | Anantha |
Model Algo & Optimization methods - Ridge Regression model results | Balaji |
Model Algo & Optimization methods - XGBoost Regression model description | Anantha |
Model Algo & Optimization methods - XGBoost Regression model results | Balaji |
Model Algo & Optimization methods - LightGBM Regression model description | Anantha |
Model Algo & Optimization methods - LightGBM Regression model results | Balaji |
Conclusion - Model Pipeline | Anantha |
Conclusion - Feature Exploration & Pre-processing | Anantha |
Conclusion - Outcome of Experiments | Anantha |
Conclusion - Limitations | Anantha |
Previous Explorations | Anantha |
Benchmarking (in progress) | Balaji |
References | Anantha & Balaji |