Activity Resource
Data Ingestion
Initial exploration on Kaggle to choose a problem and gather the dataset Balaji & Anantha
Download the dataset and push the files to a GCS bucket Balaji
Explore different methods to flatten JSON and perform data type conversions Balaji
Ingest data into the notebook runtime using pandas Balaji
Exploratory Data Analysis
Investigate basic statistics of data Balaji
Basic data exploration on the full dataset Balaji
Sweetviz exploratory analysis on the full dataset Balaji
Target variable distribution Anantha
Device groups distribution Anantha
Geo Network groups distribution Anantha
Channel grouping distribution Anantha
Visit Number distribution Anantha
Dates distribution Anantha
Total Bounces distribution Anantha
Total New visits distribution Anantha
Total Hits distribution Anantha
Total Page views distribution Anantha
Traffic sources distribution (Ad Content, Medium) Anantha
Data Pre-processing
Impute missing values (6 features) Anantha
Drop columns with zero variance (constant values) Anantha
Drop columns with mostly null values and only one non-null value Anantha
Prepare target variable Anantha
Build Dataframe Selector class Anantha
Build Attributes Pre-process class Anantha
Feature Engineering
Build Categorical Encoder Class Anantha
Extract date related features Anantha
Fix data format for a few feature variables Anantha
Combine feature variables for enhanced predictive power Anantha
Create mean, sum, max, min and variance for web statistics grouped by day Anantha
Create mean, sum, max, min and variance for web statistics grouped by network domain Anantha
Model Training and Evaluation
Building the full sklearn pipeline Balaji
Linear Regression model - Cross-validation using Grid Search Balaji
Linear Regression model testing and evaluation Balaji
Lasso Regression model - Cross-validation using Grid Search Balaji
Lasso Regression model testing and evaluation Balaji
Ridge Regression model - Cross-validation using Grid Search Balaji
Ridge Regression model testing and evaluation Balaji
XGBoost Regressor model training Anantha
XGBoost Regressor model testing and evaluation Anantha
Benchmarking throughout the code Balaji
Documenting results and the format for displaying results Anantha & Balaji
Feature Importance Visualization and Additional Fine-Tuning
Feature importance XGBoost Regressor Anantha
Feature importance LightGBM Regressor Balaji
Hyperparameter tuning and additional experimentation Anantha & Balaji
(Discussed over Zoom calls and explored different approaches)
Project Report
Project Abstract Anantha
Introduction Balaji
Datasets & Metrics Anantha
Methodology - Load Data Balaji
Methodology - Data Exploration (all sub-sections) Balaji
Data Pre-processing Balaji
Feature Engineering Balaji
Feature Selection Balaji
Model Algo & Optimization methods - Linear Regression model description Anantha
Model Algo & Optimization methods - Linear Regression model results Balaji
Model Algo & Optimization methods - Lasso Regression model description Anantha
Model Algo & Optimization methods - Lasso Regression model results Balaji
Model Algo & Optimization methods - Ridge Regression model description Anantha
Model Algo & Optimization methods - Ridge Regression model results Balaji
Model Algo & Optimization methods - XGBoost Regression model description Anantha
Model Algo & Optimization methods - XGBoost Regression model results Balaji
Model Algo & Optimization methods - LightGBM Regression model description Anantha
Model Algo & Optimization methods - LightGBM Regression model results Balaji
Conclusion - Model Pipeline Anantha
Conclusion - Feature Exploration & Pre-processing Anantha
Conclusion - Outcome of Experiments Anantha
Conclusion - Limitations Anantha
Previous Explorations Anantha
Benchmarking (In progress) Balaji
References Anantha & Balaji