| Activity | Resource |
|---|---|
| Data Ingestion | |
| Initial exploration in Kaggle to choose a problem and gather the dataset | Balaji & Anantha |
| Download dataset and push the files to GCS bucket | Balaji |
| Explore different methods to flatten JSON and perform data type conversions | Balaji |
| Ingest data into the notebook runtime using pandas | Balaji |
| Exploratory Data Analysis | |
| Investigate basic statistics of data | Balaji |
| Basic data exploration on the full dataset | Balaji |
| Sweetviz exploratory analysis on the full dataset | Balaji |
| Target variable distribution | Anantha |
| Device groups distribution | Anantha |
| Geo Network groups distribution | Anantha |
| Channel grouping distribution | Anantha |
| Visit Number distribution | Anantha |
| Dates distribution | Anantha |
| Total Bounces distribution | Anantha |
| Total New visits distribution | Anantha |
| Total Hits distribution | Anantha |
| Total Page views distribution | Anantha |
| Traffic sources distribution (Ad Content, Medium) | Anantha |
| Data Pre-processing | |
| Impute missing values (6 features) | Anantha |
| Drop constant (zero-variance) columns | Anantha |
| Drop columns that are mostly null and have only one distinct non-null value | Anantha |
| Prepare target variable | Anantha |
| Build Dataframe Selector class | Anantha |
| Build Attributes Pre-process class | Anantha |
| Feature Engineering | |
| Build Categorical Encoder Class | Anantha |
| Extract date related features | Anantha |
| Fix the data format of a few feature variables | Anantha |
| Combine feature variables for enhanced predictive power | Anantha |
| Create mean, sum, max, min and variance for web statistics grouped by day | Anantha |
| Create mean, sum, max, min and variance for web statistics grouped by network domain | Anantha |
| Model Training and Evaluation | |
| Building the full sklearn pipeline | Balaji |
| Linear Regression model - Cross validation using Grid Search | Balaji |
| Linear Regression model testing and evaluation | Balaji |
| Lasso Regression model - Cross validation using Grid Search | Balaji |
| Lasso Regression model testing and evaluation | Balaji |
| Ridge Regression model - Cross validation using Grid Search | Balaji |
| Ridge Regression model testing and evaluation | Balaji |
| XGBoost Regressor model training | Anantha |
| XGBoost Regressor model testing and evaluation | Anantha |
| Benchmarking throughout the code | Balaji |
| Documenting results and the format for displaying results | Anantha & Balaji |
| Feature Importance Visualization and Additional Fine-Tuning | |
| Feature importance XGBoost Regressor | Anantha |
| Feature importance LightGBM Regressor | Balaji |
| Hyperparameter tuning and additional experimentation | Anantha & Balaji |
| (Discussed over Zoom calls and explored different approaches) | |
| Project Report | |
| Project Abstract | Anantha |
| Introduction | Balaji |
| Datasets & Metrics | Anantha |
| Methodology - Load Data | Balaji |
| Methodology - Data Exploration (all sub-sections) | Balaji |
| Data Pre-processing | Balaji |
| Feature Engineering | Balaji |
| Feature Selection | Balaji |
| Model Algo & Optimization methods - Linear Regression model description | Anantha |
| Model Algo & Optimization methods - Linear Regression model results | Balaji |
| Model Algo & Optimization methods - Lasso Regression model description | Anantha |
| Model Algo & Optimization methods - Lasso Regression model results | Balaji |
| Model Algo & Optimization methods - Ridge Regression model description | Anantha |
| Model Algo & Optimization methods - Ridge Regression model results | Balaji |
| Model Algo & Optimization methods - XGBoost Regression model description | Anantha |
| Model Algo & Optimization methods - XGBoost Regression model results | Balaji |
| Model Algo & Optimization methods - LightGBM Regression model description | Anantha |
| Model Algo & Optimization methods - LightGBM Regression model results | Balaji |
| Conclusion - Model Pipeline | Anantha |
| Conclusion - Feature Exploration & Pre-processing | Anantha |
| Conclusion - Outcome of Experiments | Anantha |
| Conclusion - Limitations | Anantha |
| Previous Explorations | Anantha |
| Benchmarking (In progress) | Balaji |
| References | Anantha & Balaji |