Assignment 6
Jiayu Li February 25, 2021
Build on homework 5 and make a plan that defines your final project. The plan should include the following items, many of which you should already have from homework 5:
- A problem
- A dataset
- A deep learning algorithm
- Possibly some existing efforts that can be helpful to your work
- A Timeline
1. Structural Protein Sequences Classification
In the protein structure dataset, each protein is classified according to its function. There are dozens of categories, including HYDROLASE, OXYGEN TRANSPORT, VIRUS, and SIGNALING PROTEIN. In this project, we will use amino acid sequences to predict the type of protein.
2. Dataset
Structural Protein Sequences Dataset: https://www.kaggle.com/shahir/protein-data-set/code
Protein dataset classification: https://www.kaggle.com/rafay12/anti-freeze-protein-classification
RCSB PDB: https://www.rcsb.org/
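As a starting point for exploring the data, the sketch below joins sequence records to their class labels by structure ID and counts the examples per class. The two-table layout and the column names (`structureId`, `classification`, `sequence`) are assumptions about the Kaggle files and should be checked against the actual download. The inline CSV text is toy data with made-up IDs, included only so the example runs on its own, and `join_sequences_with_labels` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Toy stand-ins for the two CSV files; column names are assumed, not confirmed.
LABELS_CSV = """structureId,classification
X001,HYDROLASE
X002,OXYGEN TRANSPORT
X003,HYDROLASE
"""

SEQUENCES_CSV = """structureId,sequence
X001,MKVLAA
X002,VLSPAD
X003,MKTAYI
X004,GSHMAA
"""


def join_sequences_with_labels(labels_file, sequences_file):
    """Return (sequence, label) pairs for structure IDs present in both tables."""
    label_by_id = {
        row["structureId"]: row["classification"]
        for row in csv.DictReader(labels_file)
    }
    pairs = []
    for row in csv.DictReader(sequences_file):
        label = label_by_id.get(row["structureId"])
        if label and row["sequence"]:  # skip unlabeled or empty records
            pairs.append((row["sequence"], label))
    return pairs


pairs = join_sequences_with_labels(
    io.StringIO(LABELS_CSV), io.StringIO(SEQUENCES_CSV)
)
class_counts = Counter(label for _, label in pairs)
print(class_counts.most_common())
```

With real files, the `io.StringIO` objects would be replaced by opened file handles. The per-class counts give a quick view of class imbalance before any model is chosen.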
3. Deep learning algorithm
Candidate algorithms include LSTM and CNN, along with classical methods such as SVM. In practice, it may be necessary to combine several algorithms to achieve higher accuracy.
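Sequence models such as an LSTM or a 1D CNN generally expect fixed-length numeric input, so a common first step is to map each symbol in a sequence to an integer and then pad or truncate to a fixed length. The sketch below assumes protein sequences written with the 20 standard one-letter amino acid codes; a different alphabet can be passed to `build_vocab`. The function names, the reserved indices 0 (padding) and 1 (unknown symbol), and the `max_len` value are illustrative choices, not part of the plan above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter residue codes
PAD, UNKNOWN = 0, 1  # reserved indices (an illustrative convention)


def build_vocab(alphabet=AMINO_ACIDS):
    """Map each symbol to an integer index, starting after the reserved ones."""
    return {symbol: index for index, symbol in enumerate(alphabet, start=2)}


def encode_sequence(sequence, vocab, max_len):
    """Convert a sequence string to exactly max_len integers.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PAD. Symbols outside the vocabulary map to UNKNOWN.
    """
    indices = [vocab.get(symbol, UNKNOWN) for symbol in sequence.upper()[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


vocab = build_vocab()
encoded = encode_sequence("MKVX", vocab, max_len=6)
print(encoded)  # [12, 10, 19, 1, 0, 0]
```

The resulting integer lists could then feed an embedding layer in front of an LSTM or CNN. Reserving index 0 for padding lets the model mask padded positions if the chosen framework supports it.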
4. Timeline
- Week 1: Collect data, understand the data.
- Week 2: Data preprocessing, data visualization.
- Week 3: Find related works and test existing algorithms.
- Week 4: Protein structure prediction or classification based on existing work.
- Week 5: Continue the previous experiments and complete the project report.