Assignment 6

Jiayu Li February 25, 2021

Build on homework 5 and write a plan that defines your final project. The plan should include the following items, many of which you should already have from homework 5.

  • A problem
  • A dataset
  • A deep learning algorithm
  • Possibly some existing efforts that can be helpful to your work
  • A timeline

1. Structural Protein Sequences Classification

In the protein structure dataset, each protein is classified according to its function. There are dozens of categories, including HYDROLASE, OXYGEN TRANSPORT, VIRUS, and SIGNALING PROTEIN. In this project, we will use each protein's amino acid sequence to predict its category.

2. Dataset

Structural Protein Sequences Dataset: https://www.kaggle.com/shahir/protein-data-set/code

Protein dataset classification: https://www.kaggle.com/rafay12/anti-freeze-protein-classification

RCSB PDB: https://www.rcsb.org/
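As a rough illustration of how the sequences and class labels could be brought together, the sketch below joins two CSV tables on a shared structure ID and keeps only protein entries. The column names (structureId, classification, macromoleculeType, chainId, sequence) are assumptions about the Kaggle files and should be checked against the actual download. The rows are invented toy data, and load_labeled_sequences is a hypothetical helper name.

```python
import csv
import io

# Toy stand-ins for the two CSV files. Column names are assumptions and
# the rows are invented for illustration only.
META_CSV = """structureId,classification,macromoleculeType
1ABC,HYDROLASE,Protein
2DEF,OXYGEN TRANSPORT,Protein
3GHI,DNA,DNA
"""

SEQ_CSV = """structureId,chainId,sequence
1ABC,A,MKTAYIAKQR
2DEF,A,VLSPADKTNV
2DEF,B,VHLTPEEKSA
3GHI,A,CGCGAATTCGCG
"""


def load_labeled_sequences(meta_file, seq_file):
    """Join sequences to class labels on structureId, keeping proteins only."""
    labels = {}
    for row in csv.DictReader(meta_file):
        if row["macromoleculeType"] == "Protein":
            labels[row["structureId"]] = row["classification"]

    pairs = []
    for row in csv.DictReader(seq_file):
        label = labels.get(row["structureId"])
        # Skip chains without a protein label or with an empty sequence.
        if label is not None and row["sequence"]:
            pairs.append((row["sequence"], label))
    return pairs


pairs = load_labeled_sequences(io.StringIO(META_CSV), io.StringIO(SEQ_CSV))
for sequence, label in pairs:
    print(label, sequence)
```

With the real files, the io.StringIO objects would be replaced by open(...) handles on the downloaded CSVs.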

3. Deep learning algorithm

Candidate algorithms include LSTM and CNN, with SVM as a non-deep-learning baseline. In practice, it may be necessary to combine multiple algorithms to achieve higher accuracy.
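Whichever model is chosen, the sequences first have to be turned into numbers. The sketch below shows two common options, under the assumption that the inputs are amino acid strings: fixed-length integer encoding (typical input to an embedding layer in front of an LSTM or CNN) and k-mer counts (a simple feature set for an SVM). The names encode and kmer_counts, and the choice of index 0 for padding, are illustrative rather than part of the plan.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = 0                               # index reserved for padding
UNKNOWN = len(AMINO_ACIDS) + 1        # index for any other character
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map residues to integer IDs, then truncate or right-pad to max_len."""
    ids = [TOKEN_ID.get(ch, UNKNOWN) for ch in sequence.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def kmer_counts(sequence, k=3):
    """Count overlapping substrings of length k in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(encode("ACD", 5))        # padded to length 5
print(encode("MKTAYIAK", 4))   # truncated to length 4
print(kmer_counts("AAAA", 3))  # one distinct 3-mer, seen twice
```

Padding to a fixed length keeps batches rectangular. The value of max_len would need to be chosen from the length distribution of the real data.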

4. Timeline

  • Week 1: Collect data, understand the data.
  • Week 2: Data preprocessing, data visualization.
  • Week 3: Find related works and test existing algorithms.
  • Week 4: Protein structure prediction or classification based on existing work.
  • Week 5: Continue the previous experiments and complete the project report.
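One possible shape for the Week 2 preprocessing step is sketched below. With dozens of categories, some classes are likely to have very few examples, so this sketch keeps only the k most frequent classes and then splits each class separately into training and test sets. The helper names, the 20% test fraction, and the toy data are assumptions for illustration, not decisions made in this plan.

```python
import random
from collections import Counter, defaultdict


def keep_top_classes(pairs, k):
    """Keep only (sequence, label) pairs whose label is among the k most common."""
    counts = Counter(label for _, label in pairs)
    keep = {label for label, _ in counts.most_common(k)}
    return [(seq, label) for seq, label in pairs if label in keep]


def stratified_split(pairs, test_fraction=0.2, seed=0):
    """Split each class separately so both sets keep similar class proportions."""
    by_label = defaultdict(list)
    for seq, label in pairs:
        by_label[label].append(seq)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label, seqs in sorted(by_label.items()):
        rng.shuffle(seqs)
        # At least one test example per class; a class with a single
        # example therefore ends up in the test set only.
        n_test = max(1, round(len(seqs) * test_fraction))
        test += [(s, label) for s in seqs[:n_test]]
        train += [(s, label) for s in seqs[n_test:]]
    return train, test


# Toy data: placeholder strings stand in for real sequences.
data = (
    [("SEQ_H%d" % i, "HYDROLASE") for i in range(10)]
    + [("SEQ_T%d" % i, "TRANSFERASE") for i in range(5)]
    + [("SEQ_V0", "VIRUS")]
)

filtered = keep_top_classes(data, k=2)
train, test = stratified_split(filtered)
print(len(filtered), len(train), len(test))
```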