# MNIST With PyTorch

In this lesson we discuss how to create a simple IPython Notebook that solves an image classification problem with a Multi Layer Perceptron in PyTorch.

## Import Libraries

import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from torch import nn
from torch import optim
from time import time
import os


## Pre-Process Data

Here we download the data using the PyTorch data utilities and transform it with a normalization function. PyTorch provides a data loader abstraction called DataLoader, where we can set the batch size and whether the data is shuffled when batches are loaded. Each DataLoader expects a PyTorch Dataset. The Dataset abstraction and DataLoader usage are described in the PyTorch documentation.
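To illustrate how batch size and shuffling behave, here is a minimal sketch that wraps a synthetic dataset in a DataLoader. The random tensors are only a stand-in with MNIST-like shapes (an assumption for illustration, not the real data), and the batch size of 32 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples at the start of every pass over the loader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Collect the shape of every image batch produced in one pass
batch_shapes = [tuple(batch_images.shape) for batch_images, _ in loader]

print(len(loader))    # number of batches per pass
print(batch_shapes)   # the last batch is smaller when 100 is not divisible by 32
```

Because the last batch can be smaller than the others, code that consumes the loader should read the batch dimension from the tensor rather than hard-coding it.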

# Data transformation function
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])
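
To make the effect of `Normalize((0.5,), (0.5,))` concrete: it computes `(x - mean) / std` per channel, so the pixel values in [0, 1] produced by `ToTensor` end up in [-1, 1]. A small sketch of that arithmetic, using plain tensors instead of torchvision (the sample pixel values are made up for illustration):

```python
import torch

# Mean and standard deviation passed to Normalize in the transform
mean, std = 0.5, 0.5

# A few example pixel intensities in the [0, 1] range that ToTensor produces
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# The same element-wise operation Normalize applies to each channel
normalized = (pixels - mean) / std
print(normalized)
```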

# DataSet



## Define Network

Here we choose an input size that matches the network definition: either the data or the first layer must be reshaped so that the input data shape matches the network input shape. MNIST images are 28x28 pixels, so each image is flattened to 784 values. We also define a set of hidden unit sizes along with the output layer size. The output_size must match the number of labels in the classification problem. The hidden unit sizes can be chosen depending on the problem. nn.Sequential is one way to create the network. Here we stack a set of linear layers with ReLU activations and use a log-softmax layer as the output layer for classification.

input_size = 784
hidden_sizes = [128, 128, 64, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], hidden_sizes[2]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[2], hidden_sizes[3]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[3], output_size),
                      nn.LogSoftmax(dim=1))

print(model)

Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=64, bias=True)
  (7): ReLU()
  (8): Linear(in_features=64, out_features=10, bias=True)
  (9): LogSoftmax(dim=1)
)
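
As a quick sanity check on the shapes discussed above, the sketch below rebuilds the same network, flattens a batch of random image-shaped tensors, and inspects the output. The random batch is a stand-in for real data and the batch size of 64 is an assumption.

```python
import torch
from torch import nn

input_size = 784
hidden_sizes = [128, 128, 64, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], hidden_sizes[2]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[2], hidden_sizes[3]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[3], output_size),
                      nn.LogSoftmax(dim=1))

# Total number of trainable weights and biases across all linear layers
n_params = sum(p.numel() for p in model.parameters())

# Random stand-in for a batch of 64 single-channel 28x28 images
batch = torch.rand(64, 1, 28, 28)
# Flatten each image to a 784-element vector to match the first layer
flat = batch.view(batch.shape[0], -1)

with torch.no_grad():
    log_probs = model(flat)

# Exponentiating log-probabilities gives probabilities that sum to 1 per row
row_sums = torch.exp(log_probs).sum(dim=1)

print(n_params, tuple(flat.shape), tuple(log_probs.shape))
```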


## Define Loss Function and Optimizer

criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
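
`nn.NLLLoss` expects log-probabilities, which is why the network ends in `nn.LogSoftmax`. The sketch below, on made-up logits and targets, checks that the loss is the mean negative log-probability of the correct class, and that the LogSoftmax plus NLLLoss pairing matches `nn.CrossEntropyLoss` applied to raw logits.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up raw scores for 4 samples over 10 classes, with arbitrary targets
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 7, 1])

# LogSoftmax followed by NLLLoss, as used in this lesson
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

# The same value computed by hand: mean of -log p(correct class)
manual = -log_probs[torch.arange(4), targets].mean()

# CrossEntropyLoss combines both steps and takes raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

print(nll.item(), manual.item(), ce.item())
```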


## Train

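epochs = 5

# trainloader is assumed to be the training DataLoader built in the Pre-Process Data step
for epoch in range(epochs):
    loss_per_epoch = 0
    # Iterate over the training data one batch at a time
    for images, labels in trainloader:
        # Flatten each 28x28 image into a 784-element vector
        images = images.view(images.shape[0], -1)

        # Reset the gradients left over from the previous batch
        optimizer.zero_grad()

        # Pass input to the model
        output = model(images)
        # Calculate the loss of the predictions compared to the labels
        loss = criterion(output, labels)

        # backpropagation
        loss.backward()

        # optimizer step to update the weights
        optimizer.step()

        loss_per_epoch += loss.item()
    # Average the accumulated loss over the number of batches
    average_loss = loss_per_epoch / len(trainloader)
    print("Epoch {} - Training loss: {}".format(epoch, average_loss))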

Epoch 0 - Training loss: 1.3052690227402808
Epoch 1 - Training loss: 0.33809808635317695
Epoch 2 - Training loss: 0.22927882223685922
Epoch 3 - Training loss: 0.16807103878669521
Epoch 4 - Training loss: 0.1369301250545995
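
In PyTorch, gradients accumulate across `backward()` calls, so a training loop typically resets them (for example with `optimizer.zero_grad()`) before each backward pass. A minimal sketch of that behaviour on a single scalar parameter, which is a toy value chosen for illustration:

```python
import torch

# A single trainable scalar; d(2 * w)/dw is always 2
w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()          # gradient from the first backward pass

(2 * w).backward()
accumulated = w.grad.item()    # second pass is added on top of the first

# Resetting the gradient, as optimizer.zero_grad() does for model parameters
w.grad.zero_()
(2 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```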


## Model Evaluation

Similar to the training data loader, we use the validation loader to load the data batch by batch, run the feed-forward network to get a prediction for each data point, and compare that prediction with the label associated with the data point.

correct_predictions, all_count = 0, 0
# enumerate data from the data validation loader (loads a batch at a time)
# valloader is assumed to be the validation DataLoader built in the Pre-Process Data step
for images, labels in valloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)
        # at prediction stage, only feed-forward calculation is required,
        # so gradient tracking is switched off.
        with torch.no_grad():
            logps = model(img)

        # Output layer of the network uses a LogSoftMax layer
        # Hence the probability must be calculated with the exponential values.
        # After exponentiation we have an array of probabilities, one per label
        # Pick the maximum probability and the corresponding index
        # The corresponding index is the predicted label
        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if(true_label == pred_label):
            correct_predictions += 1
        all_count += 1

print(f"Model Accuracy {(correct_predictions/all_count) * 100} %")

Model Accuracy 95.95 %
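
The per-image loop can also be written per batch. Since the exponential is monotonic, the index of the largest log-probability is also the index of the largest probability, so `argmax` can be applied to the network output directly. A minimal sketch with hand-written log-probabilities; the helper name `count_correct` and the sample values are hypothetical.

```python
import torch

def count_correct(log_probs, labels):
    """Count predictions in a batch whose most likely class matches the label."""
    # argmax over the class dimension gives one predicted label per row
    preds = log_probs.argmax(dim=1)
    return (preds == labels).sum().item()

# Made-up log-probabilities for 3 samples over 3 classes
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.8, 0.1],
                                    [0.3, 0.3, 0.4]]))
labels = torch.tensor([0, 1, 0])

correct = count_correct(log_probs, labels)
accuracy = correct / len(labels)
print(f"Model Accuracy {accuracy * 100} %")
```

In an evaluation loop, the same helper could be called once per batch on the model output, with the results summed across batches.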