Deep Neural Networks with PyTorch


Cost Function

Threshold Function

Vanishing gradients

Problems with not initializing weights

Gradient descent with Momentum

Terminology

Batch Normalization

Batch normalization is applied to the output of each neuron before it is passed to the activation function.
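
As a minimal sketch of this ordering (the layer sizes below are arbitrary, chosen only for illustration):

import torch
import torch.nn as nn

linear = nn.Linear(in_features=4, out_features=8)   # a hidden layer with 8 neurons
bn = nn.BatchNorm1d(8)                               # one mean/variance pair per neuron
activation = nn.ReLU()

x = torch.randn(32, 4)      # a mini-batch of 32 samples
z = linear(x)               # linear output of each neuron
z_hat = bn(z)               # normalize each neuron over the mini-batch
a = activation(z_hat)       # the activation function is applied last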

Training Part

Prediction Part:

The mean and variance of each neuron are estimated from the entire population (the running statistics accumulated during training).
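
In PyTorch this corresponds to switching the network between training and evaluation mode; a small illustration with an arbitrary toy model:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2))

model.train()                         # batch norm uses the statistics of the current
y_hat = model(torch.randn(32, 4))     # mini-batch and updates its running (population) estimates

model.eval()                          # batch norm uses the stored running mean and variance
with torch.no_grad():
    y_pred = model(torch.randn(5, 4))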

Why does batch normalization work?

Training takes less time to converge because:

Convolution

Convolution is a linear operation similar to a linear equation, dot product, or matrix multiplication. Convolution has several advantages for analyzing images. As discussed in the video, convolution preserves the relationship between elements, and it requires fewer parameters than other methods.

You can see the relationship between the different methods that you learned:

In convolution, the parameter w is called a kernel. You can perform convolution on images, where the image plays the role of the variable X and w is the learnable parameter.
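
A minimal sketch of this, assuming a single-channel 5x5 image with a 2x2 kernel (the sizes and the vertical-line image are arbitrary choices for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2)
print(conv.weight)               # the kernel w (a learnable parameter)

image = torch.zeros(1, 1, 5, 5)  # the variable X: (batch, channel, height, width)
image[:, :, :, 2] = 1            # a vertical line in the middle of the image
z = conv(image)                  # z = w * X + b, where * denotes convolution
print(z)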

Illustration

Terminology

https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1convltuon.gif

\[M_{new}=\dfrac{M-K}{stride}+1\]

\[M'=M+2 \times padding, \qquad M_{new}=M'-K+1\]
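
A quick way to check these formulas, assuming arbitrary example values for M, K, stride, and padding, is to compare them with the actual output shape of a Conv2d layer:

import torch
import torch.nn as nn

M, K, stride, padding = 8, 3, 2, 1       # arbitrary example values

conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=K, stride=stride, padding=padding)
image = torch.randn(1, 1, M, M)

M_prime = M + 2 * padding                # size after zero padding
M_new = (M_prime - K) // stride + 1      # combining the two formulas above
print(M_new)                             # 4
print(conv(image).shape[-1])             # 4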

Multiple Channels

Multiple Output Channels:

In PyTorch, you can create a Conv2d object with multiple output channels. For each output channel, a kernel is created, and each kernel performs convolution independently. As a result, the number of outputs is equal to the number of output channels. This is demonstrated in the following figure: the image of the number 9 is convolved with three kernels, each shown in a different color, producing three activation maps represented by the different colors.
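
A small sketch with an arbitrary 9x9 single-channel input: one kernel is created per output channel, giving three activation maps.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3)
print(conv.weight.shape)         # torch.Size([3, 1, 3, 3]): one kernel per output channel

image = torch.randn(1, 1, 9, 9)  # a single grayscale image
print(conv(image).shape)         # torch.Size([1, 3, 7, 7]): three activation maps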

Multiple Input Channels:

For multiple input channels, you can create multiple kernels. Each kernel performs a convolution on its associated input channel. The resulting outputs are added together, as shown:
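
A minimal check of this with an arbitrary two-channel input and one output channel: convolving each channel with its own kernel and summing the results (plus the bias) matches the Conv2d output.

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3)
x = torch.randn(1, 2, 8, 8)

out0 = F.conv2d(x[:, 0:1], conv.weight[:, 0:1])   # first input channel with its kernel
out1 = F.conv2d(x[:, 1:2], conv.weight[:, 1:2])   # second input channel with its kernel
manual = out0 + out1 + conv.bias.view(1, -1, 1, 1)

print(torch.allclose(conv(x), manual))            # True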

Multiple Input and Output Channels:

When using multiple inputs and outputs, a kernel is created for each input channel, and the process is repeated for each output channel. This process is summarized in the following figure:

There are three output channels and two input channels. For each output channel, the inputs shown in red and purple are convolved with individual kernels, each colored differently, and the results are summed. As a result, there are three outputs.
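
The same figure expressed in code, assuming 2 input channels and 3 output channels: the weight tensor holds one kernel for every (output, input) pair, and the convolution sums over the input channels.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=3)
print(conv.weight.shape)       # torch.Size([3, 2, 3, 3]): 3 output channels x 2 input channels

x = torch.randn(1, 2, 8, 8)    # one image with two input channels
print(conv(x).shape)           # torch.Size([1, 3, 6, 6]): three output channels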

Convolutional Neural Network

Batch Normalization before Max pooling

As in a fully connected network, we create a batch normalization object, but here we use BatchNorm2d and apply it to the output of the 2D convolution. First, we create a Conv2d object; we must specify the number of output channels, given by the variable OUT.

self.cnn1 = nn.Conv2d(in_channels=1, out_channels=OUT, kernel_size=5, padding=2)
# We then create a batch norm object for 2D convolution as follows:
self.conv1_bn = nn.BatchNorm2d(OUT)
# The parameter OUT is the number of channels in the output.
# We can then apply batch norm after the convolution operation:
x = self.cnn1(x)
x = self.conv1_bn(x)

Compare a convolutional neural network with and without batch normalization on the MNIST image dataset; the two model definitions follow, with a training sketch after them.

class CNN(nn.Module):
    
    # Constructor
    def __init__(self, out_1=16, out_2=32):
        super(CNN, self).__init__()
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=out_1, kernel_size=5, padding=2)
        self.maxpool1=nn.MaxPool2d(kernel_size=2)

        self.cnn2 = nn.Conv2d(in_channels=out_1, out_channels=out_2, kernel_size=5, stride=1, padding=2)
        self.maxpool2=nn.MaxPool2d(kernel_size=2)
        self.fc1 = nn.Linear(out_2 * 4 * 4, 10)  # out_2 * 4 * 4 assumes 16x16 inputs (16 -> 8 -> 4 after the two pooling layers)
    

    # Prediction
    def forward(self, x):
        x = self.cnn1(x)
        x = torch.relu(x)
        x = self.maxpool1(x)
        x = self.cnn2(x)
        x = torch.relu(x)
        x = self.maxpool2(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

class CNN_batch(nn.Module):
    
    # Constructor
    def __init__(self, out_1=16, out_2=32,number_of_classes=10):
        super(CNN_batch, self).__init__()
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=out_1, kernel_size=5, padding=2)
        self.conv1_bn = nn.BatchNorm2d(out_1)

        self.maxpool1=nn.MaxPool2d(kernel_size=2)
        
        self.cnn2 = nn.Conv2d(in_channels=out_1, out_channels=out_2, kernel_size=5, stride=1, padding=2)
        self.conv2_bn = nn.BatchNorm2d(out_2)

        self.maxpool2=nn.MaxPool2d(kernel_size=2)
        self.fc1 = nn.Linear(out_2 * 4 * 4, number_of_classes)  # out_2 * 4 * 4 assumes 16x16 inputs (16 -> 8 -> 4 after the two pooling layers)
        self.bn_fc1 = nn.BatchNorm1d(number_of_classes)
    
    # Prediction
    def forward(self, x):
        x = self.cnn1(x)
        x = self.conv1_bn(x)
        x = torch.relu(x)
        x = self.maxpool1(x)
        x = self.cnn2(x)
        x = self.conv2_bn(x)
        x = torch.relu(x)
        x = self.maxpool2(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.bn_fc1(x)
        return x
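
A minimal training sketch for the comparison, assuming the MNIST images are resized to 16x16 so that the flattened size matches out_2 * 4 * 4; the learning rate, batch size, and number of epochs are arbitrary choices.

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

composed = transforms.Compose([transforms.Resize((16, 16)), transforms.ToTensor()])
train_dataset = dsets.MNIST(root='./data', train=True, download=True, transform=composed)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=100)

def train(model, epochs=1):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    model.train()
    for epoch in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model

model_plain = train(CNN(out_1=16, out_2=32))       # without batch normalization
model_bn = train(CNN_batch(out_1=16, out_2=32))    # with batch normalization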

Pretrained Models in PyTorch:

You can load a pretrained model as follows and train only the last layer of your model:

import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer with your own fully connected layer.
# For densenet121 the final layer is model.classifier, which has 1024 input features.
model.classifier = nn.Linear(1024, 7)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=learning_rate)
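
Because newly constructed layers have requires_grad=True by default, only the replaced classifier layer is updated. A minimal fine-tuning loop, where train_loader, learning_rate, and n_epochs are placeholders for your own data and settings:

criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(n_epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)   # gradients flow only to the new classifier layer
        loss.backward()
        optimizer.step()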