Upsampling, deconvolution, unpooling

By | September 11, 2018

This blog is inspired by my previous blog on Fully Convolutional Networks (FCN). To fully understand FCNs, it is important to know about upsampling, deconvolution and unpooling.

Upsampling

In image processing, upsampling is the process of increasing the resolution of an image, that is, producing a larger image from a smaller one.

Traditionally, upsampling is performed using techniques like:

  • Bi-linear interpolation
  • Bi-cubic interpolation
  • Nearest neighbour interpolation

These methods apply fixed, hand-designed rules, so the network does not learn anything from them.
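To show what a fixed rule looks like, here is a minimal NumPy sketch of nearest neighbour upsampling by an integer factor. The function name `nearest_neighbour_upsample` is hypothetical. Every output value is simply copied from an input pixel, and nothing in it could be learned:

```python
import numpy as np

def nearest_neighbour_upsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a 2-D array by repeating each pixel `factor` times along both axes."""
    # Repeat every row, then every column; there are no weights involved.
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

small = np.array([[1, 2],
                  [3, 4]])
big = nearest_neighbour_upsample(small, 2)
print(big)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```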

Why deconvolution?

If we want the network to learn how to upsample optimally, we can use deconvolution. It does not rely on a predefined interpolation rule; instead, it has learnable parameters. In FCNs, we want the output to have the same spatial size as the original input. However, in ConvNets the spatial size of the tensors shrinks as the network gets deeper. So, in FCNs, upsampling is performed with an operation called deconvolution, or transposed convolution.
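One way to picture a learnable upsampling layer is as a "scatter-add": each input value scales the whole kernel, and the scaled copies are added into a larger output where they overlap. The sketch below assumes stride 1, no padding and a square kernel; the function name `transposed_conv2d` is hypothetical, and the all-ones kernel stands in for weights that a network would learn during training:

```python
import numpy as np

def transposed_conv2d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1, no-padding transposed convolution of a 2-D input (square kernel assumed)."""
    h, w = x.shape
    k = kernel.shape[0]
    # Output grows by k - 1 along each axis: 2x2 input + 3x3 kernel -> 4x4 output.
    out = np.zeros((h + k - 1, w + k - 1), dtype=float)
    for i in range(h):
        for j in range(w):
            # Scale the kernel by one input value and add it in; overlaps accumulate.
            out[i:i + k, j:j + k] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((3, 3))  # placeholder for learned weights
y = transposed_conv2d(x, kernel)
print(y.shape)  # (4, 4)
```

Changing the nine kernel values changes how the 2×2 input is spread over the 4×4 output, which is exactly what training adjusts.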

Deconvolution or transposed convolution

From my reading so far, this name seems controversial. Some research papers call the operation deconvolution, while other researchers consider that term inappropriate, since in signal processing deconvolution means inverting a convolution, which this operation does not do. Following the Stanford slides, this blog refers to it as transposed convolution.

To understand transposed convolution, it is first necessary to understand the convolution matrix.

Convolution matrix

[Figure: Convolution matrix]

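As shown in the image above, we have a 4×4 image and a 3×3 convolution kernel. At each position, the convolution multiplies the kernel element-wise with the 3×3 patch of the input it covers and sums the products into a single output value; the kernel then slides to the next position. Since there is no padding and the stride is one, the kernel fits in 4 − 3 + 1 = 2 positions along each axis, so the output is 2×2.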
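A minimal NumPy sketch of this sliding-window operation, assuming stride 1 and no padding. Like most deep learning libraries, it does not flip the kernel (strictly speaking, this is cross-correlation). The function name `conv2d_valid` and the example values are hypothetical:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1, no-padding 2-D convolution without kernel flipping."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1  # 4 - 3 + 1 = 2 along each axis
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
out = conv2d_valid(image, kernel)
print(out.shape)  # (2, 2)
```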

So, convolution took us from 4×4 to 2×2. Transposed convolution goes in the reverse direction, from 2×2 to 4×4, which is upsampling. Note that it restores the spatial shape, not the original pixel values.
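The sizes above follow from the usual output-size formulas. This sketch assumes square inputs and kernels, no dilation and no extra output padding; the function names are hypothetical:

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Side length after convolving an n x n input with a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1

def transposed_conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Side length after a transposed convolution (no output padding, no dilation)."""
    return (n - 1) * stride - 2 * padding + k

print(conv_output_size(4, 3))             # 2  (4x4 -> 2x2)
print(transposed_conv_output_size(2, 3))  # 4  (2x2 -> 4x4)

# With stride 2, a 4x4 kernel and padding 1, the same round trip also works.
print(conv_output_size(4, 4, stride=2, padding=1))             # 2
print(transposed_conv_output_size(2, 4, stride=2, padding=1))  # 4
```

The second formula is the first one solved for the input size, which is why a transposed convolution with the same kernel, stride and padding undoes the shape change.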

We can also perform the convolution as a matrix multiplication. To do so, we rearrange the kernel values into a matrix. The image below shows such a convolution matrix: the kernel values rearranged and inserted into a matrix of size 4×16. Each of its 4 rows holds the 9 kernel weights at the positions of one 3×3 window and zeros elsewhere. The image is flattened into a vector of size 16×1. Multiplying these two gives an output vector of size 4×1, which can then be reshaped to 2×2.
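Here is a minimal NumPy sketch that builds this 4×16 matrix, assuming row-major flattening, stride 1 and no padding. The function name `conv_matrix` and the kernel and image values are hypothetical. The last lines multiply by the transpose of the matrix, which maps a 4×1 vector back to 16×1, that is, 2×2 back to 4×4; this is where the name transposed convolution comes from:

```python
import numpy as np

def conv_matrix(kernel: np.ndarray, n: int) -> np.ndarray:
    """Build C so that C @ image.reshape(-1, 1) equals a stride-1, no-padding
    convolution of an n x n image with a square k x k kernel."""
    k = kernel.shape[0]
    m = n - k + 1                            # output side length (2 for n=4, k=3)
    C = np.zeros((m * m, n * n))
    for i in range(m):                       # output row
        for j in range(m):                   # output column
            row = np.zeros((n, n))
            row[i:i + k, j:j + k] = kernel   # kernel over the window for output (i, j)
            C[i * m + j] = row.flatten()     # row-major, matching the image flattening
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)
image = np.arange(16.0).reshape(4, 4)

C = conv_matrix(kernel, 4)
print(C.shape)                                   # (4, 16)
out = (C @ image.reshape(16, 1)).reshape(2, 2)   # 4x1 output vector -> 2x2

# Transposed convolution: C.T has shape 16x4, so 4x1 -> 16x1, i.e. 2x2 -> 4x4.
up = (C.T @ out.reshape(4, 1)).reshape(4, 4)
print(up.shape)                                  # (4, 4)
```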