8.3k views
0 votes

(1) In a convolutional neural network, the input image is 5 x 5. It is a grayscale image, where 0 is typically taken to be black and 255 to be white. The convolutional layer has 1 filter of size 3 x 3, with stride 2 and bias 0. After applying the convolutional layer, what is the output volume?

[Figure: the 5 x 5 input image, the 3 x 3 filter, and answer choices (a) to (e), where (e) is "none of the left". Legible rows of values include 0 0 0 0 0, 30 30 30 30 30, 50 50 50 50 50, and 60 60 60 60 60.]

by User Seasoned (7.7k points)

1 Answer

4 votes

For the 5x5 grayscale input (5x5x1) with one 3x3 filter, stride 2, and no padding, the output volume is 2x2x1.

Input and filter dimensions

The input is 5x5x1, meaning it has 5 rows, 5 columns, and 1 channel, because the image is grayscale. The convolutional layer has a single filter of size 3x3x1, meaning it matches the input depth and covers a 3x3 area of the input.

Padding

The question does not specify any padding, so the padding is taken to be 0 and the image stays 5x5. For comparison, a padding of 1 would add a one-pixel border of zeros around the input image, making the padded image 7x7.
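As a small illustration of zero padding, here is a minimal sketch that adds a border of zeros around a 2D image stored as a list of rows. The helper name `zero_pad` and the sample image of ones are hypothetical and are not taken from the question.

```python
def zero_pad(image, padding):
    """Return a copy of a 2D image (list of rows) with a border of zeros."""
    if padding == 0:
        return [list(row) for row in image]
    width = len(image[0]) + 2 * padding
    # Rows of zeros above the image.
    padded = [[0] * width for _ in range(padding)]
    # Each original row gets `padding` zeros on the left and on the right.
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    # Rows of zeros below the image.
    padded.extend([0] * width for _ in range(padding))
    return padded


# Hypothetical 5x5 image filled with ones, padded by one pixel.
image = [[1] * 5 for _ in range(5)]
padded = zero_pad(image, 1)
print(len(padded), len(padded[0]))  # 7 7
```

With `padding=0` the function returns an unchanged copy, which is the case when no padding is applied.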

Convolution

The filter slides across the image (after any padding) in steps given by the stride. At each position it performs element-wise multiplication with the corresponding area of the image and sums the products, across every channel of the filter and image, to produce one output value.
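The multiply-and-sum step can be sketched for a single-channel image as follows. This is a minimal illustration without padding; the function name `conv2d_single`, the image values, and the all-ones filter are hypothetical, since the actual values from the question are not available here.

```python
def conv2d_single(image, kernel, stride=1, bias=0):
    """Slide a square kernel over a 2D image (no padding) and return the output map."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for top in range(0, rows - k + 1, stride):
        out_row = []
        for left in range(0, cols - k + 1, stride):
            # Element-wise multiply the window with the kernel and sum.
            total = bias
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical data: image[r][c] = r * 5 + c, and a 3x3 filter of ones.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 1, 1] for _ in range(3)]
result = conv2d_single(image, kernel, stride=2)
print(result)  # [[54, 72], [144, 162]]
```

A 5x5 image with a 3x3 filter and stride 2 produces a 2x2 map here, one value per window position.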

Stride

The stride is 2, which means the filter moves two pixels at a time, both horizontally and vertically. A larger stride downsamples the input, which shrinks the output volume and reduces the computation in later layers.
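To make the effect of the stride concrete, this short sketch lists where the left edge of the filter lands along one dimension, assuming no padding. The helper name `window_starts` is hypothetical.

```python
def window_starts(input_size, kernel_size, stride):
    """Start indices of every window that fits fully inside the input."""
    return list(range(0, input_size - kernel_size + 1, stride))


print(window_starts(5, 3, 2))  # [0, 2]
print(window_starts(5, 3, 1))  # [0, 1, 2]
```

With stride 2 only two positions fit along a width of 5, compared with three positions for stride 1.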

Output volume calculation

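The width of the output volume is calculated as: (input width - filter width + 2 * padding) / stride + 1. The question gives no padding, so with padding = 0 this is (5 - 3 + 2 * 0) / 2 + 1 = 2. (With padding = 1 the same formula would give (5 - 3 + 2 * 1) / 2 + 1 = 3, not 2.)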

Similarly, the height of the output volume is also 2.
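The size formula is easy to check numerically. The sketch below assumes the common convention of rounding down when the division is not exact; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output length along one dimension of a convolutional layer."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(5, 3, 2, padding=0))  # 2
print(conv_output_size(5, 3, 2, padding=1))  # 3
```

The same function applies to both width and height, so a 5x5 input with a 3x3 filter and stride 2 gives 2x2 without padding and 3x3 with a padding of 1.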

The depth of the output equals the number of filters in the layer, not the number of channels in each filter. Since the layer has 1 filter, the output volume has 1 channel.
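The relationship between filters and output depth can be sketched as follows. Each filter sums over all input channels and produces exactly one 2D map, so the output depth is the number of filters. The function name `conv_layer`, the channel-first layout, and the all-ones data are assumptions made for illustration only.

```python
def conv_layer(volume, filters, stride=1):
    """Apply each filter to a multi-channel input (no padding, bias 0).

    volume:  [channel][row][col]
    filters: list of kernels, each indexed as [channel][i][j]
    returns: [filter][out_row][out_col]
    """
    channels = len(volume)
    rows, cols = len(volume[0]), len(volume[0][0])
    output = []
    for kernel in filters:
        k = len(kernel[0])
        feature_map = []
        for top in range(0, rows - k + 1, stride):
            out_row = []
            for left in range(0, cols - k + 1, stride):
                # Sum over every channel: one number per window position.
                total = 0
                for ch in range(channels):
                    for i in range(k):
                        for j in range(k):
                            total += volume[ch][top + i][left + j] * kernel[ch][i][j]
                out_row.append(total)
            feature_map.append(out_row)
        output.append(feature_map)
    return output


# Hypothetical 3-channel 5x5 input of ones and a single 3x3x3 filter of ones.
volume = [[[1] * 5 for _ in range(5)] for _ in range(3)]
one_filter = [[[[1] * 3 for _ in range(3)] for _ in range(3)]]
out = conv_layer(volume, one_filter, stride=2)
print(len(out), len(out[0]), len(out[0][0]))  # 1 2 2
print(out[0][0][0])  # 27
```

Even with three input channels, a single filter yields an output depth of 1; passing two filters would yield a depth of 2.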

Therefore, with one filter, stride 2, and no padding, the output volume of the convolutional layer is 2x2x1.

by User Russbear (8.6k points)