Question
Asked By – jfbeltran
I am trying to understand the strides argument in tf.nn.avg_pool, tf.nn.max_pool, tf.nn.conv2d.
The documentation repeatedly says
strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
My questions are:
- What do each of the 4+ integers represent?
- Why must strides[0] = strides[3] = 1 for convnets?
- In this example we see tf.reshape(_X, shape=[-1, 28, 28, 1]). Why -1?
Sadly the examples in the docs for reshape using -1 don’t translate too well to this scenario.
Answer
The pooling and convolutional ops slide a “window” across the input tensor. Using tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels], then the convolution operates on a 2D window over the height and width dimensions.
strides determines how much the window shifts in each of the dimensions. The typical use sets the first (the batch) and last (the depth) stride to 1.
Let’s use a very concrete example: Running a 2-d convolution over a 32×32 greyscale input image. I say greyscale because then the input image has depth=1, which helps keep it simple. Let that image look like this:
00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...
Let’s run a 2×2 convolution window over a single example (batch size = 1). We’ll give the convolution an output channel depth of 8.
The input to the convolution has shape=[1, 32, 32, 1].
If you specify strides=[1, 1, 1, 1] with padding='SAME', then the output of the filter will be [1, 32, 32, 8].
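A quick way to sanity-check that output shape, without running TensorFlow, is the SAME-padding rule: each spatial output size is the ceiling of the input size divided by the stride. This is a minimal sketch of that rule (the helper name is ours, not a TensorFlow API):

```python
import math

def same_padding_output_size(in_size, stride):
    # With padding='SAME', TensorFlow pads the input so that
    # output size = ceil(input size / stride), regardless of filter size.
    return math.ceil(in_size / stride)

# Input [1, 32, 32, 1], 2x2 filter with 8 output channels, strides=[1, 1, 1, 1]
batch, height, width = 1, 32, 32
out_channels = 8
strides = [1, 1, 1, 1]

out_shape = [batch,
             same_padding_output_size(height, strides[1]),
             same_padding_output_size(width, strides[2]),
             out_channels]
print(out_shape)  # [1, 32, 32, 8]
```

The batch and channel dimensions pass through unchanged; only the height and width strides affect the spatial output size.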
The filter will first create an output for:
F(00 01
10 11)
And then for:
F(01 02
11 12)
and so on. Then it will move to the second row, calculating:
F(10, 11
20, 21)
then
F(11, 12
21, 22)
If you specify a stride of [1, 2, 2, 1], the windows won't overlap. It will compute:
F(00, 01
10, 11)
and then
F(02, 03
12, 13)
The stride operates similarly for the pooling operators.
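The window positions above can be sketched in plain Python (no TensorFlow). The helper below lists which top-left corners a 2×2 window visits on a small image laid out like the 00, 01, 02… grid above; it ignores padding for simplicity:

```python
import numpy as np

# Toy 4x4 single-channel "image" whose entry at (row, col) is row*10 + col,
# matching the 00, 01, 02... layout above.
image = np.array([[r * 10 + c for c in range(4)] for r in range(4)])

def window_corners(height, width, filter_size, stride):
    # Top-left corners visited by a filter_size x filter_size window
    # sliding with the given stride (no padding, for simplicity).
    return [(r, c)
            for r in range(0, height - filter_size + 1, stride)
            for c in range(0, width - filter_size + 1, stride)]

# stride 1: overlapping windows, starting (0,0), (0,1), (0,2), (1,0), ...
print(window_corners(4, 4, 2, 1))
# stride 2: non-overlapping windows at (0,0), (0,2), (2,0), (2,2)
print(window_corners(4, 4, 2, 2))

# The first patch, F(00 01 / 10 11):
print(image[0:2, 0:2])
```

With stride 1 the window advances one pixel at a time and the patches overlap; with stride 2 it jumps two pixels, so consecutive patches are disjoint, exactly as in the F(...) walkthrough above.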
Question 2: Why strides [1, x, y, 1] for convnets
The first 1 is the batch: You don’t usually want to skip over examples in your batch, or you shouldn’t have included them in the first place. 🙂
The last 1 is the depth of the convolution: You don’t usually want to skip inputs, for the same reason.
The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that’s not a typical use in convnets. The typical use is to use them spatially.
Question 3: Why reshape to -1?
-1 is a placeholder that says “adjust as necessary to match the size needed for the full tensor.” It's a way of making the code independent of the input batch size, so that you can change your pipeline without having to adjust the batch size everywhere in the code.
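NumPy's reshape uses the same -1 convention as tf.reshape, so the idea can be demonstrated without TensorFlow. Here 5 is just an illustrative batch size:

```python
import numpy as np

# 5 flattened MNIST-style examples, each 784 = 28 * 28 pixels.
flat = np.zeros((5, 784))

# -1 tells reshape to infer that dimension from the total number of
# elements: here it becomes the batch size, 5.
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)  # (5, 28, 28, 1)
```

The same reshape call works unchanged for a batch of 100 or 1, which is exactly why the -1 is used in the example from the question.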
This question is answered By – dga
This answer is collected from Stack Overflow, reviewed by FixPython community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.