How to sweep a neural network through an image with TensorFlow?

My question is about finding an efficient way (mostly in terms of parameter count) to implement a sliding window in TensorFlow (1.4), in order to apply a neural network across the image and produce a 2-D map in which each pixel (or region) represents the network output for the corresponding receptive field (which in this case is the sliding window itself).



In practice, I'm trying to implement either an MTANN or a PatchGAN using TensorFlow, but I cannot understand the implementation I found.



The two architectures can be briefly described as:




  • MTANN: A linear neural network with input size [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to the likelihood that the corresponding NxN patch belongs to a certain class.


  • PatchGAN Discriminator: A more general architecture; as far as I understand, the network that is strided across the image outputs a map itself instead of a single value, and this is then combined with adjacent maps to produce the final map.



While I cannot find any TensorFlow implementation of MTANN, I found a PatchGAN implementation, which is treated as a convolutional network, but I couldn't figure out how to implement it in practice.



Let's say I have a pre-trained network of which I have the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input, and what I'm trying to do can clearly be represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed size?



E.g. I have a tensor



sub_map = network(input_patch)


which returns a [1,2,2,1] map from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8 and filter size 3x3).
How can I sweep this network over [1,64,64,3] images in order to produce a [1,64,64,1] map composed of each spatial contribution, as happens in a convolution?
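For reference, a quick shape check, assuming the 3-layer FCN uses 3x3 filters, stride 1 and VALID padding (which is consistent with an 8x8 patch shrinking to 2x2); the helper name is just for illustration:

    def fcn_output_size(input_size, num_layers=3, kernel=3):
        # Each VALID 3x3 convolution removes (kernel - 1) pixels of spatial size.
        return input_size - num_layers * (kernel - 1)

    print(fcn_output_size(8))   # 2  -> matches the [1,2,2,1] sub_map
    print(fcn_output_size(64))  # 58 -> what the same filters would give on a 64x64 input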



I've considered these solutions:




  • Using tf.image.extract_image_patches, which explicitly extracts all the image patches and stacks them along the depth dimension, but I think it would consume too many resources, since I'm switching to the PatchGAN discriminator from a fully convolutional network precisely because of memory constraints; also, composing the final map is not so straightforward (see the sketch after this list).


  • Adding a convolutional layer before the network I have, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model working on 8x8 images while integrating it into a model that works on bigger images.
    From what I can tell it should be something like whole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...), but I don't think this would work, as the filter would have to be an operator that depends on the receptive field itself.
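To make the first option concrete, here is a rough sketch of what the patch-extraction approach might look like in TF 1.4 (x64_images and the shapes are placeholders for illustration); it also shows why it is memory-hungry, since every pixel position materializes a full 8x8x3 patch:

    import tensorflow as tf

    x64_images = tf.placeholder(tf.float32, [1, 64, 64, 3])

    # One flattened 8x8x3 patch (depth 192) per output position.
    patches = tf.image.extract_image_patches(
        images=x64_images,
        ksizes=[1, 8, 8, 1],
        strides=[1, 1, 1, 1],   # dense sweep: one patch per pixel
        rates=[1, 1, 1, 1],
        padding='SAME')         # shape: [1, 64, 64, 192]

    # Reshape so each patch becomes an independent input for the patch network;
    # this is where memory blows up: 64*64 = 4096 patches per image.
    patch_batch = tf.reshape(patches, [-1, 8, 8, 3])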



The ultimate goal is to apply this small network to big images (e.g. 1024x1024) in an efficient way, since my current model progressively downscales the images and wouldn't fit in memory due to the huge number of parameters.



Can anyone help me to get a better understanding of what I am missing?



Thank you










tensorflow neural-network deep-learning computer-vision conv-neural-network






edited Nov 21 '18 at 11:26
asked Nov 21 '18 at 11:16
EdoardoG













  • The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.

    – hkchengrex
    Nov 30 '18 at 7:36



















1 Answer






I found an interesting video by Andrew Ng about exactly this: how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking of the number of layers as a variable that depends on a fixed input/output shape, while it should be the opposite.



In principle, a saved model only contains the learned filters for each layer, and these can be reused as long as the filter shapes are compatible with the layers' input/output depths. Thus, feeding the network an input with a different (i.e. bigger) spatial resolution produces a different output shape, which can be seen as applying the neural network to a sliding window sweeping across the input image.
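A minimal sketch of this idea in TF 1.x, assuming a 3-layer fully convolutional network with 3x3 filters and VALID padding (so an 8x8 patch shrinks to 2x2); the layer names and filter counts are illustrative, not taken from any specific MTANN/PatchGAN implementation:

    import tensorflow as tf

    def network(x):
        # Three VALID 3x3 convolutions: spatial size shrinks by 6 in total.
        h = tf.layers.conv2d(x, 16, 3, padding='valid', activation=tf.nn.relu, name='conv1')
        h = tf.layers.conv2d(h, 16, 3, padding='valid', activation=tf.nn.relu, name='conv2')
        return tf.layers.conv2d(h, 1, 3, padding='valid', name='conv3')

    patch = tf.placeholder(tf.float32, [1, 8, 8, 3])
    image = tf.placeholder(tf.float32, [1, 64, 64, 3])

    with tf.variable_scope('fcn'):
        sub_map = network(patch)    # [1, 2, 2, 1]   -- the patch-level setup
    with tf.variable_scope('fcn', reuse=True):
        full_map = network(image)   # [1, 58, 58, 1] -- same filters, bigger output map

    # Each pixel of full_map is the network's response for one 8x8 receptive field,
    # i.e. the sliding window evaluated everywhere in a single forward pass.
    # Use padding='same' if the output must keep the 64x64 resolution.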






answered Nov 28 '18 at 11:06
EdoardoG