How to sweep a neural-network through an image with tensorflow?
My question is about finding an efficient (mostly in term of parameters count) way to implement a sliding window in tensorflow (1.4) in order to apply a neural network through the image and produce a 2-d map with each pixel (or region) representing the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either a MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with input size of [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to a likelihood of the corresponding NxN patch to belong to a certain class.
PatchGAN Discriminator: More general architecture, as I can understand the network that is strided through the image outputs a map itself instead of a single value, which then is combined with adjacent maps to produce the final map.
While I cannot find any tensorflow implementation of MTANN, I found the PatchGAN implementation, which is considered as a convolutional network, but I couldn't figure out how to implement this in practice.
Let's say I got a pre-trained network of which I got the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input and what is I'm trying to do can be clearly represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed-size?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] maps from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8, filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using
tf.image.extract_image_patches
which explicitly extract all the image patches and channels in the depth dimension, but I think it would consume too many resources, as I'm switching to PatchGAN Discriminator from a full convolutional network due to memory constraints - also the composition of the final map is not so straight-forward.Adding a convolutional layer before the network I got, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model work on 8x8 images while integrating it in a model which works on bigger images.
For what I can get it should be something likewhole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...)
but I don't think this would work as the filter is an operator which depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (eg. 1024x1024) in an efficient way, since my current model downscales progressively the images and wouldn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
tensorflow neural-network deep-learning computer-vision conv-neural-network
add a comment |
My question is about finding an efficient (mostly in term of parameters count) way to implement a sliding window in tensorflow (1.4) in order to apply a neural network through the image and produce a 2-d map with each pixel (or region) representing the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either a MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with input size of [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to a likelihood of the corresponding NxN patch to belong to a certain class.
PatchGAN Discriminator: More general architecture, as I can understand the network that is strided through the image outputs a map itself instead of a single value, which then is combined with adjacent maps to produce the final map.
While I cannot find any tensorflow implementation of MTANN, I found the PatchGAN implementation, which is considered as a convolutional network, but I couldn't figure out how to implement this in practice.
Let's say I got a pre-trained network of which I got the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input and what is I'm trying to do can be clearly represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed-size?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] maps from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8, filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using
tf.image.extract_image_patches
which explicitly extract all the image patches and channels in the depth dimension, but I think it would consume too many resources, as I'm switching to PatchGAN Discriminator from a full convolutional network due to memory constraints - also the composition of the final map is not so straight-forward.Adding a convolutional layer before the network I got, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model work on 8x8 images while integrating it in a model which works on bigger images.
For what I can get it should be something likewhole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...)
but I don't think this would work as the filter is an operator which depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (eg. 1024x1024) in an efficient way, since my current model downscales progressively the images and wouldn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
tensorflow neural-network deep-learning computer-vision conv-neural-network
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36
add a comment |
My question is about finding an efficient (mostly in term of parameters count) way to implement a sliding window in tensorflow (1.4) in order to apply a neural network through the image and produce a 2-d map with each pixel (or region) representing the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either a MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with input size of [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to a likelihood of the corresponding NxN patch to belong to a certain class.
PatchGAN Discriminator: More general architecture, as I can understand the network that is strided through the image outputs a map itself instead of a single value, which then is combined with adjacent maps to produce the final map.
While I cannot find any tensorflow implementation of MTANN, I found the PatchGAN implementation, which is considered as a convolutional network, but I couldn't figure out how to implement this in practice.
Let's say I got a pre-trained network of which I got the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input and what is I'm trying to do can be clearly represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed-size?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] maps from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8, filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using
tf.image.extract_image_patches
which explicitly extract all the image patches and channels in the depth dimension, but I think it would consume too many resources, as I'm switching to PatchGAN Discriminator from a full convolutional network due to memory constraints - also the composition of the final map is not so straight-forward.Adding a convolutional layer before the network I got, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model work on 8x8 images while integrating it in a model which works on bigger images.
For what I can get it should be something likewhole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...)
but I don't think this would work as the filter is an operator which depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (eg. 1024x1024) in an efficient way, since my current model downscales progressively the images and wouldn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
tensorflow neural-network deep-learning computer-vision conv-neural-network
My question is about finding an efficient (mostly in term of parameters count) way to implement a sliding window in tensorflow (1.4) in order to apply a neural network through the image and produce a 2-d map with each pixel (or region) representing the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either a MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with input size of [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to a likelihood of the corresponding NxN patch to belong to a certain class.
PatchGAN Discriminator: More general architecture, as I can understand the network that is strided through the image outputs a map itself instead of a single value, which then is combined with adjacent maps to produce the final map.
While I cannot find any tensorflow implementation of MTANN, I found the PatchGAN implementation, which is considered as a convolutional network, but I couldn't figure out how to implement this in practice.
Let's say I got a pre-trained network of which I got the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input and what is I'm trying to do can be clearly represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given window of fixed-size?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] maps from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8, filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using
tf.image.extract_image_patches
which explicitly extract all the image patches and channels in the depth dimension, but I think it would consume too many resources, as I'm switching to PatchGAN Discriminator from a full convolutional network due to memory constraints - also the composition of the final map is not so straight-forward.Adding a convolutional layer before the network I got, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model work on 8x8 images while integrating it in a model which works on bigger images.
For what I can get it should be something likewhole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...)
but I don't think this would work as the filter is an operator which depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (eg. 1024x1024) in an efficient way, since my current model downscales progressively the images and wouldn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
tensorflow neural-network deep-learning computer-vision conv-neural-network
tensorflow neural-network deep-learning computer-vision conv-neural-network
edited Nov 21 '18 at 11:26
EdoardoG
asked Nov 21 '18 at 11:16
EdoardoGEdoardoG
62
62
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36
add a comment |
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36
add a comment |
1 Answer
1
active
oldest
votes
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking at the number of layers as a variable that is dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model should only contain the learned filters for each level and as long as the filter shapes are compatible with the layers' input/output depth. Thus, applying a different (ie. bigger) spatial resolution to the network input produces a different output shape, which can be seen as an application of the neural network to a sliding windows sweeping across the input image.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53410952%2fhow-to-sweep-a-neural-network-through-an-image-with-tensorflow%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking at the number of layers as a variable that is dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model should only contain the learned filters for each level and as long as the filter shapes are compatible with the layers' input/output depth. Thus, applying a different (ie. bigger) spatial resolution to the network input produces a different output shape, which can be seen as an application of the neural network to a sliding windows sweeping across the input image.
add a comment |
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking at the number of layers as a variable that is dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model should only contain the learned filters for each level and as long as the filter shapes are compatible with the layers' input/output depth. Thus, applying a different (ie. bigger) spatial resolution to the network input produces a different output shape, which can be seen as an application of the neural network to a sliding windows sweeping across the input image.
add a comment |
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking at the number of layers as a variable that is dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model should only contain the learned filters for each level and as long as the filter shapes are compatible with the layers' input/output depth. Thus, applying a different (ie. bigger) spatial resolution to the network input produces a different output shape, which can be seen as an application of the neural network to a sliding windows sweeping across the input image.
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking at the number of layers as a variable that is dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model should only contain the learned filters for each level and as long as the filter shapes are compatible with the layers' input/output depth. Thus, applying a different (ie. bigger) spatial resolution to the network input produces a different output shape, which can be seen as an application of the neural network to a sliding windows sweeping across the input image.
answered Nov 28 '18 at 11:06
EdoardoGEdoardoG
62
62
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53410952%2fhow-to-sweep-a-neural-network-through-an-image-with-tensorflow%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The PatchGAN is just a regular CNN without fully connected layers. Convolution layers don't care -- they take feature maps and give feature maps. When the input spatial dimension is larger, the output spatial dimension will also be larger.
– hkchengrex
Nov 30 '18 at 7:36