Efficient implementation of conv2 (valid) in image_data
I'm trying to implement conv2 (the 2D convolution function in MATLAB) with the 'valid' argument, which returns only those parts of the convolution that are computed without zero-padded edges, meaning the kernel does not scan beyond the input.
I have this code so far. It works, but it seems needlessly complex, and I plan on converting it to fixed point and implementing it on hardware later. The sampleWindow variable keeps causing me trouble because the coder assigns a dynamic matrix to it.
So I'm looking for a simpler and/or more efficient implementation of the function.
    function outConvAcc = convn(input, kernel, S)
        % Get the input size in terms of rows and cols. The weights should have
        % the same depth as the input volume (image)
        [rowsIn, colsIn, depthInput] = size(input);
        % Get the kernel size, considering a square kernel always
        F = size(kernel,1);
        kernelf = rot90(squeeze(kernel),2);
        %% Initialize outputs
        sizeRowsOut = ((rowsIn-F)/S) + 1;
        sizeColsOut = ((colsIn-F)/S) + 1;
        outConvAcc = zeros(sizeRowsOut, sizeColsOut, depthInput);
        %% Do the convolution
        % Convolve each channel of the input with its respective kernel channel,
        % at the end sum all the channel results.
        for r=1:S:(rowsIn-1)
            for c=1:S:(colsIn-1)
                % Avoid sampling out of the image.
                if (((c+F)-1) <= colsIn) && (((r+F)-1) <= rowsIn)
                    % Select window on input volume (patch)
                    sampleWindow = input(r:(r+F)-1,c:(c+F)-1);
                    % Do the dot product
                    dotProd = (sampleWindow(:) .* kernelf(:));
                    n = size(dotProd,1);
                    dotProdc = 0;
                    for km=1:n % Replace function sum for code generation
                        dotProdc = dotProd(km) + dotProdc;
                    end
                    % Store result
                    outConvAcc(ceil(r/S),ceil(c/S),depthInput) = dotProdc;
                end
            end
        end
    end
matlab convolution hardware-programming
edited Nov 20 '18 at 15:00 by Cris Luengo
asked Nov 20 '18 at 10:39 by Alla
So, what is your question?
– Nicky Mattsson
Nov 20 '18 at 10:41
Efficient convolutions usually involve the FFT, which avoids the double nested loop you have.
– Sembei Norimaki
Nov 20 '18 at 10:43
@SembeiNorimaki As much as I agree that an efficient convolution usually involves fft or fft2 in this case, it does not remove the double nested loop, it just hides it in the fft. The main benefit is that one of the loops is smaller, in the same manner as fft is faster than dft (which I guess does not exist in MATLAB).
– Nicky Mattsson
Nov 20 '18 at 10:51
Why not loop on r=1:S:min(rowsIn-1,rowsIn+1-F) (or, assuming F>=2, r=1:S:rowsIn+1-F) rather than doing a loop and then starting each iteration with an if on the counter value? Other than that, I do not see much needless complexity if the final purpose is to convert that code to C or another low-level language.
– Brice
Nov 20 '18 at 10:56
@NickyMattsson Also, hiding the loop in the fft means the loop is done internally (in C probably), which is faster than doing it explicitly in MATLAB.
– Luis Mendo
Nov 20 '18 at 11:13
1 Answer
First of all, if rowsIn-F or colsIn-F is not evenly divisible by S, the computed output size is not an integer and you get an error. You need to add floor here:

    sizeRowsOut = floor((rowsIn-F)/S) + 1;
    sizeColsOut = floor((colsIn-F)/S) + 1;
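To see what floor changes, here is a small worked sketch. The sizes are made-up example values, not taken from the question, and the variable names rawSize, firstRows and lastRow are only for illustration.

```matlab
% Example values (assumptions for illustration only)
rowsIn = 10;   % input height
F = 3;         % kernel size
S = 2;         % stride

rawSize     = (rowsIn - F) / S + 1;         % 4.5, not a valid array size
sizeRowsOut = floor((rowsIn - F) / S) + 1;  % 4

% Top row of each window: 1, 3, 5, 7
firstRows = 1 + (0:sizeRowsOut-1) * S;
% Bottom row of the last window; it must not exceed rowsIn
lastRow = firstRows(end) + F - 1;           % 9
```

With these values the last input row (row 10) is simply never visited, which is what 'valid' with a stride implies.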
The main double loop can be simplified a little bit. Instead of looping over the input image in steps of S, and computing the location in the output image by dividing by S, loop over the output image, then compute the location in the input image:

    for r=1:sizeRowsOut
        r_in = (r-1)*S; % NOTE! the actual location is r_in+1
        for c=1:sizeColsOut
            c_in = (c-1)*S;
            sampleWindow = input(r_in+(1:F),c_in+(1:F));
            % ...
            outConvAcc(r,c,depthInput) = dotProdc;
        end
    end

(Note that all this indexing looks a little tidier with 0-based indexing, but alas.)
Here, you don't need the if any more. input is guaranteed to be large enough to fit that kernel, by the way that the indices are computed.
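The claim that the if is no longer needed can be checked numerically. The following is only a sketch over an arbitrary range of example sizes (the ranges and the flag name ok are assumptions): it confirms that the last window always fits, and that one more stride step would not.

```matlab
% For a range of example sizes, check that the last window never
% reaches past the input when the output size is computed with floor.
ok = true;
for rowsIn = 1:20
    for F = 1:rowsIn                % kernel no larger than the input
        for S = 1:5
            sizeRowsOut = floor((rowsIn - F) / S) + 1;
            r_in    = (sizeRowsOut - 1) * S;   % offset of the last window
            lastRow = r_in + F;                % largest row index read
            % The window fits, and one more stride step would not fit
            ok = ok && (lastRow <= rowsIn) && (lastRow + S > rowsIn);
        end
    end
end
```

ok stays true for every combination, so the loop bounds alone keep the window inside the image. The same argument applies to the columns.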
Next, you need to be aware of the order of data in memory, and loop such that you access the data in that order. This optimizes cache usage. MATLAB is column-major, meaning that each column is stored consecutively. Your inner loop goes along a row (across columns), meaning you are looping in the wrong order. Simply swap the r and c loops for a good speed boost (noticeable only with larger images):

    for c=1:sizeColsOut
        c_in = (c-1)*S;
        for r=1:sizeRowsOut
            r_in = (r-1)*S;
            % ... (loop body as before)
        end
    end
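A tiny illustration of what column-major means for the index arithmetic (the matrix A and the variable names are just an example):

```matlab
A = reshape(1:6, 2, 3);     % A = [1 3 5; 2 4 6]
flat = A(:).';              % storage order: 1 2 3 4 5 6

% Moving down a column advances the linear index by 1 ...
stepDown  = sub2ind(size(A), 2, 1) - sub2ind(size(A), 1, 1);
% ... while moving along a row jumps by the number of rows.
stepRight = sub2ind(size(A), 1, 2) - sub2ind(size(A), 1, 1);
```

Here stepDown is 1 and stepRight is 2 (the number of rows), so an inner loop over r touches neighbouring memory locations while an inner loop over c strides through memory.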
Finally, the bit inside the main double loop is more complicated than it needs to be because of the summation loop. In MATLAB you don't need it:

    sampleWindow = input(r_in+(1:F),c_in+(1:F));
    dotProd = sum(sampleWindow(:) .* kernelf(:));

or simply:

    dotProd = dot(sampleWindow(:), kernelf(:));

or even:

    dotProd = sampleWindow(:).' * kernelf(:);
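As a quick check that the three forms are interchangeable for real-valued data, here is a sketch on a made-up 3x3 window and kernel (both are example values, and the names viaSum, viaDot and viaMtimes are hypothetical):

```matlab
% Example data (assumptions for illustration only)
sampleWindow = magic(3);                  % [8 1 6; 3 5 7; 4 9 2]
kernelf      = [1 0 -1; 2 0 -2; 1 0 -1];

viaSum    = sum(sampleWindow(:) .* kernelf(:));
viaDot    = dot(sampleWindow(:), kernelf(:));
viaMtimes = sampleWindow(:).' * kernelf(:);

allAgree = (viaSum == viaDot) && (viaDot == viaMtimes);
```

Note that dot conjugates its first argument, so for complex data it would differ from the other two forms. For real images that does not matter.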
But if you want to write out the inner loop as well, I recommend that you don't copy out a piece of the image, but access the data in the image directly:

    dotProd = 0;
    for jj=1:F
        for ii=1:F
            dotProd = dotProd + input(r_in+ii,c_in+jj) * kernelf(ii,jj);
        end
    end

This is much cleaner and more readable (IMO) because there are fewer variables to keep track of.
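Putting the pieces above together, a minimal single-channel sketch could look like the following. The test image, kernel and stride are made-up example values, and the names out, acc and ref are assumptions. The result is compared against conv2 with 'valid', subsampled by the stride, which is one way to check such a loop.

```matlab
% Sketch: strided 'valid' 2-D convolution using only scalar indexing.
input  = magic(6);              % example 6x6 image (assumed test data)
kernel = [1 2; 3 4];            % example 2x2 kernel (assumed test data)
S = 2;                          % stride

F = size(kernel, 1);            % square kernel assumed, as in the question
kernelf = rot90(kernel, 2);     % flip the kernel: convolution, not correlation
[rowsIn, colsIn] = size(input);
sizeRowsOut = floor((rowsIn - F) / S) + 1;
sizeColsOut = floor((colsIn - F) / S) + 1;
out = zeros(sizeRowsOut, sizeColsOut);

for c = 1:sizeColsOut           % columns outermost: column-major order
    c_in = (c - 1) * S;
    for r = 1:sizeRowsOut
        r_in = (r - 1) * S;
        acc = 0;
        for jj = 1:F
            for ii = 1:F
                acc = acc + input(r_in + ii, c_in + jj) * kernelf(ii, jj);
            end
        end
        out(r, c) = acc;
    end
end

% Reference: full-resolution 'valid' result, subsampled by the stride.
ref = conv2(input, kernel, 'valid');
ref = ref(1:S:end, 1:S:end);
matches = isequal(out, ref);
```

No temporary window is created here, so there is no variable-size array for a code generator to worry about. Whether that is enough for a particular hardware flow is something to verify with the actual tool.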
Oh, and one more issue: color images. If depthInput>1, then you read from the first channel, and write to the last channel. You are not doing color processing at all!
Because the color dimension is stored last, the most efficient thing to do is to call this gray-value convolution once for each color channel.
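A per-channel wrapper could be sketched as follows. The 3-channel test data is made up, and the built-in conv2 stands in here for the gray-value routine discussed above (stride 1 only); in practice you would call your own single-channel function in its place.

```matlab
% Example 5x5x3 input and 2x2x3 kernel (assumed test data)
input  = cat(3, magic(5), ones(5), eye(5));
kernel = cat(3, [1 0; 0 -1], [1 1; 1 1], [0 1; 1 0]);

[rowsIn, colsIn, depthInput] = size(input);
F = size(kernel, 1);
out = zeros(rowsIn - F + 1, colsIn - F + 1, depthInput);

% One gray-value convolution per channel; each channel is a
% contiguous block in memory because the depth dimension is last.
for d = 1:depthInput
    out(:, :, d) = conv2(input(:, :, d), kernel(:, :, d), 'valid');
end
```

If the channel results should be summed into a single plane afterwards, as the comment in the question's code suggests, sum(out, 3) does that.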
answered Nov 20 '18 at 15:17 by Cris Luengo
Thank you Cris, this was both very helpful and very well presented, a great teaching moment. The reason I had a loop for the dotProd was hardware code generation, since those functions aren't supported. For the depth channels I decided just to handle them in the main function instead.
– Alla
Nov 20 '18 at 16:01