Efficient implementation of conv2 (valid) in image data



























I'm trying to implement conv2 (MATLAB's 2-D convolution function) with the 'valid' argument, which returns only those parts of the convolution that are computed without zero-padded edges, meaning the kernel never scans beyond the input.



I have this code so far, which works, but as you can see it seems needlessly complex. I plan on converting it to fixed point and implementing it on hardware later, and the sampleWindow variable keeps causing me trouble because the code generator assigns a dynamically sized matrix to it.



So I'm looking for a simpler and/or more efficient implementation of the function.



function outConvAcc = convn(input, kernel, S)
% Get the input size in terms of rows and cols. The weights should have
% the same depth as the input volume (image)
[rowsIn, colsIn, depthInput] = size(input);

% Get the kernel size, considering a square kernel always
F = size(kernel,1);
kernelf=rot90(squeeze(kernel),2);
%% Initialize outputs
sizeRowsOut = ((rowsIn-F)/S) + 1;
sizeColsOut = ((colsIn-F)/S) + 1;
outConvAcc = zeros(sizeRowsOut , sizeColsOut, depthInput);

%% Do the convolution
% Convolve each channel on the input with its respective kernel channel,
% at the end sum all the channel results.

for r=1:S:(rowsIn-1)
    for c=1:S:(colsIn-1)
        % Avoid sampling out of the image.
        if (((c+F)-1) <= colsIn) && (((r+F)-1) <= rowsIn)
            % Select window on input volume (patch)
            sampleWindow = input(r:(r+F)-1,c:(c+F)-1);
            % Do the dot product
            dotProd =(sampleWindow(:) .* kernelf(:));
            n=size(dotProd,1);
            dotProdc=0;
            for km=1:n % Replace function Sum for code generation
                dotProdc=dotProd(km)+dotProdc;
            end
            % Store result
            outConvAcc(ceil(r/S),ceil(c/S),depthInput) = dotProdc;
        end
    end
end
end
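The rot90(kernel,2) step in the question's code is what makes the sliding dot product a convolution and not a correlation. Below is a quick check on one 3-by-3 window. The values are hypothetical and chosen arbitrarily. The dot product with the rotated kernel should match conv2 'valid'. The dot product with the unrotated kernel should match filter2, which computes correlation.

```matlab
% Hypothetical 3-by-3 window and kernel, chosen only for illustration.
F = 3;
win    = magic(F);                  % stands in for one sampleWindow
kernel = [1 2 0; -1 0 3; 2 1 1];

% Sliding dot product with the kernel rotated by 180 degrees,
% which is what kernelf = rot90(kernel,2) does in the question.
kernelf = rot90(kernel, 2);
viaDot  = sum(win(:) .* kernelf(:));

% With an F-by-F input and an F-by-F kernel, conv2 'valid' returns one value.
viaConv2 = conv2(win, kernel, 'valid');

% Without the rotation, the same dot product computes correlation instead.
viaCorr = sum(win(:) .* kernel(:));
```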









share|improve this question

























  • So, what is your question?

    – Nicky Mattsson
    Nov 20 '18 at 10:41






  • Efficient convolutions usually involve the FFT, which avoids the double nested loop you have.

    – Sembei Norimaki
    Nov 20 '18 at 10:43











  • @SembeiNorimaki As much as I agree that an efficient convolution usually involves fft (or fft2 in this case), it does not remove the double nested loop; it just hides it inside the fft. The main benefit is that one of the loops becomes shorter, in the same way that the FFT is faster than a direct DFT (which I guess does not exist in MATLAB).

    – Nicky Mattsson
    Nov 20 '18 at 10:51











  • Why not loop on r=1:S:min(rowsIn-1,rowsIn+1-F) (or, assuming F>=2, r=1:S:rowsIn+1-F) rather than looping over everything and then starting each iteration with an if on the counter value? Other than that, I do not see much needless complexity if the final purpose is to convert that code to C or another low-level language.

    – Brice
    Nov 20 '18 at 10:56













  • @NickyMattsson Also, hiding the loop in the fft means the loop is done internally (probably in C), which is faster than doing it explicitly in MATLAB.

    – Luis Mendo
    Nov 20 '18 at 11:13
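Here is a minimal sketch of the FFT route mentioned in the comments. A and K are hypothetical test data: a single-channel image and a square kernel. Both are zero-padded to the full convolution size so that the FFT's circular convolution equals the linear one. The 'valid' part is then cropped out, and a stride S is applied afterwards by subsampling. This computes every valid position and then discards the skipped ones. For small kernels such as 3-by-3, the direct loop is often just as fast and is simpler to map to fixed-point hardware, so treat this as an option rather than a recommendation.

```matlab
% Hypothetical single-channel image A and square F-by-F kernel K.
A = rand(8, 10);
K = rand(3, 3);
S = 2;                                      % stride
[m, n] = size(A);
F = size(K, 1);

% Zero-pad both to the full linear-convolution size (m+F-1)-by-(n+F-1),
% so that the FFT's circular convolution equals linear convolution.
P = [m + F - 1, n + F - 1];
fullConv = real(ifft2(fft2(A, P(1), P(2)) .* fft2(K, P(1), P(2))));

% The 'valid' part is rows F..m and columns F..n of the full result.
validFFT = fullConv(F:m, F:n);

% A stride just keeps every S-th valid position.
stridedFFT = validFFT(1:S:end, 1:S:end);
```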
























matlab convolution hardware-programming














edited Nov 20 '18 at 15:00 by Cris Luengo

asked Nov 20 '18 at 10:39 by Alla



























1 Answer
































First of all, if rowsIn-F or colsIn-F is not evenly divisible by S, the computed output size is not an integer and zeros throws an error. You need to add floor here:



sizeRowsOut = floor((rowsIn-F)/S) + 1;
sizeColsOut = floor((colsIn-F)/S) + 1;
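Here is a small worked example of the size formula, using hypothetical numbers: an 8-row image, F = 3 and S = 2. Without floor, the row count comes out as 3.5, which MATLAB's zeros rejects as a size. With floor you get 3 output rows, and the last image row is simply never covered by a window.

```matlab
F = 3;  S = 2;
rowsIn = 8;                                  % hypothetical image height
unfloored   = ((rowsIn - F) / S) + 1;        % 3.5: not a valid array size
sizeRowsOut = floor((rowsIn - F) / S) + 1;   % 3 output rows
% The last window starts at row (sizeRowsOut-1)*S + 1 = 5 and ends at row 7.
lastRow = (sizeRowsOut - 1) * S + F;         % 7
```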




The main double-loop can be simplified a little bit. Instead of looping over the input image in steps of S, and computing the location in the output image by dividing by S, loop over the output image, then compute the location in the input image:



for r=1:sizeRowsOut
    r_in = (r-1)*S; % NOTE! the actual location is r_in+1
    for c=1:sizeColsOut
        c_in = (c-1)*S;
        sampleWindow = input(r_in+(1:F),c_in+(1:F));
        % ...
        outConvAcc(r,c,depthInput) = dotProdc;
    end
end


(Note that all this indexing looks a little tidier with 0-based indexing, but alas.)



With this loop structure you don't need the if any more: because of the way the indices are computed, input is guaranteed to be large enough to fit the kernel at every position.
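Here is a quick check of the index arithmetic, using hypothetical sizes. Looping over the output and computing (r-1)*S+1 gives the same window start rows as the strided input loop with the bound suggested in the comments (1:S:rowsIn+1-F). The last window always ends inside the image.

```matlab
rowsIn = 8;  F = 3;  S = 2;                  % hypothetical sizes
sizeRowsOut = floor((rowsIn - F) / S) + 1;

% Window start rows when looping over the OUTPUT, as in this answer:
r = 1:sizeRowsOut;
startFromOutput = (r - 1) * S + 1;           % i.e. r_in + 1

% Window start rows from a strided loop over the INPUT with the bound
% suggested in the comments; no "if" is needed in either form.
startFromInput = 1:S:(rowsIn + 1 - F);

% Last row touched by the last window; never beyond rowsIn.
lastRowUsed = startFromOutput(end) + F - 1;
```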





Next, you need to be aware of the order of data in memory, and loop such that you access the data in that order. This optimizes cache usage. MATLAB is column-major, meaning that each column is stored consecutively. Your inner loop goes along a row (across columns), meaning you are looping in the wrong order. Simply swap the r and c loops for a good speed boost (noticeable only with larger images):



for c=1:sizeColsOut
    c_in = (c-1)*S;
    for r=1:sizeRowsOut
        r_in = (r-1)*S;
        % ... same loop body as before ...
    end
end
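To see what column-major means concretely, here is a tiny hypothetical 2-by-3 example. A(:) lists the elements in memory order. sub2ind shows that the element below A(1,1) is its direct neighbour in memory, while the element to its right is size(A,1) elements away. For a large image, that makes stepping along a row a large jump in memory.

```matlab
A = [1 3 5;
     2 4 6];                  % hypothetical 2-by-3 matrix
memoryOrder = A(:).';         % storage order: 1 2 3 4 5 6
% Linear indices of the element below A(1,1) and the one to its right:
idxDown  = sub2ind(size(A), 2, 1);   % 2
idxRight = sub2ind(size(A), 1, 2);   % 3
```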




Finally, the code inside the main double loop is more complicated than it needs to be because of the explicit summation loop. In MATLAB you don't need that loop:



sampleWindow = input(r_in+(1:F),c_in+(1:F));
dotProd = sum(sampleWindow(:) .* kernelf(:));


or simply:



dotProd = dot(sampleWindow(:), kernelf(:));


or even:



dotProd = sampleWindow(:).' * kernelf(:);
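These one-liners compute the same number as the question's explicit accumulation loop. Below is a quick check on a hypothetical 4-by-4 window. The values are integers, so exact equality is safe here. If your code-generation target does not support sum or dot, the explicit loop is still a valid fallback.

```matlab
sampleWindow = magic(4);              % hypothetical 4-by-4 window
kernelf      = reshape(1:16, 4, 4);   % hypothetical (already flipped) kernel

viaSum    = sum(sampleWindow(:) .* kernelf(:));
viaDot    = dot(sampleWindow(:), kernelf(:));
viaMatMul = sampleWindow(:).' * kernelf(:);

% The question's explicit accumulation, for comparison.
prods   = sampleWindow(:) .* kernelf(:);
viaLoop = 0;
for km = 1:numel(prods)
    viaLoop = viaLoop + prods(km);
end
```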


But if you do want to write out the inner loop as well, I recommend that you don't copy out a piece of the image, but instead access the image data directly:



dotProd = 0;
for jj=1:F
    for ii=1:F
        dotProd = dotProd + input(r_in+ii,c_in+jj) * kernelf(ii,jj);
    end
end


This is much cleaner and more readable (IMO), because there are fewer variables to keep track of. It also gets rid of the temporary sampleWindow array altogether.
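Putting the pieces of this answer together gives something like the following minimal sketch. It assumes a single-channel 2-D image and a square F-by-F kernel. The variable img stands in for the question's input, to avoid shadowing MATLAB's input function, and the test data is hypothetical. The output size uses floor. The loops run over the output with the row index innermost. The multiply-accumulate reads the image directly, so every intermediate value is a scalar. conv2 is used only at the end, to check the result.

```matlab
% Hypothetical test data: single-channel image, square kernel, stride S.
img    = rand(9, 12);
kernel = rand(3, 3);
S      = 2;

[rowsIn, colsIn] = size(img);
F = size(kernel, 1);
kernelf = rot90(kernel, 2);                  % flip so this is convolution

sizeRowsOut = floor((rowsIn - F) / S) + 1;
sizeColsOut = floor((colsIn - F) / S) + 1;
outConvAcc  = zeros(sizeRowsOut, sizeColsOut);

for c = 1:sizeColsOut                        % outer loop over columns,
    c_in = (c - 1) * S;
    for r = 1:sizeRowsOut                    % inner loop over rows (column-major)
        r_in = (r - 1) * S;
        dotProd = 0;
        for jj = 1:F
            for ii = 1:F
                dotProd = dotProd + img(r_in + ii, c_in + jj) * kernelf(ii, jj);
            end
        end
        outConvAcc(r, c) = dotProd;
    end
end

% Reference: built-in 'valid' convolution, subsampled by the stride.
reference = conv2(img, kernel, 'valid');
reference = reference(1:S:end, 1:S:end);
```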





Oh, and one more issue: color images. If depthInput>1, then you read from the first channel, and write to the last channel. You are not doing color processing at all!



Because the color dimension is stored last, the most efficient thing to do is to call this gray-value convolution once for each color channel.
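For multi-channel input, here is a minimal sketch of the per-channel approach. It assumes the kernel has the same depth as the image and that the channel results are summed at the end, as the question's code comments intend. conv2 stands in for whichever single-channel routine you actually use, such as the loop version. Only the channel loop is the point here.

```matlab
% Hypothetical test data: 3-channel image, one kernel slice per channel.
img    = rand(6, 7, 3);
kernel = rand(3, 3, 3);
S = 1;

[rowsIn, colsIn, depthInput] = size(img);
F = size(kernel, 1);
sizeRowsOut = floor((rowsIn - F) / S) + 1;
sizeColsOut = floor((colsIn - F) / S) + 1;

outPerChannel = zeros(sizeRowsOut, sizeColsOut, depthInput);
for k = 1:depthInput
    % Single-channel 'valid' convolution of channel k with kernel slice k.
    v = conv2(img(:, :, k), kernel(:, :, k), 'valid');
    outPerChannel(:, :, k) = v(1:S:end, 1:S:end);
end
outSummed = sum(outPerChannel, 3);           % sum over channels
```

For S = 1, MATLAB's built-in convn(img, flip(kernel, 3), 'valid') should give the same summed result, because convn also flips the kernel along the third dimension. This makes a handy check. Note that the question's own function is also named convn, and it would shadow the built-in if it is on the path.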






























  • Thank you Cris, this was both very helpful and very well presented, a great teaching moment. The reason I had a loop for the dotProd was hardware code generation, since those functions aren't supported there, and for the depth channels I decided to just handle them in the main function instead.

    – Alla
    Nov 20 '18 at 16:01





















answered Nov 20 '18 at 15:17 by Cris Luengo












