Convolution via Fourier transform in a window












0












$begingroup$


The question is related to an engineering application I am writing.



We are computing convolutions of large amounts of data with a few kernels with bounded support. The standard way to do so is to perform Fast Fourier Transform (FFT) on both the kernel and the data, multiply the arrays, and inverse FFT.



However, the amount of data is huge and it's not feasible to perform FFT on the entire data. So we do computation in sliding windows: start with a window of size S+W+S, where S is the size of the kernel support, do FFT+multiply+inverseFFT on the S+W+S amount of data, record the results for a smaller window of size W, and slide by W further along the data array. Since S is not negligible compared with W we compute more results than we can use.



I have a question though: are there any known shortcuts to reduce the amount of computation? For example, is there such thing as "partial inverseFFT" that would be enough to produce results only for the smaller window of size W? Anything else we can reduce to avoid computing the values that we know are going to be thrown away?










share|cite|improve this question









$endgroup$












  • $begingroup$
    You mean you are interested in some specific few Fourier frequencies or specific time/location points?
    $endgroup$
    – mathreadler
    May 2 '18 at 15:59










  • $begingroup$
    @mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
    $endgroup$
    – Michael
    May 2 '18 at 16:44










  • $begingroup$
    For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
    $endgroup$
    – mathreadler
    May 2 '18 at 17:07












  • $begingroup$
    @mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
    $endgroup$
    – Michael
    May 2 '18 at 18:39










  • $begingroup$
    Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
    $endgroup$
    – mathreadler
    May 2 '18 at 19:49
















0












$begingroup$


The question is related to an engineering application I am writing.



We are computing convolutions of large amounts of data with a few kernels with bounded support. The standard way to do so is to perform Fast Fourier Transform (FFT) on both the kernel and the data, multiply the arrays, and inverse FFT.



However, the amount of data is huge and it's not feasible to perform FFT on the entire data. So we do computation in sliding windows: start with a window of size S+W+S, where S is the size of the kernel support, do FFT+multiply+inverseFFT on the S+W+S amount of data, record the results for a smaller window of size W, and slide by W further along the data array. Since S is not negligible compared with W we compute more results than we can use.



I have a question though: are there any known shortcuts to reduce the amount of computation? For example, is there such thing as "partial inverseFFT" that would be enough to produce results only for the smaller window of size W? Anything else we can reduce to avoid computing the values that we know are going to be thrown away?










share|cite|improve this question









$endgroup$












  • $begingroup$
    You mean you are interested in some specific few Fourier frequencies or specific time/location points?
    $endgroup$
    – mathreadler
    May 2 '18 at 15:59










  • $begingroup$
    @mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
    $endgroup$
    – Michael
    May 2 '18 at 16:44










  • $begingroup$
    For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
    $endgroup$
    – mathreadler
    May 2 '18 at 17:07












  • $begingroup$
    @mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
    $endgroup$
    – Michael
    May 2 '18 at 18:39










  • $begingroup$
    Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
    $endgroup$
    – mathreadler
    May 2 '18 at 19:49














0












0








0





$begingroup$


The question is related to an engineering application I am writing.



We are computing convolutions of large amounts of data with a few kernels with bounded support. The standard way to do so is to perform Fast Fourier Transform (FFT) on both the kernel and the data, multiply the arrays, and inverse FFT.



However, the amount of data is huge and it's not feasible to perform FFT on the entire data. So we do computation in sliding windows: start with a window of size S+W+S, where S is the size of the kernel support, do FFT+multiply+inverseFFT on the S+W+S amount of data, record the results for a smaller window of size W, and slide by W further along the data array. Since S is not negligible compared with W we compute more results than we can use.



I have a question though: are there any known shortcuts to reduce the amount of computation? For example, is there such thing as "partial inverseFFT" that would be enough to produce results only for the smaller window of size W? Anything else we can reduce to avoid computing the values that we know are going to be thrown away?










share|cite|improve this question









$endgroup$




The question is related to an engineering application I am writing.



We are computing convolutions of large amounts of data with a few kernels with bounded support. The standard way to do so is to perform Fast Fourier Transform (FFT) on both the kernel and the data, multiply the arrays, and inverse FFT.



However, the amount of data is huge and it's not feasible to perform FFT on the entire data. So we do computation in sliding windows: start with a window of size S+W+S, where S is the size of the kernel support, do FFT+multiply+inverseFFT on the S+W+S amount of data, record the results for a smaller window of size W, and slide by W further along the data array. Since S is not negligible compared with W we compute more results than we can use.



I have a question though: are there any known shortcuts to reduce the amount of computation? For example, is there such thing as "partial inverseFFT" that would be enough to produce results only for the smaller window of size W? Anything else we can reduce to avoid computing the values that we know are going to be thrown away?







numerical-methods fourier-analysis applications






share|cite|improve this question













share|cite|improve this question











share|cite|improve this question




share|cite|improve this question










asked May 2 '18 at 15:43









MichaelMichael

1,372712




1,372712












  • $begingroup$
    You mean you are interested in some specific few Fourier frequencies or specific time/location points?
    $endgroup$
    – mathreadler
    May 2 '18 at 15:59










  • $begingroup$
    @mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
    $endgroup$
    – Michael
    May 2 '18 at 16:44










  • $begingroup$
    For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
    $endgroup$
    – mathreadler
    May 2 '18 at 17:07












  • $begingroup$
    @mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
    $endgroup$
    – Michael
    May 2 '18 at 18:39










  • $begingroup$
    Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
    $endgroup$
    – mathreadler
    May 2 '18 at 19:49


















  • $begingroup$
    You mean you are interested in some specific few Fourier frequencies or specific time/location points?
    $endgroup$
    – mathreadler
    May 2 '18 at 15:59










  • $begingroup$
    @mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
    $endgroup$
    – Michael
    May 2 '18 at 16:44










  • $begingroup$
    For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
    $endgroup$
    – mathreadler
    May 2 '18 at 17:07












  • $begingroup$
    @mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
    $endgroup$
    – Michael
    May 2 '18 at 18:39










  • $begingroup$
    Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
    $endgroup$
    – mathreadler
    May 2 '18 at 19:49
















$begingroup$
You mean you are interested in some specific few Fourier frequencies or specific time/location points?
$endgroup$
– mathreadler
May 2 '18 at 15:59




$begingroup$
You mean you are interested in some specific few Fourier frequencies or specific time/location points?
$endgroup$
– mathreadler
May 2 '18 at 15:59












$begingroup$
@mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
$endgroup$
– Michael
May 2 '18 at 16:44




$begingroup$
@mathreadler, I am interested in enough Fourier frequencies to compute W amount of data, no more. For example, suppose my kernel has support, say, S=16 data points; I set window to W=32 data points; I compute 16+32+16=64 frequencies, multiply with FFT of the kernel, invFFT, and save only the middle 32 results because the ones on the left and on the right are unreliable. Then I advance the window forward by 32 data points and repeat. Half of the final computation is wasted.
$endgroup$
– Michael
May 2 '18 at 16:44












$begingroup$
For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
$endgroup$
– mathreadler
May 2 '18 at 17:07






$begingroup$
For such small filters do you really gain on doing an FFT instead of factoring the filter itself into some convolution net? You would also need to consider that FFT multiplication is circular convolution if what you want to do is a zero-padded or other kind of convolution. That would waste even more calculations.
$endgroup$
– mathreadler
May 2 '18 at 17:07














$begingroup$
@mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
$endgroup$
– Michael
May 2 '18 at 18:39




$begingroup$
@mathreadler, the implementation has much bigger filter, 16+32+16 was just an example to make it clear what I'm asking.
$endgroup$
– Michael
May 2 '18 at 18:39












$begingroup$
Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
$endgroup$
– mathreadler
May 2 '18 at 19:49




$begingroup$
Alright. Then maybe it makes sense. Maybe also consider if you can do in-place convolution. For many filters it can give considerable speed-up.
$endgroup$
– mathreadler
May 2 '18 at 19:49










1 Answer
1






active

oldest

votes


















3












$begingroup$

I am not sure this is practical in the general case. If you look at the structure of the (I)FFT, every input influences every output. If you only need some outputs and use the decimation-in-frequency form you could skip some computations but the improvement appears to be only log(k), where k is the fraction of outputs you want. In the limit, if you only want one output (e.g., the DC component) you'd reduce the FFT complexity from O(N log N) down to O(N), which make sense since averages can be computed with complexity O(N). This still might be worth it if you only want a tiny fraction of the output.



In theory you could write a special (I)FFT to do this, but you probably won't outperform a widely used package like FFTW3. And it's so heavily tuned that I wouldn't think of trying to modify it.



But there's one trick you might do. If your convolution kernel represents a low pass (or bandpass) filter, you can use a small IFFT that covers only the part of the spectrum you want.



I do this in my software defined radio. I take a relatively wide digital IF (IQ) signal, select one part and downsample to a more reasonable output sample rate. So I perform a large forward FFT, multiply the frequency domain components I want by those of my filter, and use a small IFFT to produce my output. This automatically downsamples by the ratio of the two transform sizes.



It's not quite this simple. If I did a brickwall filter by just performing the IFFT directly on a subset of frequency bins, the equivalent impulse response would be very long and I'd get wraparound aliasing in the time domain (remember, FFT-multiply-IFFT actually performs circular convolution). The impulse response of my filter must not exceed M samples, where N = L + M - 1 is the length of the FFT and L is the number of new samples for each FFT.



Since I'd still like to approximate a brickwall, I start with one and do an IFFT to the time domain (which gives me a sinc pulse) and multiply by a sampling window (Kaiser, since it has a nice tuning knob). Then I side the impulse to the front of the buffer to minimize latency and do a FFT to get my filter in the frequency domain. If I've chosen a good Kaiser parameter, the coefficients outside the frequency range will still be nearly zero but those in the passband but close to the edge will roll off in a controlled way to bound the time-domain impulse response.



I could probably save some time on the (forward) FFT by doing a bunch of small ones instead of one big one and feeding a tree of FFTs that essentially implement a series of decimation filters. In a sense, this is doing the FFT you want that avoids unneeded calculations. But I need to find out how long the impulse response of each filter needs to be; if they're short, a hand-tuned direct FIR implementation might be more efficient. But I've found it's really hard to beat FFTW3 these days, since it's so highly tuned.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
    $endgroup$
    – reuns
    Jan 24 at 6:06













Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2763526%2fconvolution-via-fourier-transform-in-a-window%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3












$begingroup$

I am not sure this is practical in the general case. If you look at the structure of the (I)FFT, every input influences every output. If you only need some outputs and use the decimation-in-frequency form you could skip some computations but the improvement appears to be only log(k), where k is the fraction of outputs you want. In the limit, if you only want one output (e.g., the DC component) you'd reduce the FFT complexity from O(N log N) down to O(N), which make sense since averages can be computed with complexity O(N). This still might be worth it if you only want a tiny fraction of the output.



In theory you could write a special (I)FFT to do this, but you probably won't outperform a widely used package like FFTW3. And it's so heavily tuned that I wouldn't think of trying to modify it.



But there's one trick you might do. If your convolution kernel represents a low pass (or bandpass) filter, you can use a small IFFT that covers only the part of the spectrum you want.



I do this in my software defined radio. I take a relatively wide digital IF (IQ) signal, select one part and downsample to a more reasonable output sample rate. So I perform a large forward FFT, multiply the frequency domain components I want by those of my filter, and use a small IFFT to produce my output. This automatically downsamples by the ratio of the two transform sizes.



It's not quite this simple. If I did a brickwall filter by just performing the IFFT directly on a subset of frequency bins, the equivalent impulse response would be very long and I'd get wraparound aliasing in the time domain (remember, FFT-multiply-IFFT actually performs circular convolution). The impulse response of my filter must not exceed M samples, where N = L + M - 1 is the length of the FFT and L is the number of new samples for each FFT.



Since I'd still like to approximate a brickwall, I start with one and do an IFFT to the time domain (which gives me a sinc pulse) and multiply by a sampling window (Kaiser, since it has a nice tuning knob). Then I side the impulse to the front of the buffer to minimize latency and do a FFT to get my filter in the frequency domain. If I've chosen a good Kaiser parameter, the coefficients outside the frequency range will still be nearly zero but those in the passband but close to the edge will roll off in a controlled way to bound the time-domain impulse response.



I could probably save some time on the (forward) FFT by doing a bunch of small ones instead of one big one and feeding a tree of FFTs that essentially implement a series of decimation filters. In a sense, this is doing the FFT you want that avoids unneeded calculations. But I need to find out how long the impulse response of each filter needs to be; if they're short, a hand-tuned direct FIR implementation might be more efficient. But I've found it's really hard to beat FFTW3 these days, since it's so highly tuned.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
    $endgroup$
    – reuns
    Jan 24 at 6:06


















3












$begingroup$

I am not sure this is practical in the general case. If you look at the structure of the (I)FFT, every input influences every output. If you only need some outputs and use the decimation-in-frequency form you could skip some computations but the improvement appears to be only log(k), where k is the fraction of outputs you want. In the limit, if you only want one output (e.g., the DC component) you'd reduce the FFT complexity from O(N log N) down to O(N), which make sense since averages can be computed with complexity O(N). This still might be worth it if you only want a tiny fraction of the output.



In theory you could write a special (I)FFT to do this, but you probably won't outperform a widely used package like FFTW3. And it's so heavily tuned that I wouldn't think of trying to modify it.



But there's one trick you might do. If your convolution kernel represents a low pass (or bandpass) filter, you can use a small IFFT that covers only the part of the spectrum you want.



I do this in my software defined radio. I take a relatively wide digital IF (IQ) signal, select one part and downsample to a more reasonable output sample rate. So I perform a large forward FFT, multiply the frequency domain components I want by those of my filter, and use a small IFFT to produce my output. This automatically downsamples by the ratio of the two transform sizes.



It's not quite this simple. If I did a brickwall filter by just performing the IFFT directly on a subset of frequency bins, the equivalent impulse response would be very long and I'd get wraparound aliasing in the time domain (remember, FFT-multiply-IFFT actually performs circular convolution). The impulse response of my filter must not exceed M samples, where N = L + M - 1 is the length of the FFT and L is the number of new samples for each FFT.



Since I'd still like to approximate a brickwall, I start with one and do an IFFT to the time domain (which gives me a sinc pulse) and multiply by a sampling window (Kaiser, since it has a nice tuning knob). Then I side the impulse to the front of the buffer to minimize latency and do a FFT to get my filter in the frequency domain. If I've chosen a good Kaiser parameter, the coefficients outside the frequency range will still be nearly zero but those in the passband but close to the edge will roll off in a controlled way to bound the time-domain impulse response.



I could probably save some time on the (forward) FFT by doing a bunch of small ones instead of one big one and feeding a tree of FFTs that essentially implement a series of decimation filters. In a sense, this is doing the FFT you want that avoids unneeded calculations. But I need to find out how long the impulse response of each filter needs to be; if they're short, a hand-tuned direct FIR implementation might be more efficient. But I've found it's really hard to beat FFTW3 these days, since it's so highly tuned.






share|cite|improve this answer











$endgroup$













  • $begingroup$
    If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
    $endgroup$
    – reuns
    Jan 24 at 6:06
















3












3








3





$begingroup$

I am not sure this is practical in the general case. If you look at the structure of the (I)FFT, every input influences every output. If you only need some outputs and use the decimation-in-frequency form you could skip some computations but the improvement appears to be only log(k), where k is the fraction of outputs you want. In the limit, if you only want one output (e.g., the DC component) you'd reduce the FFT complexity from O(N log N) down to O(N), which make sense since averages can be computed with complexity O(N). This still might be worth it if you only want a tiny fraction of the output.



In theory you could write a special (I)FFT to do this, but you probably won't outperform a widely used package like FFTW3. And it's so heavily tuned that I wouldn't think of trying to modify it.



But there's one trick you might do. If your convolution kernel represents a low pass (or bandpass) filter, you can use a small IFFT that covers only the part of the spectrum you want.



I do this in my software defined radio. I take a relatively wide digital IF (IQ) signal, select one part and downsample to a more reasonable output sample rate. So I perform a large forward FFT, multiply the frequency domain components I want by those of my filter, and use a small IFFT to produce my output. This automatically downsamples by the ratio of the two transform sizes.



It's not quite this simple. If I did a brickwall filter by just performing the IFFT directly on a subset of frequency bins, the equivalent impulse response would be very long and I'd get wraparound aliasing in the time domain (remember, FFT-multiply-IFFT actually performs circular convolution). The impulse response of my filter must not exceed M samples, where N = L + M - 1 is the length of the FFT and L is the number of new samples for each FFT.



Since I'd still like to approximate a brickwall, I start with one and do an IFFT to the time domain (which gives me a sinc pulse) and multiply by a sampling window (Kaiser, since it has a nice tuning knob). Then I side the impulse to the front of the buffer to minimize latency and do a FFT to get my filter in the frequency domain. If I've chosen a good Kaiser parameter, the coefficients outside the frequency range will still be nearly zero but those in the passband but close to the edge will roll off in a controlled way to bound the time-domain impulse response.



I could probably save some time on the (forward) FFT by doing a bunch of small ones instead of one big one and feeding a tree of FFTs that essentially implement a series of decimation filters. In a sense, this is doing the FFT you want that avoids unneeded calculations. But I need to find out how long the impulse response of each filter needs to be; if they're short, a hand-tuned direct FIR implementation might be more efficient. But I've found it's really hard to beat FFTW3 these days, since it's so highly tuned.






share|cite|improve this answer











$endgroup$



I am not sure this is practical in the general case. If you look at the structure of the (I)FFT, every input influences every output. If you only need some outputs and use the decimation-in-frequency form you could skip some computations but the improvement appears to be only log(k), where k is the fraction of outputs you want. In the limit, if you only want one output (e.g., the DC component) you'd reduce the FFT complexity from O(N log N) down to O(N), which make sense since averages can be computed with complexity O(N). This still might be worth it if you only want a tiny fraction of the output.



In theory you could write a special (I)FFT to do this, but you probably won't outperform a widely used package like FFTW3. And it's so heavily tuned that I wouldn't think of trying to modify it.



But there's one trick you might do. If your convolution kernel represents a low pass (or bandpass) filter, you can use a small IFFT that covers only the part of the spectrum you want.



I do this in my software defined radio. I take a relatively wide digital IF (IQ) signal, select one part and downsample to a more reasonable output sample rate. So I perform a large forward FFT, multiply the frequency domain components I want by those of my filter, and use a small IFFT to produce my output. This automatically downsamples by the ratio of the two transform sizes.



It's not quite this simple. If I did a brickwall filter by just performing the IFFT directly on a subset of frequency bins, the equivalent impulse response would be very long and I'd get wraparound aliasing in the time domain (remember, FFT-multiply-IFFT actually performs circular convolution). The impulse response of my filter must not exceed M samples, where N = L + M - 1 is the length of the FFT and L is the number of new samples for each FFT.



Since I'd still like to approximate a brickwall, I start with one and do an IFFT to the time domain (which gives me a sinc pulse) and multiply by a sampling window (Kaiser, since it has a nice tuning knob). Then I side the impulse to the front of the buffer to minimize latency and do a FFT to get my filter in the frequency domain. If I've chosen a good Kaiser parameter, the coefficients outside the frequency range will still be nearly zero but those in the passband but close to the edge will roll off in a controlled way to bound the time-domain impulse response.



I could probably save some time on the (forward) FFT by doing a bunch of small ones instead of one big one and feeding a tree of FFTs that essentially implement a series of decimation filters. In a sense, this is doing the FFT you want that avoids unneeded calculations. But I need to find out how long the impulse response of each filter needs to be; if they're short, a hand-tuned direct FIR implementation might be more efficient. But I've found it's really hard to beat FFTW3 these days, since it's so highly tuned.







share|cite|improve this answer














share|cite|improve this answer



share|cite|improve this answer








edited Jan 24 at 4:55

























answered Jan 24 at 3:39









Phil KarnPhil Karn

313




313












  • $begingroup$
    If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
    $endgroup$
    – reuns
    Jan 24 at 6:06




















  • $begingroup$
    If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
    $endgroup$
    – reuns
    Jan 24 at 6:06


















$begingroup$
If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
$endgroup$
– reuns
Jan 24 at 6:06






$begingroup$
If only the 16 lowest frequency bins are non-zero, you can inv-FFT and put the result in $x(frac{K}{16}n)$, at the end of your inv-STFT you'll have a signal which is zero for $n not equiv 0 bmod frac{K}{16}$, you can extract $y(n) = x(frac{K}{16}n)$ to obtain a downsampled result, or apply to $x(n)$ a low-pass filter (using a last FFT) to obtain the result in the original sampling rate. Making the whole process perfectly compatible with the naive one under any window and STFT parameters isn't so obvious
$endgroup$
– reuns
Jan 24 at 6:06




















draft saved

draft discarded




















































Thanks for contributing an answer to Mathematics Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2763526%2fconvolution-via-fourier-transform-in-a-window%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Can a sorcerer learn a 5th-level spell early by creating spell slots using the Font of Magic feature?

Does disintegrating a polymorphed enemy still kill it after the 2018 errata?

A Topological Invariant for $pi_3(U(n))$