Java: How to get current frequency of audio input?
I want to analyse the current frequency of the microphone input to synchronize my LEDs with the music playing. I know how to capture the sound from the microphone, but I don't know anything about the FFT, which I often saw mentioned while searching for a way to get the frequency.
I want to test whether the current volume at a certain frequency is greater than a set value. The code should look something like this:
if (frequency > value) {
    LEDs on
} else {
    LEDs off
}
My problem is how to implement the FFT in Java. For better understanding, here is a link to a YouTube video that shows really well what I'm trying to achieve.
The whole code:
import javax.sound.sampled.*;
import java.io.IOException;

public class Music {
    static AudioFormat format;
    static DataLine.Info info;

    public static void input() {
        format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);
        try {
            info = new DataLine.Info(TargetDataLine.class, format);
            final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
            targetLine.open();
            final AudioInputStream audioStream = new AudioInputStream(targetLine);
            final byte[] buf = new byte[256];
            Thread targetThread = new Thread() {
                public void run() {
                    targetLine.start();
                    try {
                        audioStream.read(buf);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            };
            targetThread.start();
        } catch (LineUnavailableException e) {
            e.printStackTrace();
        }
    }
}
Edit: I tried using the JavaFX AudioSpectrumListener of the MediaPlayer, which works really well as long as I use an .mp3 file. The problem is that I have to use a byte array in which I store the microphone input. I asked another question about this problem here.
java audio fft frequency javasound
What do you have for "audio input"? For example, do you have a list of voltages sampled at a given frequency? Also, you would need to quantify what you mean by "frequency" of the audio input; for example, noise has some component at every frequency, so you need to define some criteria. In general this is not a trivial problem.
– nPn
Jan 1 at 17:59
I use the Java Sound API to capture the audio and store it in a byte array. By "frequency" I mean the volume at a certain frequency (Hz).
– Jannik
Jan 1 at 18:15
I'm not one (usually) to recommend JavaFX, but in this case, it has one built in via the AudioSpectrumListener.
– Andrew Thompson
Jan 1 at 18:43
I'm using JavaFX for the GUI anyway, so that may be a good solution. I'll take a look at it. Thank you!
– Jannik
Jan 1 at 19:40
I think it's not only about frequency; you should calculate the energy of the sound. Search for RMS.
– amin saffar
Jan 1 at 22:08
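A minimal sketch of the RMS (root mean square) calculation suggested in the last comment, assuming the raw bytes have already been decoded to normalized float samples in [-1, 1]; the method name rms is made up for this example:

// Computes the RMS energy of one block of decoded samples.
// Assumes the samples are normalized floats in [-1, 1].
static double rms(final float[] samples) {
    double sum = 0;
    for (final float s : samples) {
        sum += s * s; // accumulate squared amplitudes
    }
    return Math.sqrt(sum / samples.length);
}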
2 Answers
Using the JavaFFT class from here, you can do something like this:
import javax.sound.sampled.*;

public class AudioLED {

    private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;

    public static void main(final String[] args) throws Exception {
        // use only 1 channel, to make this easier
        final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
        final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
        targetLine.open();
        targetLine.start();
        final AudioInputStream audioStream = new AudioInputStream(targetLine);
        final byte[] buf = new byte[256]; // <--- increase this for higher frequency resolution
        final int numberOfSamples = buf.length / format.getFrameSize();
        final JavaFFT fft = new JavaFFT(numberOfSamples);
        while (true) {
            // in real impl, don't just ignore how many bytes you read
            audioStream.read(buf);
            // the stream represents each sample as two bytes -> decode
            final float[] samples = decode(buf, format);
            final float[][] transformed = fft.transform(samples);
            final float[] realPart = transformed[0];
            final float[] imaginaryPart = transformed[1];
            final double[] magnitudes = toMagnitudes(realPart, imaginaryPart);
            // do something with magnitudes...
        }
    }

    private static float[] decode(final byte[] buf, final AudioFormat format) {
        final float[] fbuf = new float[buf.length / format.getFrameSize()];
        for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
            final int sample = format.isBigEndian()
                ? byteToIntBigEndian(buf, pos, format.getFrameSize())
                : byteToIntLittleEndian(buf, pos, format.getFrameSize());
            // normalize to [-1,1] (not strictly necessary, but makes things easier)
            fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
        }
        return fbuf;
    }

    private static double[] toMagnitudes(final float[] realPart, final float[] imaginaryPart) {
        final double[] powers = new double[realPart.length / 2];
        for (int i = 0; i < powers.length; i++) {
            powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
        }
        return powers;
    }

    private static int byteToIntLittleEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            final int aByte = buf[offset + byteIndex] & 0xff;
            sample += aByte << 8 * (byteIndex);
        }
        return sample;
    }

    private static int byteToIntBigEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            final int aByte = buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
        }
        return sample;
    }
}
What does the Fourier Transform do?
In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?
In PCM each value encodes an amplitude. You can imagine this like the membrane of a speaker that swings back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain number of times per second (the sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second. This is the typical rate for CD-quality audio. For your purposes you probably don't need a rate this high.
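To get a feel for these numbers, here is a tiny sketch (the helper name windowMillis is invented for illustration) that computes how much time one window of samples covers:

// How many milliseconds of audio does one window of samples cover?
static double windowMillis(final int numberOfSamples, final float sampleRate) {
    return 1000.0 * numberOfSamples / sampleRate; // e.g. windowMillis(1024, 44100) ≈ 23 ms
}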
To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot of info about the continuous case, but what you need to pay attention to is the discrete case (also called the discrete Fourier transform, DFT), because we are dealing with digital signals, not analog signals.
So what happens when you transform 1024 samples using the DFT (using its fast implementation, the FFT)? Typically, the samples are real numbers, not complex numbers. But the output of the DFT is complex. This is why you usually get two output arrays from one input array: one array for the real part and one for the imaginary part. Together they form one array of complex numbers. This array represents the frequency spectrum of your input samples. The spectrum is complex, because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1. As you might remember from math way back, a sine wave crosses through the origin (0, 0), while a cosine wave cuts the y-axis at (0, 1). Apart from this shift both waves are identical in amplitude and shape. This shift is called phase. In your context we don't care about phase, only about amplitude/magnitude, but the complex numbers you get encode both. To convert one of those complex numbers (r, i) to a simple magnitude value (how loud at a certain frequency), you simply calculate m = sqrt(r*r + i*i). The outcome is always positive. A simple way to understand why and how this works is to imagine a Cartesian plane: treat (r, i) as a vector on that plane. By the Pythagorean theorem, the length of that vector from the origin is just m = sqrt(r*r + i*i).
Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024 complex numbers, only the first 512 are of interest to us. And which frequencies do those cover? Because of the Nyquist–Shannon sampling theorem, a signal sampled at SR = 44100 Hz cannot contain information about frequencies greater than F = SR/2 = 22050 Hz (you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512 complex values you get from the FFT for 1024 samples of a signal sampled at 44100 Hz cover the frequencies 0 Hz - 22050 Hz. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz ≈ 43 Hz (the bandwidth of a bin).
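As a concrete sketch of that bin arithmetic (the helper name frequencyToBin is invented for illustration), this maps a target frequency to its bin index:

// Maps a frequency in Hz to the index of its FFT bin.
// n = number of samples per FFT (e.g. 1024), sampleRate = e.g. 44100.
static int frequencyToBin(final double frequencyHz, final int n, final float sampleRate) {
    // each bin covers sampleRate / n Hz, so divide and round
    return (int) Math.round(frequencyHz * n / sampleRate);
}

For example, frequencyToBin(11025, 1024, 44100) yields 256, matching the example below.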
So the bin for 11025 Hz is right at index 512/2 = 256. The magnitude is at m[256].
To put this to work in your application you need to understand one more thing: 1024 samples of a 44100 Hz signal cover a very short amount of time, i.e. about 23 ms. In that short a time you will see sudden peaks. It's better to aggregate multiple of those 1024-sample windows into one value before thresholding. Alternatively you could also use a longer DFT, e.g. 1024*64 samples; however, I advise against making the DFT very long, as it creates a large computational burden.
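A minimal sketch of that aggregation, assuming the magnitudes array from the code above; the 8-frame window and the threshold are arbitrary values chosen for illustration:

// Smooths the magnitude of one frequency bin over several FFT frames
// and decides whether the LEDs should be on, based on the averaged value.
public class LedThreshold {
    private static final int FRAMES_TO_AVERAGE = 8; // arbitrary smoothing window
    private static final double THRESHOLD = 0.1;    // arbitrary, tune by ear
    private final double[] history = new double[FRAMES_TO_AVERAGE];
    private int frameIndex;

    // call this once per FFT frame with the magnitudes and the bin to watch
    public boolean onNewMagnitudes(final double[] magnitudes, final int bin) {
        history[frameIndex] = magnitudes[bin];
        frameIndex = (frameIndex + 1) % FRAMES_TO_AVERAGE;
        double avg = 0;
        for (final double m : history) {
            avg += m;
        }
        avg /= FRAMES_TO_AVERAGE;
        return avg > THRESHOLD; // true -> LEDs on, false -> LEDs off
    }
}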
Thank you for your answer, but since I'm new to Java it's really hard for me to understand all the different methods and functions. It would be great if you could try to explain to me what I have to do in order to test if the volume of a certain frequency > value. I don't have enough reputation yet, but I'll vote your answer up as soon as I'm able to do that!
– Jannik
Jan 2 at 15:37
@Jannik I don't think this is really a Java issue. I'd recommend reading up on some basic digital signal processing (DSP) and how the Javasound API is organized. Start with how PCM works (e.g. en.wikipedia.org/wiki/Pulse-code_modulation). All the above code does is to read sound into a byte buffer, decode the buffer into samples (each sample consists of two bytes!) and then transform it using the FFT. Because the FFT result is complex, magnitudes are computed. If you don't know what the FFT delivers, please read some primer on the web, e.g. math.stackexchange.com/q/1002.
– hendrik
Jan 2 at 15:48
I am new to programming in general, so I barely have any knowledge about all these things; that's what I meant by "new to Java". I'll check out these links, thank you!
– Jannik
Jan 2 at 19:48
First of all, thank you for your edit, I feel like I understand some of the processes for the first time. But I don't really get how to "aggregate multiple of those 1024 samples into one value". And why is the frequency bin for 11025 at m[128] and not at m[256] (11025 / 43 ≈ 256)?
– Jannik
Jan 3 at 12:06
Sorry, my mistake. Fixed it in the answer. With aggregate I mean: simply average a bunch of consecutive values (i.e. avg each frequency) to get more stability.
– hendrik
Jan 3 at 12:41
I think hendrik has the basic plan, but I hear your pain about understanding the process of getting there!
I assume you are getting your byte array via a TargetDataLine and it is returning bytes. Converting the bytes to floats will take a bit of manipulation, and depends upon the AudioFormat. A typical format has 44100 frames per second and 16-bit encoding (two bytes form one data point) in stereo. This would mean 4 bytes make up a single frame consisting of a left and a right value.
Example code that shows how to read and handle the incoming stream of individual bytes can be found in the Java audio tutorial Using Files and Format Converters. Scroll down to the first "code snippet" in the section "Reading Sound Files". The key point where you would convert the incoming data to floats occurs at the spot marked as follows:
// Here, do something useful with the audio data that's
// now in the audioBytes array...
At this point you can take the two bytes (assuming 16-bit encoding), combine them into a single short, and scale the value to a normalized float (range from -1 to 1). There are several StackOverflow questions that show algorithms for doing this conversion.
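A minimal sketch of that conversion, assuming little-endian byte order (for big-endian data the two bytes would be swapped):

// Combines two little-endian bytes into one signed 16-bit sample and
// scales it to a normalized float in [-1, 1).
public final class SampleDecoder {
    public static float toNormalizedFloat(final byte low, final byte high) {
        final short sample = (short) ((high << 8) | (low & 0xff)); // high byte keeps its sign
        return sample / 32768f; // 32768 = Short.MAX_VALUE + 1
    }
}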
You may also have to edit the sample code where it reads from an AudioInputStream (as in the example) vs. a TargetDataLine, but I think if that poses a problem, there are also StackOverflow questions that can help with that.
For the FFTFactory recommended by hendrik, I suspect that using the transform method with just a float array for input will suffice. But I haven't gotten into the details or tried running this myself yet. (It looks promising. I suspect a search might also uncover other FFT libraries with more complete documentation. I recall something being available, perhaps from MIT. I'm probably only a couple of steps ahead of you technically.)
In any event, at the point above where the conversion happens, you can add to the input array for transform() until it is full, and on that iteration call the transform() method.
Interpreting the output from the method might be best accomplished on a separate thread. I'm thinking: hand off the results of the FFT call, or hand off the transform() call itself, via some sort of loose coupling. (Are you familiar with this term and multi-threaded coding?)
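One possible shape for that hand-off, sketched with a BlockingQueue; the class and method names are invented for illustration:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// The audio/FFT thread puts magnitude arrays on a queue; a consumer
// thread takes them off and decides whether the LEDs go on or off.
public class MagnitudeConsumer implements Runnable {
    private final BlockingQueue<double[]> queue = new ArrayBlockingQueue<>(16);

    // called from the audio/FFT thread after each transform
    public void submit(final double[] magnitudes) throws InterruptedException {
        queue.put(magnitudes);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                final double[] magnitudes = queue.take(); // blocks until data arrives
                // ... threshold check and LED switching go here ...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}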
Significant insights into how Java encodes sound and sound formats can be found in tutorials that directly precede the one linked above.
Another great resource, if you want to better understand how to interpret FFT results, can be found as a free download: "The Scientist and Engineer's Guide to DSP".
Thank you! Yes, my problem is that I don't know how to use the FFT: neither how to provide it with the right data, nor how to interpret its results. I think I'll have to take some time and read up on how the FFT works, but for me it's really hard to understand.
– Jannik
Jan 3 at 1:44
For good reason. Both the book and the Java Sound tutorials were very difficult reads for me as well. I hope I helped with a couple of the steps and got you pointed in the right direction for the rest. Don't be discouraged if it takes multiple passes. I hang out at Java-gaming.org and that forum has a section for Sound/Audio which might also be a source of help. Last thought: check out the frequency analysis tool that comes free with Audacity! It's pretty cool, and using it might add insights into how it all works.
– Phil Freihofner
Jan 3 at 8:13
I'll check out the things that you mentioned, thanks!
– Jannik
Jan 3 at 10:32
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53997426%2fjava-how-to-get-current-frequency-of-audio-input%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using the JavaFFT
class from here, you can do something like this:
import javax.sound.sampled.*;
public class AudioLED {
private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;
public static void main(final String args) throws Exception {
// use only 1 channel, to make this easier
final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
targetLine.open();
targetLine.start();
final AudioInputStream audioStream = new AudioInputStream(targetLine);
final byte buf = new byte[256]; // <--- increase this for higher frequency resolution
final int numberOfSamples = buf.length / format.getFrameSize();
final JavaFFT fft = new JavaFFT(numberOfSamples);
while (true) {
// in real impl, don't just ignore how many bytes you read
audioStream.read(buf);
// the stream represents each sample as two bytes -> decode
final float samples = decode(buf, format);
final float transformed = fft.transform(samples);
final float realPart = transformed[0];
final float imaginaryPart = transformed[1];
final double magnitudes = toMagnitudes(realPart, imaginaryPart);
// do something with magnitudes...
}
}
private static float decode(final byte buf, final AudioFormat format) {
final float fbuf = new float[buf.length / format.getFrameSize()];
for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
final int sample = format.isBigEndian()
? byteToIntBigEndian(buf, pos, format.getFrameSize())
: byteToIntLittleEndian(buf, pos, format.getFrameSize());
// normalize to [0,1] (not strictly necessary, but makes things easier)
fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
}
return fbuf;
}
private static double toMagnitudes(final float realPart, final float imaginaryPart) {
final double powers = new double[realPart.length / 2];
for (int i = 0; i < powers.length; i++) {
powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
}
return powers;
}
private static int byteToIntLittleEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << 8 * (byteIndex);
}
return sample;
}
private static int byteToIntBigEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
}
return sample;
}
}
What does the Fourier Transform do?
In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?
In PCM each value encodes an amplitude. You can imagine this like the membrane of a speaker that swing back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain time per second (sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second. This is the typical rate for CD quality audio. For your purposes you probably don't need this high a rate.
To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024
) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot of info about the continuous case, but what you need to pay attention to is the discrete case (also called discrete Fourier transform, DTFT), because we are dealing with digital signals, not analog signals.
So what happens when you transform 1024
samples using the DTFT (using its fast implementation FFT)? Typically, the samples are real numbers, not complex numbers. But the output of the DTFT is complex. This is why you usually get two output arrays from one input array. One array for the real part and one for the imaginary part. Together they form one array of complex numbers. This array represents the frequency spectrum of your input samples. The spectrum is complex, because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1
. As you might remember from math way back, a sine wave crosses through the origin (0, 0)
, while a cosine wave cuts the y-axis at (0, 1)
. Apart from this shift both waves are identical in amplitude and shape. This shift is called phase. In your context we don't care about phase, but only about amplitude/magnitude, but the complex numbers you get encode both. To convert one of those complex numbers (r, i)
to a simple magnitude value (how loud at a certain frequency), you simply calculate m=sqrt(r*r+i*i)
. The outcome is always positive. A simple way to understand why and how this works is to imagine a cartesian plane. Treat (r,i)
as vector on that plane. Because of the Pythagorean theorem the length of that vector from the origin is just m=sqrt(r*r+i*i)
.
Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024
complex numbers, only the first 512
are of interest to us. And which frequencies does that cover? Because of the Nyquist–Shannon sampling theorem a signal sampled with SR=44100 Hz
cannot contain information about frequencies greater than F=SR/2=22050 Hz
(you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512
complex values you get from the FFT for 1024
samples of a signal sampled at 44100 Hz
cover the frequencies 0 Hz - 22050 Hz
. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz = 43 Hz
(bandwidth of bin).
So the bin for 11025 Hz
is right at index 512/2=256
. The magnitude may be at m[256]
.
To put this to work in your application you need to understand one more thing: 1024
samples of a 44100 Hz signal
cover a very short amount of time, i.e. 23ms. With that short a time you will see sudden peaks. It's better to aggregate multiple of those 1024
samples into one value before thresholding. Alternatively you could also use a longer DTFT, e.g. 1024*64
, however, I advise against making the DTFT very long as it creates a large computational burden.
Thank you for your answer, but since I'm new to Java it's really hard for me to understand all the different methods and functions. It would be great if you could try to explain me, what I have to do in order to test if the volume of a certain frequency > value. I don't have enough reputation yet, but I'll vote your answer up as soon as I'm able to do that!
– Jannik
Jan 2 at 15:37
@Jannik I don't think this is really a Java issue. I'd recommend reading up on some basic digital signal processing (DSP) and how the Javasound API is organized. Start with how PCM works (e.g. en.wikipedia.org/wiki/Pulse-code_modulation). All the above code does is to read sound into a byte buffer, decode the buffer into samples (each sample consists of two bytes!) and then transform it using the FFT. Because the FFT result is complex, magnitudes are computed. If you don't know what the FFT delivers, please read some primer on the web, e.g. math.stackexchange.com/q/1002.
– hendrik
Jan 2 at 15:48
1
I am new to programming at all, so I barely have any knowledge about all these things, that's what I meant with "new to java". I'll check out these links, thank you!
– Jannik
Jan 2 at 19:48
First of all, thank you for your edit, I feel like I understand some of the processes for the first time. But I don't really get how to "aggregate multiple of those1024
samples into one value". And why is the frequency bin for11025
atm[128]
and not atm[256]
(11025 / 43 ~ 256)
?
– Jannik
Jan 3 at 12:06
1
Sorry, my mistake. Fixed it in the answer. With aggregate I mean: simply average a bunch of consecutive values (i.e. avg each frequency) to get more stability.
– hendrik
Jan 3 at 12:41
|
show 4 more comments
Using the JavaFFT
class from here, you can do something like this:
import javax.sound.sampled.*;
public class AudioLED {
private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;
public static void main(final String args) throws Exception {
// use only 1 channel, to make this easier
final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
targetLine.open();
targetLine.start();
final AudioInputStream audioStream = new AudioInputStream(targetLine);
final byte buf = new byte[256]; // <--- increase this for higher frequency resolution
final int numberOfSamples = buf.length / format.getFrameSize();
final JavaFFT fft = new JavaFFT(numberOfSamples);
while (true) {
// in real impl, don't just ignore how many bytes you read
audioStream.read(buf);
// the stream represents each sample as two bytes -> decode
final float samples = decode(buf, format);
final float transformed = fft.transform(samples);
final float realPart = transformed[0];
final float imaginaryPart = transformed[1];
final double magnitudes = toMagnitudes(realPart, imaginaryPart);
// do something with magnitudes...
}
}
private static float decode(final byte buf, final AudioFormat format) {
final float fbuf = new float[buf.length / format.getFrameSize()];
for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
final int sample = format.isBigEndian()
? byteToIntBigEndian(buf, pos, format.getFrameSize())
: byteToIntLittleEndian(buf, pos, format.getFrameSize());
// normalize to [0,1] (not strictly necessary, but makes things easier)
fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
}
return fbuf;
}
private static double toMagnitudes(final float realPart, final float imaginaryPart) {
final double powers = new double[realPart.length / 2];
for (int i = 0; i < powers.length; i++) {
powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
}
return powers;
}
private static int byteToIntLittleEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << 8 * (byteIndex);
}
return sample;
}
private static int byteToIntBigEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
}
return sample;
}
}
What does the Fourier Transform do?
In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?
In PCM each value encodes an amplitude. You can imagine this like the membrane of a speaker that swing back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain time per second (sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second. This is the typical rate for CD quality audio. For your purposes you probably don't need this high a rate.
To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024
) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot of info about the continuous case, but what you need to pay attention to is the discrete case (also called discrete Fourier transform, DTFT), because we are dealing with digital signals, not analog signals.
So what happens when you transform 1024
samples using the DTFT (using its fast implementation FFT)? Typically, the samples are real numbers, not complex numbers. But the output of the DTFT is complex. This is why you usually get two output arrays from one input array. One array for the real part and one for the imaginary part. Together they form one array of complex numbers. This array represents the frequency spectrum of your input samples. The spectrum is complex, because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1
. As you might remember from math way back, a sine wave crosses through the origin (0, 0)
, while a cosine wave cuts the y-axis at (0, 1)
. Apart from this shift both waves are identical in amplitude and shape. This shift is called phase. In your context we don't care about phase, but only about amplitude/magnitude, but the complex numbers you get encode both. To convert one of those complex numbers (r, i)
to a simple magnitude value (how loud at a certain frequency), you simply calculate m=sqrt(r*r+i*i)
. The outcome is always positive. A simple way to understand why and how this works is to imagine a cartesian plane. Treat (r,i)
as vector on that plane. Because of the Pythagorean theorem the length of that vector from the origin is just m=sqrt(r*r+i*i)
.
Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024
complex numbers, only the first 512
are of interest to us. And which frequencies does that cover? Because of the Nyquist–Shannon sampling theorem a signal sampled with SR=44100 Hz
cannot contain information about frequencies greater than F=SR/2=22050 Hz
(you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512
complex values you get from the FFT for 1024
samples of a signal sampled at 44100 Hz
cover the frequencies 0 Hz - 22050 Hz
. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz = 43 Hz
(bandwidth of bin).
So the bin for 11025 Hz
is right at index 512/2=256
. The magnitude may be at m[256]
.
To put this to work in your application you need to understand one more thing: 1024
samples of a 44100 Hz signal
cover a very short amount of time, i.e. 23ms. With that short a time you will see sudden peaks. It's better to aggregate multiple of those 1024
samples into one value before thresholding. Alternatively you could also use a longer DTFT, e.g. 1024*64
, however, I advise against making the DTFT very long as it creates a large computational burden.
Thank you for your answer, but since I'm new to Java it's really hard for me to understand all the different methods and functions. It would be great if you could try to explain me, what I have to do in order to test if the volume of a certain frequency > value. I don't have enough reputation yet, but I'll vote your answer up as soon as I'm able to do that!
– Jannik
Jan 2 at 15:37
@Jannik I don't think this is really a Java issue. I'd recommend reading up on some basic digital signal processing (DSP) and how the Javasound API is organized. Start with how PCM works (e.g. en.wikipedia.org/wiki/Pulse-code_modulation). All the above code does is to read sound into a byte buffer, decode the buffer into samples (each sample consists of two bytes!) and then transform it using the FFT. Because the FFT result is complex, magnitudes are computed. If you don't know what the FFT delivers, please read some primer on the web, e.g. math.stackexchange.com/q/1002.
– hendrik
Jan 2 at 15:48
1
I am new to programming at all, so I barely have any knowledge about all these things, that's what I meant with "new to java". I'll check out these links, thank you!
– Jannik
Jan 2 at 19:48
First of all, thank you for your edit, I feel like I understand some of the processes for the first time. But I don't really get how to "aggregate multiple of those1024
samples into one value". And why is the frequency bin for11025
atm[128]
and not atm[256]
(11025 / 43 ~ 256)
?
– Jannik
Jan 3 at 12:06
1
Sorry, my mistake. Fixed it in the answer. With aggregate I mean: simply average a bunch of consecutive values (i.e. avg each frequency) to get more stability.
– hendrik
Jan 3 at 12:41
|
show 4 more comments
Using the JavaFFT
class from here, you can do something like this:
import javax.sound.sampled.*;
public class AudioLED {
private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;
public static void main(final String args) throws Exception {
// use only 1 channel, to make this easier
final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
targetLine.open();
targetLine.start();
final AudioInputStream audioStream = new AudioInputStream(targetLine);
final byte buf = new byte[256]; // <--- increase this for higher frequency resolution
final int numberOfSamples = buf.length / format.getFrameSize();
final JavaFFT fft = new JavaFFT(numberOfSamples);
while (true) {
// in real impl, don't just ignore how many bytes you read
audioStream.read(buf);
// the stream represents each sample as two bytes -> decode
final float samples = decode(buf, format);
final float transformed = fft.transform(samples);
final float realPart = transformed[0];
final float imaginaryPart = transformed[1];
final double magnitudes = toMagnitudes(realPart, imaginaryPart);
// do something with magnitudes...
}
}
private static float decode(final byte buf, final AudioFormat format) {
final float fbuf = new float[buf.length / format.getFrameSize()];
for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
final int sample = format.isBigEndian()
? byteToIntBigEndian(buf, pos, format.getFrameSize())
: byteToIntLittleEndian(buf, pos, format.getFrameSize());
// normalize to [0,1] (not strictly necessary, but makes things easier)
fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
}
return fbuf;
}
private static double toMagnitudes(final float realPart, final float imaginaryPart) {
final double powers = new double[realPart.length / 2];
for (int i = 0; i < powers.length; i++) {
powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
}
return powers;
}
private static int byteToIntLittleEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << 8 * (byteIndex);
}
return sample;
}
private static int byteToIntBigEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
}
return sample;
}
}
What does the Fourier Transform do?
In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?
In PCM each value encodes an amplitude. You can imagine this like the membrane of a speaker that swing back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain time per second (sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second. This is the typical rate for CD quality audio. For your purposes you probably don't need this high a rate.
To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024
) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot of info about the continuous case, but what you need to pay attention to is the discrete case (also called discrete Fourier transform, DTFT), because we are dealing with digital signals, not analog signals.
So what happens when you transform 1024
samples using the DTFT (using its fast implementation FFT)? Typically, the samples are real numbers, not complex numbers. But the output of the DTFT is complex. This is why you usually get two output arrays from one input array. One array for the real part and one for the imaginary part. Together they form one array of complex numbers. This array represents the frequency spectrum of your input samples. The spectrum is complex, because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1
. As you might remember from math way back, a sine wave crosses through the origin (0, 0)
, while a cosine wave cuts the y-axis at (0, 1)
. Apart from this shift both waves are identical in amplitude and shape. This shift is called phase. In your context we don't care about phase, but only about amplitude/magnitude, but the complex numbers you get encode both. To convert one of those complex numbers (r, i)
to a simple magnitude value (how loud at a certain frequency), you simply calculate m=sqrt(r*r+i*i)
. The outcome is always positive. A simple way to understand why and how this works is to imagine a cartesian plane. Treat (r,i)
as vector on that plane. Because of the Pythagorean theorem the length of that vector from the origin is just m=sqrt(r*r+i*i)
.
Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024
complex numbers, only the first 512
are of interest to us. And which frequencies does that cover? Because of the Nyquist–Shannon sampling theorem a signal sampled with SR=44100 Hz
cannot contain information about frequencies greater than F=SR/2=22050 Hz
(you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512
complex values you get from the FFT for 1024
samples of a signal sampled at 44100 Hz
cover the frequencies 0 Hz - 22050 Hz
. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz = 43 Hz
(bandwidth of bin).
So the bin for 11025 Hz
is right at index 512/2=256
. The magnitude may be at m[256]
.
To put this to work in your application you need to understand one more thing: 1024
samples of a 44100 Hz signal
cover a very short amount of time, i.e. 23ms. With that short a time you will see sudden peaks. It's better to aggregate multiple of those 1024
samples into one value before thresholding. Alternatively you could also use a longer DTFT, e.g. 1024*64
, however, I advise against making the DTFT very long as it creates a large computational burden.
Using the JavaFFT
class from here, you can do something like this:
import javax.sound.sampled.*;
public class AudioLED {
private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;
public static void main(final String args) throws Exception {
// use only 1 channel, to make this easier
final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
targetLine.open();
targetLine.start();
final AudioInputStream audioStream = new AudioInputStream(targetLine);
final byte buf = new byte[256]; // <--- increase this for higher frequency resolution
final int numberOfSamples = buf.length / format.getFrameSize();
final JavaFFT fft = new JavaFFT(numberOfSamples);
while (true) {
// in real impl, don't just ignore how many bytes you read
audioStream.read(buf);
// the stream represents each sample as two bytes -> decode
final float samples = decode(buf, format);
final float transformed = fft.transform(samples);
final float realPart = transformed[0];
final float imaginaryPart = transformed[1];
final double magnitudes = toMagnitudes(realPart, imaginaryPart);
// do something with magnitudes...
}
}
private static float decode(final byte buf, final AudioFormat format) {
final float fbuf = new float[buf.length / format.getFrameSize()];
for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
final int sample = format.isBigEndian()
? byteToIntBigEndian(buf, pos, format.getFrameSize())
: byteToIntLittleEndian(buf, pos, format.getFrameSize());
// normalize to [0,1] (not strictly necessary, but makes things easier)
fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
}
return fbuf;
}
private static double toMagnitudes(final float realPart, final float imaginaryPart) {
final double powers = new double[realPart.length / 2];
for (int i = 0; i < powers.length; i++) {
powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
}
return powers;
}
private static int byteToIntLittleEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << 8 * (byteIndex);
}
return sample;
}
private static int byteToIntBigEndian(final byte buf, final int offset, final int bytesPerSample) {
int sample = 0;
for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
final int aByte = buf[offset + byteIndex] & 0xff;
sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
}
return sample;
}
}
What does the Fourier Transform do?
In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?
In PCM each value encodes an amplitude. You can imagine this like the membrane of a speaker that swing back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain time per second (sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second. This is the typical rate for CD quality audio. For your purposes you probably don't need this high a rate.
To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024
) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot of info about the continuous case, but what you need to pay attention to is the discrete case (also called discrete Fourier transform, DTFT), because we are dealing with digital signals, not analog signals.
So what happens when you transform 1024
samples using the DTFT (using its fast implementation FFT)? Typically, the samples are real numbers, not complex numbers. But the output of the DTFT is complex. This is why you usually get two output arrays from one input array. One array for the real part and one for the imaginary part. Together they form one array of complex numbers. This array represents the frequency spectrum of your input samples. The spectrum is complex, because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1
. As you might remember from math way back, a sine wave crosses through the origin (0, 0)
, while a cosine wave cuts the y-axis at (0, 1)
. Apart from this shift both waves are identical in amplitude and shape. This shift is called phase. In your context we don't care about phase, but only about amplitude/magnitude, but the complex numbers you get encode both. To convert one of those complex numbers (r, i)
to a simple magnitude value (how loud at a certain frequency), you simply calculate m=sqrt(r*r+i*i)
. The outcome is always positive. A simple way to understand why and how this works is to imagine a cartesian plane. Treat (r,i)
as vector on that plane. Because of the Pythagorean theorem the length of that vector from the origin is just m=sqrt(r*r+i*i)
.
Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024
complex numbers, only the first 512
are of interest to us. And which frequencies does that cover? Because of the Nyquist–Shannon sampling theorem a signal sampled with SR=44100 Hz
cannot contain information about frequencies greater than F=SR/2=22050 Hz
(you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512
complex values you get from the FFT for 1024
samples of a signal sampled at 44100 Hz
cover the frequencies 0 Hz - 22050 Hz
. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz = 43 Hz
(bandwidth of bin).
So the bin for 11025 Hz
is right at index 512/2=256
. The magnitude may be at m[256]
.
To put this to work in your application you need to understand one more thing: 1024
samples of a 44100 Hz signal
cover a very short amount of time, i.e. 23ms. With that short a time you will see sudden peaks. It's better to aggregate multiple of those 1024
samples into one value before thresholding. Alternatively you could also use a longer DTFT, e.g. 1024*64
, however, I advise against making the DTFT very long as it creates a large computational burden.
edited Jan 3 at 12:39
answered Jan 2 at 14:00
hendrikhendrik
1,552821
1,552821
Thank you for your answer, but since I'm new to Java it's really hard for me to understand all the different methods and functions. It would be great if you could try to explain me, what I have to do in order to test if the volume of a certain frequency > value. I don't have enough reputation yet, but I'll vote your answer up as soon as I'm able to do that!
– Jannik
Jan 2 at 15:37
@Jannik I don't think this is really a Java issue. I'd recommend reading up on some basic digital signal processing (DSP) and how the Javasound API is organized. Start with how PCM works (e.g. en.wikipedia.org/wiki/Pulse-code_modulation). All the above code does is to read sound into a byte buffer, decode the buffer into samples (each sample consists of two bytes!) and then transform it using the FFT. Because the FFT result is complex, magnitudes are computed. If you don't know what the FFT delivers, please read some primer on the web, e.g. math.stackexchange.com/q/1002.
– hendrik
Jan 2 at 15:48
1
I am new to programming at all, so I barely have any knowledge about all these things, that's what I meant with "new to java". I'll check out these links, thank you!
– Jannik
Jan 2 at 19:48
First of all, thank you for your edit, I feel like I understand some of the processes for the first time. But I don't really get how to "aggregate multiple of those1024
samples into one value". And why is the frequency bin for11025
atm[128]
and not atm[256]
(11025 / 43 ~ 256)
?
– Jannik
Jan 3 at 12:06
1
Sorry, my mistake. Fixed it in the answer. With aggregate I mean: simply average a bunch of consecutive values (i.e. avg each frequency) to get more stability.
– hendrik
Jan 3 at 12:41
|
show 4 more comments
I think hendrik has the basic plan, but I hear your pain about understanding the process of getting there!
I assume you are getting your byte array via a TargetDataLine and that it is returning bytes. Converting the bytes to floats takes a bit of manipulation and depends upon the AudioFormat. A typical format has 44100 frames per second and 16-bit encoding (two bytes form one data point) in stereo. This means 4 bytes make up a single frame consisting of a left and a right value.
Example code that shows how to read and handle the incoming stream of individual bytes can be found in the Java audio tutorial Using Files and Format Converters. Scroll down to the first "code snippet" in the section "Reading Sound Files". The key point where you would convert the incoming data to floats occurs at the spot marked as follows:
// Here, do something useful with the audio data that's
// now in the audioBytes array...
At this point you can take the two bytes (assuming 16-bit encoding) and combine them into a single short, then scale the value to a normalized float (in the range -1 to 1). There are several StackOverflow questions that show algorithms for doing this conversion; a sketch of one way follows.
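A minimal sketch of that conversion, assuming 16-bit signed little-endian PCM as in the AudioFormat from the question (the endianness and channel count are assumptions you must match to your own format, and the helper name is ours):

// Sketch: decode 16-bit signed little-endian PCM bytes into normalized
// floats in [-1, 1]. With stereo input the result interleaves left/right.
static float[] decodePcm16(byte[] audioBytes) {
    float[] samples = new float[audioBytes.length / 2];
    for (int i = 0; i < samples.length; i++) {
        int lo = audioBytes[2 * i] & 0xFF;      // low byte, treat as unsigned
        int hi = audioBytes[2 * i + 1];         // high byte, keeps the sign
        samples[i] = ((hi << 8) | lo) / 32768f; // -32768..32767 -> -1..1
    }
    return samples;
}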
You may also have to adapt the sample code where it reads from an AudioInputStream (as in the example) rather than a TargetDataLine, but if that poses a problem, there are also StackOverflow questions that can help with that.
For the FFTFactory recommended by hendrik, I suspect that using the transform method with just a float array for input will suffice. But I haven't gotten into the details or tried running it myself yet. (It looks promising. A search might also uncover other FFT libraries with more complete documentation; I recall something being available from MIT. I'm probably only a couple of steps ahead of you technically.)
In any event, at the point above where the conversion happens, you can add to the input array for transform() until it is full, and on that iteration call the transform() method.
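A sketch of that accumulate-then-transform loop (the window and counter names are made up for illustration, and the transform call is left as a comment because the exact FFTFactory signature isn't shown here):

// Sketch: collect decoded samples into a fixed-size window; once it is
// full, run the transform on it and start filling the next window.
static float[] window = new float[FFT_SIZE]; // FFT_SIZE as defined above
static int filled = 0;

static void onSamples(float[] samples) {
    for (float s : samples) {
        window[filled++] = s;
        if (filled == window.length) {
            // float[] spectrum = fft.transform(window); // assumed API
            filled = 0; // start the next window
        }
    }
}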
Interpreting the output from the method might be best accomplished on a separate thread. I'm thinking, hand off the results of the FFT call, or hand off the transform() call itself via some sort of loose coupling. (Are you familiar with this term and multi-threaded coding?)
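By "loose coupling" I mean something like the following sketch: a queue decouples the capture thread from the analysis thread, so neither blocks the other for long (the class and method names are made up for illustration):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: the capture thread publishes full sample windows to a queue;
// a separate analysis thread consumes them, runs the FFT and drives LEDs.
class SpectrumPipeline {
    final BlockingQueue<float[]> windows = new ArrayBlockingQueue<>(16);

    void startAnalysisThread() {
        new Thread(() -> {
            try {
                while (true) {
                    float[] window = windows.take(); // blocks until data arrives
                    // run transform(window), compute magnitudes, threshold here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow clean shutdown
            }
        }).start();
    }

    // Called from the capture thread once a window is full:
    void publish(float[] window) throws InterruptedException {
        windows.put(window); // blocks briefly if the analyzer falls behind
    }
}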
Significant insights into how Java encodes sound and sound formats can be found in tutorials that directly precede the one linked above.
Another great resource, if you want to better understand how to interpret FFT results, is available as a free download: "The Scientist and Engineer's Guide to Digital Signal Processing".
edited Jan 2 at 21:01
answered Jan 2 at 20:46
Phil Freihofner
3,84811028
Thank you! Yes, my problem is that I don't know how to use FFT: neither how to provide it with the right data, nor how to interpret its results. I think I'll have to take some time and read how the FFT works, but for me it's really hard to understand.
– Jannik
Jan 3 at 1:44
For good reason. Both the book and the Java Sound tutorials were very difficult reads for me as well. I hope I helped with a couple of the steps and got you pointed in the right direction for the rest. Don't be discouraged if it takes multiple passes. I hang out at Java-gaming.org, and that forum has a section for Sound/Audio which might also be a source of help. Last thought: check out the frequency analysis tool that comes free with Audacity! It's pretty cool, and using it might add insights into how it all works.
– Phil Freihofner
Jan 3 at 8:13
I'll check out the things that you mentioned, thanks!
– Jannik
Jan 3 at 10:32
What do you have for "audio input", for example do you have a list of voltages sampled at a given frequency? Also you would need to quantify what you mean by "frequency" of the audio input, for example, noise would have some component at every frequency, so you need to define some criteria. In general this is not a trivial problem.
– nPn
Jan 1 at 17:59
I use the Java Sound API to capture the audio and store it in a byte array. By "frequency" I mean the volume at a certain frequency (Hz).
– Jannik
Jan 1 at 18:15
I'm not one (usually) to recommend JavaFX, but in this case it has a spectrum analyzer built in, via the AudioSpectrumListener.
– Andrew Thompson
Jan 1 at 18:43
I'm using JavaFX for the GUI anyway, so that may be a good solution; I'll take a look at it. Thank you!
– Jannik
Jan 1 at 19:40
I think that is not only about frequency; you should calculate the energy of the sound. Search for RMS.
– amin saffar
Jan 1 at 22:08