Comparing similar audio files, FFT?

kieran · Oct 8, 2008

Hello,
I am trying to compare two similar audio files (WAV). From what I have
read I need to sample both audio files at certain frequencies and run
these through an FFT and then compare the results. Can anyone advise me
if this is the correct approach and also describe the steps I need to
take to get to the stage where I can compare the files.
TIA,
Kieran

Steve · Oct 8, 2008

That is one way to compare the files. But what are you trying to do with the
comparison?

You need to do classical DSP work.

Use a low-pass filter to prevent aliasing.
Take a power-of-two number of samples (e.g., 128, 256, 512, 1024 ...).
Run an FFT on the samples; this will give you frequency-domain
data from your time-domain data. Each data point is referred to as a bin,
and the frequencies that fall into each bin depend on the sample rate
and the number of samples.

Plot the spectrum of the 2 audio files.

Even Excel can do FFTs, but it is not obvious what it is doing unless you are
familiar with FFTs.
Maybe there is some free FFT software you can grab.
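
As a rough illustration of the steps above, here is a minimal sketch using only the Python standard library. The function names (`fft`, `magnitude_spectrum`), the Hann window, and the 1 kHz test tone are assumptions made for the example and are not from the thread; real WAV data would replace the synthetic tone. Bin k corresponds to the frequency k * sample_rate / N.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for bins 0 .. N/2."""
    n = len(samples)
    # A Hann window reduces leakage between neighbouring bins.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, s in enumerate(samples)
    ]
    spectrum = fft(windowed)
    # Bin k covers frequencies around k * sample_rate / n.
    return [(k * sample_rate / n, abs(spectrum[k])) for k in range(n // 2 + 1)]

# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz, 256 samples.
SAMPLE_RATE = 8000
N = 256
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]
spectrum = magnitude_spectrum(tone, SAMPLE_RATE)
peak_freq, peak_mag = max(spectrum, key=lambda p: p[1])
print(peak_freq)
```

Computing the same list for each file gives two spectra that can be plotted or compared bin by bin.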

Jan Panteltje · Oct 8, 2008

Take a look at http://www.libinst.com:80/Audio DiffMaker.htm

... the one that tells the truth about green pens and all that!

Chris

Looks like a 2008 copycat of what I wrote around 2000 (Linux):
http://panteltje.com/panteltje/dvd/substract_wave-0.3.tgz
I used this to cancel common background in translated tracks.

It also aligns, matches amplitude, and subtracts.
Wrote quite a few more audio utilities actually, most are here:
http://panteltje.com/panteltje/dvd/index.html

You would still need to understand digital audio and audio in general
to use these of course.

JosephKK · Oct 9, 2008

Maybe the other suggestions are good enough. Personally, I suspect the
best approach is tempo-adjusting software with at least one dial to keep
the two synchronized: put one signal in each ear and listen. The brain's
software is much better than any available package.

miso@sushi.com · Oct 9, 2008

You failed to indicate the criteria of the comparison. Just what in
these files do you want to compare?

kieran · Oct 9, 2008

Hi TIA,
This seems to be a good approach.
What I am trying to do is to automate the comparison of audio files.
The two files I will be comparing will be audio recorded from an IVR
system. The first file will be a high-quality recording, checked by
ear; the second file will be recorded every hour to ensure the IVR is
working correctly, i.e., if the two files sound similar I can consider the
IVR to be working.
I will give this a go and let you know the results.
Thanks for your help,
Kieran

Steve · Oct 9, 2008

Could you put a test mode in your IVR?
Perhaps have it respond with something easy to detect like DTMF?
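
A DTMF response is easy to detect because only eight known frequencies matter. One common technique for this (not named in the thread) is the Goertzel algorithm; the sketch below assumes 8 kHz audio and a 205-sample block, and the names `goertzel_power` and `detect_dtmf` are made up for the example. It only picks the strongest row and column tone; a real detector would also need a threshold to decide whether any tone is present at all.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Signal power at the DFT bin nearest to `freq` (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * freq / sample_rate)          # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

DTMF_LOW = (697, 770, 852, 941)        # row frequencies in Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")

def detect_dtmf(samples, sample_rate):
    """Return the key whose row and column tones carry the most power."""
    row = max(range(4),
              key=lambda i: goertzel_power(samples, sample_rate, DTMF_LOW[i]))
    col = max(range(4),
              key=lambda i: goertzel_power(samples, sample_rate, DTMF_HIGH[i]))
    return DTMF_KEYS[row][col]

# Hypothetical test tone: key "5" is 770 Hz + 1336 Hz.
RATE = 8000
digit5 = [
    math.sin(2 * math.pi * 770 * i / RATE) + math.sin(2 * math.pi * 1336 * i / RATE)
    for i in range(205)
]
print(detect_dtmf(digit5, RATE))
```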

...or perhaps figure out a way to subtract one recording from the other.
After some gain adjustment and a phase offset, the result should be
close to silence. Calculate the amplitude of the result and see that it
is low.
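
A minimal sketch of the subtraction idea, assuming the two recordings are already time-aligned and given as lists of float samples. The least-squares gain estimate and the names `rms` and `residual_db` are choices made for this example, not something specified in the thread.

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def residual_db(reference, test):
    """Level of (test - gain * reference) relative to the test level, in dB."""
    n = min(len(reference), len(test))
    ref, tst = reference[:n], test[:n]
    # Least-squares gain that best maps the reference onto the test signal.
    gain = sum(r * t for r, t in zip(ref, tst)) / sum(r * r for r in ref)
    residual = [t - gain * r for r, t in zip(ref, tst)]
    eps = 1e-12  # avoids log(0) and division by zero
    return 20 * math.log10((rms(residual) + eps) / (rms(tst) + eps))

# Hypothetical signals: a 440 Hz tone at 8 kHz, a quieter copy of it,
# and a quieter copy with a block of lost audio.
ref = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(800)]
good = [0.5 * v for v in ref]
bad = [0.0 if 200 <= i < 400 else 0.5 * v for i, v in enumerate(ref)]

print(residual_db(ref, good))  # strongly negative: almost nothing left over
print(residual_db(ref, bad))   # only a few dB down: large residual
```

A strongly negative result means the subtraction left close to silence; a value near 0 dB means the recordings differ substantially.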

whit3rd · Oct 9, 2008

I am trying to compare two similar audio files (WAV). From what I have
read I need to sample both audio files at certain frequencies and run
these through a FFT and then compare the results.

Well, yeah, but... what's the similarity criterion?

In some sense, an FFT will tell you the voice of the singer
or the instrument(s) but might not distinguish multiple
works of different composition performed on the same
instrument. Similarly, a time/amplitude breakdown
might pick up the 'Surprise' symphony easily from
other works, but can't tell you whether it was performed
by an orchestra or a kazoo band.

A two-minute selection from a CD has 10 million samples,
and that means it selects a point in a 10-million-dimension
vector space. What makes two such points similar?

Le Chaud Lapin · Oct 10, 2008

Could you put a test mode in your IVR?
Perhaps have it respond with something easy to detect like DTMF?

...or perhaps figure out a way to subtract one recording from the other.
After some gain adjustment and a phase offset, the result should be
close to silence. Calculate the amplitude of the result and see that it
is low.

There is a simple problem with this: there is no way to adjust the phase,
because phase only makes sense in the context of periodic signals. A
time-domain signal as above is not periodic, but one can pluck components
from the frequency domain of each signal and look at their phases.

In other words, suppose a speaker is offered US$100 to reproduce the
same sampled digital signal, more or less, by speaking into the IVR, so
that merely shifting signal2 a bit relative to signal1 aligns the two
properly for comparison. The speaker will fail. The
reason is that, even at the relatively low sample rate of 8 kHz, no
human is able to begin speaking at just the right instant, let alone
control the physiology of the speech path to generate more or less the
exact same signal. Any attempt to find out when a signal begins is
hopeless in the time domain. Is it the first non-zero sample? The
second? Third? Is that noise or voice? Is it when the "hump" is really
high? Almost really high? One cannot know.

This is a classical problem in speech recognition and related areas. I
responded to the OP in comp.dsp with an outline of what he needs to do:

http://tinyurl.com/4568b3

-Le Chaud Lapin-

Steve · Oct 10, 2008

I didn't think he was trying to use a human in this instance;
I assumed the IVR is playing the exact same speech
each time. So would you not be able to do a cross-correlation?
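
For reference, a brute-force sketch of using cross-correlation to find the time offset between two recordings of the same playback. The function name `best_lag` and the random test signal are assumptions for the example; for long recordings an FFT-based correlation would be much faster than this direct loop.

```python
import random

def best_lag(reference, test, max_lag):
    """Lag (in samples) at which `test` lines up best with `reference`.

    A positive result means `test` is delayed relative to `reference`.
    """
    def corr_at(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(test):
                total += r * test[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=corr_at)

# Hypothetical data: a noise-like reference and a copy delayed by 7 samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]
delayed = [0.0] * 7 + reference

print(best_lag(reference, delayed, max_lag=20))
```

Once the lag is known, the later signal can be shifted back by that many samples before any sample-by-sample comparison.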

Le Chaud Lapin · Oct 14, 2008

Yes, I guess that would work too, as long as the signals are
normalized first, as you pointed out in your 2nd post.

You got me thinking about the pros and cons of the cross-correlation
method versus mean-squared-error method, and minimum distance
estimator.

-Le Chaud Lapin-
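
To make the trade-off concrete, here is a small sketch contrasting mean-squared error with a normalized correlation coefficient on aligned signals. The function names and the test tone are assumptions for the example, and the minimum distance estimator mentioned above is not covered.

```python
import math

def mean_squared_error(a, b):
    """Average squared sample difference; sensitive to gain differences."""
    n = min(len(a), len(b))
    return sum((a[i] - b[i]) ** 2 for i in range(n)) / n

def normalized_correlation(a, b):
    """Correlation coefficient in [-1, 1]; insensitive to gain and DC offset."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

# Hypothetical signals: a tone and the same tone at half the amplitude.
ref = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
quiet = [0.5 * v for v in ref]

mse = mean_squared_error(ref, quiet)        # nonzero: penalises the gain change
corr = normalized_correlation(ref, quiet)   # close to 1: same waveform shape
print(mse, corr)
```

The correlation coefficient ignores a pure level change, while the mean-squared error does not unless the signals are normalized first.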

kieran · Oct 15, 2008

Hi all,
thanks for your posts. They have helped me a great deal and have
definitely steered me in the right direction.
Some more info:
I should have explained that I am comparing the same recording of the
voice, but the differences I am trying to identify are caused by
interference from the mobile phone network, i.e., lost audio and noise.
I will be listening to one of the samples (the master or reference) by
ear to ensure the recording is clear and without interference. I will
then record the same piece of audio at various times throughout the
day and compare it to the master. The comparison should identify which
recordings are of high quality (low interference) and identify the
recordings that are of low quality (lots of interference and lost
audio).
Kieran
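
One possible way to flag lost audio and added noise is to compare short-term levels frame by frame. The sketch below assumes the recording has already been aligned and gain-matched to the master; the frame length, the two ratio thresholds, and the names `frame_rms` and `flag_frames` are arbitrary choices for illustration and would need tuning on real IVR recordings.

```python
import math

def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(v * v for v in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def flag_frames(master, recording, frame_len, drop_ratio=0.25, noise_ratio=4.0):
    """Label each frame 'ok', 'dropout' or 'noise' by comparing RMS levels."""
    floor = 1e-6  # treat master frames below this level as silence
    labels = []
    for m, r in zip(frame_rms(master, frame_len), frame_rms(recording, frame_len)):
        if m > floor and r < drop_ratio * m:
            labels.append("dropout")   # audio present in master, missing here
        elif r > noise_ratio * max(m, floor):
            labels.append("noise")     # much louder than the master frame
        else:
            labels.append("ok")
    return labels

# Hypothetical data: a 400 Hz tone at 8 kHz with the middle 10 ms lost.
master = [math.sin(2 * math.pi * 400 * i / 8000) for i in range(240)]
recording = [0.0 if 80 <= i < 160 else v for i, v in enumerate(master)]

print(flag_frames(master, recording, frame_len=80))
```

The fraction of frames not labelled 'ok' could then serve as a simple per-recording quality score.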

JosephKK · Oct 16, 2008

Ooh. In that case maybe you should look for audio forensics software.
I hear diamond cut AC5 can be useful.

JosephKK · Oct 16, 2008

This appears to be the successor to the product I heard about:

http://www.enhancedaudio.com/dc_live.htm
