C3
- Jan 1, 1970
Suppose you have a recording of a human voice singing a song. The recording
has been sped up or slowed down, so that both the tempo and the pitch have
changed. The aim is to estimate, as closely as possible, how much you need
to compress or expand the waveform (to speed it up or slow it down) in order
to restore it to the pitch at which it was originally recorded.
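To make the relationship concrete: assuming the speed change is a plain resampling, every frequency in the recording is scaled by the same factor. The symbols below ($r$, $f_{\text{orig}}$, $f_{\text{played}}$, $\Delta$) are my own notation, not part of the problem statement.

```latex
f_{\text{played}} = r \, f_{\text{orig}}, \qquad
\Delta = 12 \log_2 r \ \text{semitones}, \qquad
r_{\text{restore}} = \frac{1}{r} = 2^{-\Delta / 12}
```

So finding the correction amounts to finding the single pitch offset $\Delta$ shared by every note in the recording.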
The human voice has a limited range, so a rough correction is easy: most
people cannot sing outside that range, so the restored pitch must fall
within it. You also know that the song is sung in tune, in equal
temperament, so the restored pitches must align exactly with a defined set
of notes.
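One way the equal-temperament constraint might be used is sketched below. It rests on assumptions that are not in the question: that per-note frequency estimates (`freqs_hz`) have already been obtained from some pitch tracker, and that the original was tuned to A4 = 440 Hz. The function names are hypothetical. The sketch measures how far each pitch sits from the nearest note of the grid, averages those deviations on a circle (so that +49 and -49 cents do not cancel to zero), and lists the playback-rate factors that would put the notes back on the grid. The answer is only determined up to whole semitones, so the vocal-range argument is still needed to choose among the candidates.

```python
import math

A4_HZ = 440.0  # ASSUMPTION: tuning reference, not stated in the question


def cents_offset(freqs_hz, ref_hz=A4_HZ):
    """Circular mean of each pitch's deviation from the nearest
    equal-tempered note, in cents (between -50 and 50)."""
    sin_sum = 0.0
    cos_sum = 0.0
    for f in freqs_hz:
        cents = 1200.0 * math.log2(f / ref_hz)
        # Map the position within a semitone (0..100 cents) onto a circle.
        angle = 2.0 * math.pi * (cents % 100.0) / 100.0
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
    mean_angle = math.atan2(sin_sum, cos_sum)
    return 100.0 * mean_angle / (2.0 * math.pi)


def candidate_speed_factors(freqs_hz, max_semitones=2, ref_hz=A4_HZ):
    """Playback-rate multipliers that move the pitches back onto the grid.
    One candidate per whole-semitone shift k in [-max_semitones, max_semitones]."""
    offset = cents_offset(freqs_hz, ref_hz)
    factors = []
    for k in range(-max_semitones, max_semitones + 1):
        correction_cents = -offset + 100.0 * k
        factors.append(2.0 ** (correction_cents / 1200.0))
    return factors


# Hypothetical input: four in-tune notes, all played 30 cents sharp.
detune = 2.0 ** (30.0 / 1200.0)
freqs = [A4_HZ * 2.0 ** (n / 12.0) * detune for n in (0, 2, 4, 7)]
offset = cents_offset(freqs)
factors = candidate_speed_factors(freqs, max_semitones=1)
```

In this made-up example the middle candidate (`factors[1]`) slows the recording just enough to cancel the 30-cent offset, while the other two land one semitone below and above it. With real data the deviations will be noisy, and the circular mean is only one possible estimator.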
If you knew anything about the singing ability of the singer, you might also
be able to infer something based on how strained the singing of each note
is, but assume you only have the recording, and no prior information about
the singer.
How would you do it?