Paul Burridge said:
Hi,
In multiplex comms systems, what's the minimum sampling percentage of
plain speech necessary to make it intelligible at the other end?
Thanks,
p.
I'll confess to being confused by the question -- *percentage* of plain
speech.
If you use a 2400 Hz bandwidth (used in ham radio SSB, not hi-fi, but it
works), then Nyquist says 4800 samples/second. A "sampling percentage"
isn't really meaningful here: the fraction of time actually spent
sampling goes to zero as the aperture time of the sample-and-hold (s/h)
goes to zero.
So, 4800 samples/second, before any games (coding, compression,
modeling, etc.). You can make the most of available bandwidth, or
reduce bits/second using coding techniques (mu-law, ADPCM), compression
(all sorts of techniques), or modeling (12-point LPC is great for one
voice at a time).
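The mu-law step is easy to sketch. Here's a minimal Python illustration
of mu-law companding (mu = 255, the North American standard); the 8-bit
quantization step and the function names are illustrative, not any
particular codec's implementation:

```python
import math

MU = 255  # North American mu-law standard

def mulaw_encode(x, mu=MU):
    """Compand a linear sample in [-1, 1] into [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_decode(y, mu=MU):
    """Expand a companded value back to an approximate linear sample."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

# Quantize the companded value to 8 bits: quiet samples keep far more
# accuracy than linear 8-bit quantization would give them.
sample = 0.01
quantized = round(mulaw_encode(sample) * 127) / 127
decoded = mulaw_decode(quantized)
```

The payoff: at 4800 samples/second, 8 bits of mu-law (38.4 kbit/s)
covers roughly the dynamic range that would need 12-13 bits of linear
PCM.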
War story:
Once upon a time, a long time ago, the (DARPA) Network Speech
Compression project investigated sending speech over the (then
ARPA)net. We used a 12-point LPC (Linear Predictive Coder) running
on a very advanced DEC PDP 11/45 with a butterfly box attached to do
the hard work. The LPC modeled the vocal tract (see, for example,
Markel and Gray). The digitized audio waveform (recorded in a very
quiet sound booth, more on that shortly) goes into the model. Rather
than sending the digitized audio over the net, the model parameters are
transmitted, resulting in an amazing drop in data rate. The LPC
modeling approach was also used by the TI Speak & Spell toys; a LOT of
preprocessing gives you a very low data rate. Reconstruction is fairly
easy.
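To make the analysis step concrete, here's a minimal sketch of LPC by
the autocorrelation method with the Levinson-Durbin recursion. This is
illustrative Python, not the project's code (which ran on the PDP
11/45); the demo recovers the coefficient of a synthetic first-order
signal rather than real speech:

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """LPC predictor coefficients a[0..order] (a[0] = 1) for one frame,
    via the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err          # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Demo: estimate a first-order predictor from a synthetic AR(1) signal.
rng = np.random.default_rng(0)
x = np.zeros(5000)
noise = rng.standard_normal(5000)
for t in range(1, 5000):
    x[t] = 0.9 * x[t - 1] + noise[t]
a = lpc_coefficients(x, order=1)  # a[1] should come out near -0.9
```

Only the dozen or so coefficients (plus pitch and gain) go over the
wire per frame, which is where the drop in data rate comes from.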
But there's a price to be paid, and it isn't just in the amount of
preprocessing required (which is trivial today, but a pain in the ass
in the '70s). Remember, the LPC models the vocal tract. So if someone
comes into the booth and knocks a book into a metal trashcan, or slams
the door while the system is live, what the listener on the other end
gets is that sound -- as imitated by the human vocal tract! Two people
speaking at the same time? Comes out as one vocal tract trying to
create that sound (and not doing too well at it)!
Conference calls? No "easy" way to sum multiple LPC datastreams. Oh,
you can reconstruct them all to audio samples, and then sum, but you
can't re-encode and transmit as one LPC -- as all those sounds are
going to be modeled coming from *one* vocal tract. And when you
reconstruct multiple LPC streams and sum, it gets very confusing, since
a lot of what makes individual voices individual gets lost through the
LPC filtering process.
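A sketch of the only mixing path that does work, given the above:
synthesize each LPC stream back to PCM, sum, and ship the mix as plain
audio. The names here are hypothetical, and two sinusoids stand in for
the synthesized voices:

```python
import numpy as np

def mix_decoded_streams(streams):
    """Sum already-synthesized PCM streams; scale down only if the mix
    would clip outside [-1, 1]."""
    mix = np.sum(streams, axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

# Two decoded "voices" stand in for synthesized LPC output.
t = np.linspace(0.0, 1.0, 4800, endpoint=False)
voice_a = 0.8 * np.sin(2 * np.pi * 200 * t)
voice_b = 0.8 * np.sin(2 * np.pi * 310 * t)
mix = mix_decoded_streams([voice_a, voice_b])
```

The step you can't take is feeding `mix` back through the analyzer: one
set of vocal-tract parameters can't represent two glottal sources at
once.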
Still, it was a fun project and kept a bunch of us off the streets.