Basic overview of audio file formats


An audio file format is a data format for storing sound (music, voices, etc.) in digital form on a computer. The industry has produced many formats, each designed primarily or exclusively for a particular application: production, or preservation and distribution.

The software component that encodes the signal into a file, and decodes it back, is called a codec, short for coder-decoder.

Psychoacoustic audio encodings reduce the amount of information transmitted by restricting the description of the signal to the part that humans can perceive. In any case, every digital audio format limits the range of frequencies it can represent.
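To make "the part that humans can perceive" concrete, here is a minimal sketch of one ingredient, the absolute threshold of hearing. It assumes the widely used curve fit attributed to Terhardt, and it assumes component levels are already calibrated in dB SPL, which real audio files are not. The function names are hypothetical, and real encoders combine this threshold with masking models rather than using it alone.

```python
import math

def ath_db_spl(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's curve fit)."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(components):
    """Keep only (frequency_hz, level_db_spl) pairs that rise above the threshold."""
    return [(f, level) for f, level in components if level > ath_db_spl(f)]

# Hypothetical spectral components: the 100 Hz and 16 kHz tones fall below the threshold.
tones = [(100, 20.0), (1000, 20.0), (3300, 0.0), (16000, 40.0)]
print(audible(tones))  # [(1000, 20.0), (3300, 0.0)]
```

An encoder following this idea could spend no bits at all on the discarded components.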

To reach lower bit rates, codecs can exploit the frequency and temporal masking effects of human hearing, as well as the ear's poor pitch discrimination in the top two octaves of the audible range.

When the goal is to reduce the amount of information, it may be necessary to define an acceptable reproduction quality, which differs from the highest quality possible.

Some encoders achieve better results through lengthy calculations that take the entire audio segment into account, or process it in several passes; such encoders are therefore unsuitable for real-time applications.

Each encoding thus strikes its own compromise between production cost, bit rate, and perceptual quality. Currently, by far the most widely used codec is MP3, followed by WMA and AAC.

Many files use the Resource Interchange File Format (RIFF), a container that may hold a number of diverse elements called chunks.

The first four bytes of the header identify the file as RIFF; the header then provides the information needed to locate the other elements, which are built recursively in the same way. These elements can contain any type of data.
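The chunk layout can be walked with a few lines of Python. This is a minimal sketch that assumes a well-formed, little-endian RIFF file held in memory: each chunk is a four-byte ID, a four-byte size, and a payload padded to an even length. It only visits top-level chunks (a LIST chunk would need the same loop applied to its payload after a four-byte list type), and the helper names and the b"note" chunk ID are hypothetical, chosen for illustration.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF file."""
    riff_id, riff_size = struct.unpack_from("<4sI", data, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    end = min(8 + riff_size, len(data))  # riff_size counts everything after the size field
    pos = 12  # skip b"RIFF", the size field, and the form type (e.g. b"WAVE")
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # odd-sized chunks carry one pad byte

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Serialize one chunk, adding the pad byte when the payload length is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

body = (b"WAVE" + make_chunk(b"fmt ", bytes(16))
        + make_chunk(b"note", b"odd") + make_chunk(b"data", b"\x01\x02"))
riff = b"RIFF" + struct.pack("<I", len(body)) + body
print([(cid, len(p)) for cid, p in iter_chunks(riff)])
# [(b'fmt ', 16), (b'note', 3), (b'data', 2)]
```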

Elements that contain encoded sound indicate the codec in their header. A reader simply skips any element it cannot decode.
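In a WAVE file, the codec is recorded as a numeric format tag at the start of the "fmt " chunk. The sketch below assumes a plain RIFF/WAVE layout: it builds a small PCM file with Python's standard wave module, reads the tag, and skips every chunk it does not understand by jumping over its declared size. The FORMAT_TAGS table lists only a handful of registered tags, and describe_wave is a hypothetical helper name.

```python
import io
import struct
import wave

# A few registered WAVE format tags (values from Microsoft's mmreg.h).
FORMAT_TAGS = {
    0x0001: "PCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0055: "MPEG Layer III",
    0xFFFE: "extensible (codec given by SubFormat GUID)",
}

def describe_wave(data: bytes) -> dict:
    """Report the codec and layout of a WAVE file, skipping unknown chunks."""
    info = {}
    pos = 12  # skip b"RIFF", the 4-byte size, and b"WAVE"
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate = struct.unpack_from("<HHI", data, pos + 8)
            info["codec"] = FORMAT_TAGS.get(tag, f"unknown tag 0x{tag:04X}")
            info["channels"] = channels
            info["rate"] = rate
        elif chunk_id == b"data":
            info["data_bytes"] = size
        # Any other chunk is skipped by jumping over its declared size.
        pos += 8 + size + (size & 1)
    return info

# Build a tiny 16-bit stereo PCM file in memory with the standard wave module.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(bytes(400))  # 100 stereo frames of silence
print(describe_wave(buf.getvalue()))
# {'codec': 'PCM', 'channels': 2, 'rate': 44100, 'data_bytes': 400}
```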

Systems and users often rely on the file name extension, which by convention indicates the file format. However, the extension mostly indicates a range of possible codecs, not the encoding actually used.
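The gap between extension and encoding can be shown with a lookup table. The mapping below is illustrative and deliberately incomplete, and candidate_codecs is a hypothetical helper; the only reliable way to learn the actual codec is to read the file's own header, as described for RIFF above.

```python
from pathlib import Path

# Illustrative, deliberately incomplete mapping: extension -> codecs the container may carry.
POSSIBLE_CODECS = {
    ".wav": {"PCM", "IEEE float", "ADPCM", "MP3"},
    ".m4a": {"AAC", "ALAC"},
    ".mka": {"AAC", "FLAC", "Opus", "Vorbis"},
    ".mp3": {"MP3"},
}

def candidate_codecs(filename: str) -> set[str]:
    """Return the codecs a file *might* use, judging only by its extension."""
    return POSSIBLE_CODECS.get(Path(filename).suffix.lower(), set())

print(sorted(candidate_codecs("Track01.M4A")))  # ['AAC', 'ALAC']
```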

Features of audio encodings

Number of audio channels encoded: mono, stereo, multichannel.
Sampling frequency: the number of samples per second used to describe numerically, for each channel, the signal representing the sound wave. The bandwidth depends heavily on this feature.
Resolution of each sample, in bits. The signal-to-noise ratio depends heavily on this feature.
Bit rate: the file size relative to the duration of the sound.
Data compression or bit-rate reduction:
with recovery of the original waveform (entropy coding), or
reconstruction, more or less accurate, of the sound impression (psychoacoustic coding).
Computing power required to encode.
Computing power required to decode.
Whether or not the structure allows you to:
start playing the file before its end is known,
play a file from the middle without knowing the beginning,
jump to a particular location,
store ancillary and auxiliary data (metadata),
manage digital reproduction rights (DRM),
adapt automatically to the listening room.
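Two of these features follow from textbook formulas: the highest representable frequency is half the sampling frequency (the Nyquist limit), and an ideal N-bit quantizer gives a signal-to-noise ratio of roughly 6.02 N + 1.76 dB for a full-scale sine wave. The sketch below simply evaluates those formulas; real converters fall somewhat short of them, and the function names are hypothetical.

```python
import math

def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sample_rate_hz / 2

def ideal_snr_db(bits: int) -> float:
    """SNR of an ideal uniform quantizer for a full-scale sine (about 6.02*bits + 1.76 dB)."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)

for rate, bits in [(22050, 8), (44100, 16), (96000, 24)]:
    print(f"{rate} Hz / {bits} bit: bandwidth {nyquist_bandwidth_hz(rate):.0f} Hz, "
          f"SNR about {ideal_snr_db(bits):.1f} dB")
```

Each extra bit of resolution buys about 6 dB of signal-to-noise ratio, while doubling the sampling frequency doubles the bandwidth.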

Depending on the file's intended use, some characteristics matter more than others.

A format for music players:

Two channels are sufficient.
The bit rate should be low, so that a sufficiently long duration fits in the player's memory.
The computing power needed to decode must be low, to give the player good battery life.
The bandwidth must be good enough for listening to music.
The signal-to-noise ratio does not need to be very high, because listening rarely takes place in a quiet room designed for it.
Digital rights management is of interest to producers.
The ability to adapt automatically to the listening environment (raising the level of quiet passages when the surroundings are loud, using ancillary data) is an advantage.
Reconstruction of the exact waveform is unnecessary.
The computing power required to encode can be significant.
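To see why the bit rate matters so much for players, the sketch below estimates how many hours of audio fit in memory at a constant bit rate. The 4 GB memory size and the bit rates are assumptions chosen for illustration (1,411,200 bit/s is the rate of uncompressed 16-bit, 44.1 kHz stereo PCM), and hours_of_audio is a hypothetical helper that ignores file-system and metadata overhead.

```python
def hours_of_audio(memory_bytes: int, bitrate_bps: float) -> float:
    """Hours of audio that fit in the given memory at a constant bit rate."""
    return memory_bytes * 8 / bitrate_bps / 3600

MEMORY = 4 * 10**9  # hypothetical 4 GB player
for label, bps in [("CD-quality PCM", 1_411_200), ("320 kbit/s", 320_000), ("128 kbit/s", 128_000)]:
    print(f"{label}: {hours_of_audio(MEMORY, bps):.1f} h")
```

With these assumptions, the same memory holds roughly eleven times more music at 128 kbit/s than as uncompressed PCM.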

A format for film production:

Two to eight channels are needed.
The bandwidth must be excellent, since it can only deteriorate in later stages.
The signal-to-noise ratio must be excellent, and reconstruction of the waveform is preferable, because:
the signals will be edited, mixed, and processed;
the final product is consumed in, and intended for, quiet listening spaces.

Because this is an industrial activity:

The bit rate and the computing power needed for encoding and decoding hardly matter.
Reproduction rights management and automatic adaptation to the listening room are of no interest at this stage.

Within a given format, files can use several levels of quantization (8, 16, or 24 bits) with different sampling frequencies (e.g., 22.05 kHz, 44.1 kHz, 48 kHz, 88.2 kHz, 96 kHz, 176.4 kHz, 192 kHz), applied to a number of channels (mono, stereo, 5.1 surround, etc.).
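For uncompressed PCM, the bit rate follows directly from these three parameters: sampling frequency times bits per sample times number of channels. A minimal sketch, assuming a 5.1 stream carries six full PCM channels and ignoring container overhead; pcm_bit_rate is a hypothetical helper.

```python
def pcm_bit_rate(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed PCM bit rate: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits * channels

configs = [
    ("mono, 22.05 kHz, 8 bit", 22_050, 8, 1),
    ("stereo, 44.1 kHz, 16 bit", 44_100, 16, 2),
    ("5.1, 48 kHz, 24 bit", 48_000, 24, 6),
    ("stereo, 192 kHz, 24 bit", 192_000, 24, 2),
]
for label, rate, bits, ch in configs:
    bps = pcm_bit_rate(rate, bits, ch)
    print(f"{label}: {bps / 1000:.1f} kbit/s, {bps * 60 / 8 / 1e6:.1f} MB per minute")
```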

Formats that reduce the bit rate through psychoacoustic coding offer several coding grades, corresponding to more or less aggressive bit-rate reduction.

The sound channels can be real and separate, or mixed into the main signals and later decoded and restored using specific algorithms (surround). When the bit rate is reduced, the encoder can exploit the redundancy between channels.
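One common way to exploit redundancy between channels is a mid/side transform: when left and right are similar, the side (difference) signal is small and cheap to encode. The sketch below uses an exactly reversible integer variant, similar in spirit to the mid/side stereo mode offered by lossless codecs such as FLAC; the function names and sample values are hypothetical.

```python
def to_mid_side(left, right):
    """Lossless integer mid/side transform of two equal-length sample lists."""
    mid = [(lv + rv) >> 1 for lv, rv in zip(left, right)]
    side = [lv - rv for lv, rv in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Invert to_mid_side exactly."""
    left, right = [], []
    for m, s in zip(mid, side):
        # L+R and L-R share the same parity, so the bit dropped by >> 1 is s & 1.
        total = (m << 1) | (s & 1)
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

left = [1000, 1200, 900, -450]
right = [998, 1203, 901, -452]
mid, side = to_mid_side(left, right)
print(side)  # [2, -3, -1, 2]
assert from_mid_side(mid, side) == (left, right)
```

Because the side values are tiny compared with the original samples, a subsequent entropy coder needs far fewer bits for them.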
