MP3 (MPEG-1 Audio Layer 3) is a lossy compressed digital audio format developed by MPEG (Moving Picture Experts Group) to be part of version 1 of the MPEG video format.
What is an MP3 File and Extension?
A typical MP3 file uses a 44.1 kHz sampling rate, and MPEG-1 Layer III supports bit rates from 32 to 320 kbps. The name is short for MPEG-1 Audio Layer 3 and should not be confused with an MP3 player.
History
The MP3 format was developed by Karlheinz Brandenburg, director of electronic media technologies at the Fraunhofer IIS institute. The institute belongs to the Fraunhofer-Gesellschaft network of German research centers, which, along with Thomson Multimedia, controls most of the MP3-related patents.
The first version dates from 1986, and improved versions followed in 1991. Brandenburg first used the .mp3 extension in July 1995, for the MP3-related files he kept on his computer.
A year later, his institute earned 1.2 million euros from the patents, and ten years later, this amount had reached 26.1 million.
MP3 became the standard for audio streaming and high-quality compressed sound because the compression quality can be adjusted through the bit rate, which in turn determines the final file size. An MP3 file can be 12 or even 15 times smaller than the original uncompressed file.
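The relationship between bit rate, file size, and compression ratio can be checked with a little arithmetic. The sketch below assumes CD-style uncompressed audio (44.1 kHz, 16 bits, stereo) as the reference and a constant bit rate; the function names are illustrative only.

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio, in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def mp3_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bit-rate stream (tags ignored)."""
    return bitrate_kbps * 1000 * duration_s / 8


def compression_ratio(mp3_bitrate_kbps, **pcm_kwargs):
    """How many times smaller the MP3 stream is than the PCM reference."""
    return pcm_bitrate_kbps(**pcm_kwargs) / mp3_bitrate_kbps


if __name__ == "__main__":
    for rate in (96, 128, 192):
        print(rate, "kbps ->", round(compression_ratio(rate), 2), "x smaller")
```

Under these assumptions the ratio is about 11 at 128 kbps and 14.7 at 96 kbps, which is in line with the figures quoted above.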
It is the first audio compression format that has become famous thanks to the Internet, as it enables the exchange of music files.
Because such files were so easy to share, legal proceedings were initiated against companies such as Napster and AudioGalaxy.
After the development of standalone and portable players, and of players integrated into stereo systems, the MP3 format became an indispensable element of the computer world.
At the beginning of 2002, other compressed audio formats, such as Windows Media Audio and Ogg Vorbis, began to be widely included in programs, operating systems, and players. As a result, it was predicted that MP3 would gradually fall out of use in favor of other formats.
One of the factors behind the decline of the MP3 audio format is patent issues.
Technically, this says nothing about whether its quality is lower or higher. Still, the patents prevent the community from continuing to improve the format and can oblige users to pay for some codecs, as happens with MP3 players.
Still, the MP3 file format and extension are among the most used and successful inventions today.
Layer III differs from the earlier layers of the MPEG audio standards in several respects, one of which is the so-called hybrid filter bank, which makes the format more complex.
This improvement in frequency resolution worsens temporal resolution and introduces pre-echo problems, which are predicted and corrected. It also allows good sound quality at bit rates as low as 64 kbps.
Filter Bank
The filter bank used in this layer is a hybrid polyphase/MDCT filter bank.
It maps the signal from the time domain to the frequency domain in the encoder and provides the reconstruction filters in the decoder.
The variable frequency resolution, with 6×32 or 18×32 subbands, is much better matched to the critical bands at different frequencies.
When 18 points are used, the maximum number of frequency components is 32 × 18 = 576. For a 48 kHz sampling rate, which gives a 24,000 Hz bandwidth, the resulting frequency resolution is 24000/576 = 41.67 Hz.
If 6 frequency lines are used, the frequency resolution is lower but the temporal resolution is higher. This mode is applied where pre-echo effects are expected, that is, at abrupt transitions from silence to high energy levels.
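The two resolutions can be compared directly. This small sketch assumes a 48 kHz sampling rate, as in the 24000/576 figure; the function name is only illustrative.

```python
SUBBANDS = 32  # outputs of the polyphase filter bank


def frequency_resolution_hz(lines_per_subband, sample_rate_hz=48_000):
    """Width in Hz of one frequency line for the given MDCT size."""
    components = SUBBANDS * lines_per_subband
    bandwidth_hz = sample_rate_hz / 2  # usable (Nyquist) bandwidth
    return bandwidth_hz / components


if __name__ == "__main__":
    print("long blocks :", round(frequency_resolution_hz(18), 2), "Hz")
    print("short blocks:", round(frequency_resolution_hz(6), 2), "Hz")
```

With 6 lines per subband there are 32 × 6 = 192 components, so each line is 125 Hz wide: three times coarser in frequency, but the block covers a third of the time span.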
Layer III has three block modes: two in which all 32 filter bank outputs pass through the windows and MDCT transforms with the same block length, and a mixed block mode in which the two lowest frequency bands use long blocks and the upper 30 bands use short blocks.
For MPEG-1 Audio Layer 3 specifically, the standard defines four window types: normal (long), long-to-short transition (START), three short windows (SHORT), and short-to-long transition (STOP).
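The four window types can be sketched as sine-based windows over 36 samples (12 for each short window). The formulas below follow the shapes commonly given for Layer III, but the index ranges should be treated as an assumption to check against the standard, and the function names are hypothetical.

```python
import math


def long_window():
    """Normal window: one sine half-period over 36 samples."""
    return [math.sin(math.pi / 36 * (i + 0.5)) for i in range(36)]


def short_window():
    """One of the three short windows: sine half-period over 12 samples."""
    return [math.sin(math.pi / 12 * (i + 0.5)) for i in range(12)]


def start_window():
    """Long-to-short transition: long rise, flat top, short fall, zeros."""
    w = [math.sin(math.pi / 36 * (i + 0.5)) for i in range(18)]
    w += [1.0] * 6
    w += [math.sin(math.pi / 12 * (i - 18 + 0.5)) for i in range(24, 30)]
    w += [0.0] * 6
    return w


def stop_window():
    """Short-to-long transition: the START window reversed in time."""
    return list(reversed(start_window()))
```

The START and STOP shapes exist so that a long block can hand over to short blocks, and back again, without breaking the overlap between neighbouring windows.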
File Structure
An MP3 file consists of a sequence of frames, each made up of a header and data; this sequence is called an elementary stream.
Each frame is independent; that is, a person can cut frames out of an MP3 file and then play them on any MP3 player.
The header begins with a synchronization word, which marks the beginning of a valid frame.
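Locating candidate frames therefore comes down to scanning for the synchronization word. The sketch below assumes the sync pattern is eleven consecutive 1 bits (a 0xFF byte followed by a byte whose top three bits are set), as MPEG audio headers are commonly described; `find_sync_offsets` is a hypothetical helper name.

```python
def find_sync_offsets(data: bytes) -> list[int]:
    """Return every byte offset where an 11-bit all-ones sync pattern starts."""
    offsets = []
    for i in range(len(data) - 1):
        # 0xFF supplies eight 1 bits; the next byte must start with 111.
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            offsets.append(i)
    return offsets


if __name__ == "__main__":
    sample = b"\x00\xff\xfb\x90\x00"
    print(find_sync_offsets(sample))
```

The same bit pattern can also occur by chance inside audio data, so a real parser would normally validate the rest of the header before trusting a match.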
Psychoacoustic Model
Compression relies on reducing the irrelevant dynamic range, exploiting the auditory system's inability to detect quantization errors under masking conditions.
The standard divides the signal into frequency bands that approximate the critical bands and then quantizes each subband according to the noise audibility threshold within that band.
The psychoacoustic model used here is a modified one that relies on a method called polynomial prediction.
It analyzes the audio signal and calculates the amount of noise that can be inserted as a function of frequency, that is, the masking amount/threshold as a function of frequency.
The encoder uses this information to decide the best way to spend the available bits.
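One way to picture how a masking threshold steers bit spending is a greedy allocation: keep giving a bit to whichever band's noise is most audible. This is a deliberately simplified illustration, not the Layer III procedure itself; the roughly 6 dB of noise reduction per bit and all names are assumptions.

```python
def signal_to_mask_ratio_db(signal_db, mask_db):
    """SMR per band: how far the signal sits above its masking threshold."""
    return [s - m for s, m in zip(signal_db, mask_db)]


def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedily hand out bits to the band with the most audible noise."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: SMR minus the noise reduction bought so far.
        nmr = [smr - db_per_bit * b for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all noise is already below the masking threshold
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    smr = signal_to_mask_ratio_db([60.0, 40.0, 30.0], [40.0, 35.0, 33.0])
    print(allocate_bits(smr, total_bits=10))
```

A band whose signal is already below its masking threshold (negative SMR) receives no bits at all, which is where most of the savings come from.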
The standard provides two psychoacoustic models of differing complexity: Model I is less complex than Model II and greatly simplifies the calculations.
The resulting distortion cannot be detected, even by an experienced ear in an optimal environment, at 256 kbps under normal conditions.
For an inexperienced or ordinary ear, 128 kbps or even 96 kbps is enough to sound good, unless the playback quality is high enough that the lack of bass and the harsh treble become overly noticeable.
For listeners with a lot of musical experience, good sound requires 192 or 256 kbps. Music circulating on the Internet is often encoded at 128 to 192 kbps.
Coding
In this standard, bit or noise allocation is performed in an iteration cycle made up of an internal (inner) loop and an external (outer) loop.
The cycle examines both the output samples of the filter bank and the signal-to-mask ratio (SMR) supplied by the psychoacoustic model, and it adjusts the bit or noise allocation, depending on the scheme used, to satisfy the bit rate and masking requirements simultaneously.
Internal Loop
The internal loop performs nonuniform quantization in the manner of a floating-point system: each MDCT spectral value is raised to the 3/4 power.
The loop selects a particular quantization step size, and the quantized data are then Huffman-coded in the next block.
The loop ends when the Huffman-coded quantized values use no more than the maximum number of bits allowed.
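The internal loop can be sketched as "quantize, count bits, and coarsen the step until the result fits". In this sketch the Huffman stage is replaced by a rough bit-cost estimate, and the step growth factor is an assumption; neither is taken from the standard, and all names are illustrative.

```python
import math


def quantize(spectrum, step):
    """Nonuniform quantization: scale, raise the magnitude to 3/4, round."""
    return [int(math.copysign(round((abs(x) / step) ** 0.75), x))
            for x in spectrum]


def estimated_bits(quantized):
    """Stand-in for the Huffman coder: larger magnitudes cost more bits."""
    return sum(1 + 2 * abs(q).bit_length() for q in quantized)


def inner_loop(spectrum, max_bits, step=1.0, growth=2 ** 0.25):
    """Coarsen the step size until the coded values fit within max_bits."""
    if max_bits < len(spectrum):
        raise ValueError("budget too small even for an all-zero block")
    while True:
        quantized = quantize(spectrum, step)
        bits = estimated_bits(quantized)
        if bits <= max_bits:
            return quantized, step, bits
        step *= growth  # larger step -> smaller values -> fewer bits
```

For example, `inner_loop([100.0, -50.0, 10.0, 0.5], max_bits=20)` keeps enlarging the step until the estimated cost drops to 20 bits or fewer.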
External Loop
The external loop then checks whether each scale factor band has more distortion than allowed, comparing each scale factor band with the values previously calculated in the psychoacoustic analysis.
The external loop ends when one of the following conditions holds: no scale factor band has too much noise, the next iteration would amplify one of the bands more than is allowed, or all bands have been amplified at least once.
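The outer loop's behaviour, including its stopping conditions, can be sketched as follows. For brevity the sketch keeps the quantization step fixed, whereas a real encoder would rerun the inner loop after each amplification; the amplification factor, the limit `max_scale`, and all names are assumptions.

```python
def band_noise(band, scale, step):
    """Quantization noise energy of one band amplified by `scale`."""
    noise = 0.0
    for x in band:
        q = round((abs(x) * scale / step) ** 0.75)  # 3/4 power-law quantizer
        restored = (q ** (4 / 3)) * step / scale    # decoder-side inverse
        noise += (abs(x) - restored) ** 2
    return noise


def outer_loop(bands, allowed_noise, step, max_scale=8.0):
    """Amplify noisy bands until a stopping condition is met."""
    scales = [1.0] * len(bands)
    while True:
        too_noisy = [i for i, band in enumerate(bands)
                     if band_noise(band, scales[i], step) > allowed_noise[i]]
        if not too_noisy:
            return scales, "noise within limits"
        if all(s > 1.0 for s in scales):
            return scales, "every band amplified"
        if any(scales[i] * 2 ** 0.5 > max_scale for i in too_noisy):
            return scales, "amplification limit reached"
        for i in too_noisy:
            scales[i] *= 2 ** 0.5  # amplify only the bands that are too noisy
```

Amplifying a band before quantization makes its values larger relative to the step size, so its quantization noise shrinks at the cost of more bits.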
BitStream Packaging
This block takes the quantized samples derived from the filter bank output and stores the encoded audio, along with some additional data, in frames.
Each frame carries the information for 1152 audio samples and consists of a header, the audio data, CRC error-checking data, and ancillary data.
The header specifies which layer, bit rate, and sampling rate are used for the encoded audio.
Every frame starts with the same synchronization and identification header, and frame length can vary.
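As an illustration of what the header carries, the sketch below decodes the layer, bit rate, and sampling rate from a 4-byte header and derives the frame length. It assumes the usual MPEG-1 Layer III field layout and lookup tables, and handles only that case; `parse_header` is a hypothetical helper.

```python
# Index 0 ("free format") and index 15 (invalid) are not handled here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]


def parse_header(header: bytes) -> dict:
    """Decode the main fields of an MPEG-1 Layer III frame header."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:
        raise ValueError("sync word not found")
    version = (word >> 19) & 0b11
    layer = (word >> 17) & 0b11
    if version != 0b11 or layer != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bit rate or sampling rate index")
    padding = (word >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "has_crc": ((word >> 16) & 1) == 0,  # protection bit 0 means CRC present
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For example, `parse_header(bytes.fromhex("fffb9000"))` describes a 128 kbps, 44.1 kHz frame of 417 bytes. The padding bit adds one byte to some frames, which is one reason frame length varies.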
Besides handling this information, the block also performs variable-length Huffman coding, a coding method that removes redundancy without any loss of information.
It operates at the end of compression to encode the information.
Variable-length methods typically assign short code words to the most common events and leave long code words for the least common ones.
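The principle of short words for frequent events can be shown with a generic Huffman construction. Note the assumption: this sketch builds a code from the observed symbol frequencies purely for illustration, whereas an MP3 encoder selects from predefined code tables.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: code so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    data = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [-1]
    codes = huffman_code(data)
    print(codes, sum(len(codes[s]) for s in data), "bits")
```

In this example the most frequent value gets a 1-bit word and the rarest values get 4-bit words, so the 16 values need 30 bits instead of the 48 a fixed 3-bit code would use.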