Lossy compression
In information technology, lossy compression or irreversible compression is the class of data encoding methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. The different versions of the photo of the cat to the right show how higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression which does not degrade the data. The amount of data reduction possible using lossy compression is much higher than through lossless techniques.
Well-designed lossy compression technology often reduces file sizes significantly before degradation is noticed by the end-user. Even when the degradation is noticeable, further data reduction may be desirable. The most widely used lossy compression algorithm is the discrete cosine transform (DCT), first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. More recently, a family of sinusoidal-hyperbolic transform functions, with properties and performance comparable to those of the DCT, has been proposed for lossy compression.
Lossy compression is most commonly used to compress multimedia data, especially in applications such as streaming media and internet telephony. By contrast, lossless compression is typically required for text and data files, such as bank records and text articles. It can be advantageous to keep a lossless master file from which additional copies can be produced. This avoids basing new compressed copies on a lossy source file, which would introduce additional artifacts and further unnecessary information loss.
Types
It is possible to compress many types of digital data in a way that reduces the size of a computer file needed to store it, or the bandwidth needed to transmit it, with no loss of the full information contained in the original file. A picture, for example, is converted to a digital file by considering it to be an array of dots and specifying the color and brightness of each dot. If the picture contains an area of the same color, it can be compressed without loss by saying "200 red dots" instead of "red dot, red dot, ..., red dot."
The original data contains a certain amount of information, and there is a lower limit to the size of file that can carry all the information. Basic information theory says that there is an absolute limit in reducing the size of this data. When data is compressed, its entropy increases, and it cannot increase indefinitely. As an intuitive example, most people know that a compressed ZIP file is smaller than the original file, but repeatedly compressing the same file will not reduce the size to nothing. Most compression algorithms can recognize when further compression would be pointless and would in fact increase the size of the data.
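The "200 red dots" idea above is run-length encoding. The following is a minimal sketch of it, not a production codec; the function names `rle_encode` and `rle_decode` are illustrative. It also shows the limit described above: input with no repeated runs gains nothing from this scheme and ends up larger.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of equal values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

# A long run compresses well and decodes to exactly the original (lossless).
row = ["red"] * 200 + ["blue"] * 3
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)  # [(200, 'red'), (3, 'blue')]

# With no runs, every pixel becomes its own pair: 8 values turn into 16.
noisy = ["red", "blue"] * 4
assert len(rle_encode(noisy)) == len(noisy)
```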
In many cases, files or data streams contain more information than is needed for a particular purpose. For example, a picture may have more detail than the eye can distinguish when reproduced at the largest size intended; likewise, an audio file does not need a lot of fine detail during a very loud passage. Developing lossy compression techniques as closely matched to human perception as possible is a complex task. Sometimes the ideal is a file that provides exactly the same perception as the original, with as much digital information as possible removed; other times, perceptible loss of quality is considered a valid trade-off for the reduced data.
The terms 'irreversible' and 'reversible' are preferred over 'lossy' and 'lossless' respectively for some applications, such as medical image compression, to circumvent the negative implications of 'loss'. The type and amount of loss can affect the utility of the images. Artifacts or undesirable effects of compression may be clearly discernible while the result remains useful for the intended purpose. Alternatively, lossy compressed images may be 'visually lossless' or, in the case of medical images, so-called Diagnostically Acceptable Irreversible Compression may have been applied.
Transform coding
Some forms of lossy compression can be thought of as an application of transform coding, which is a type of data compression used for digital images, digital audio signals, and digital video. The transformation is typically used to enable better quantization. Knowledge of the application is used to choose information to discard, thereby lowering its bandwidth. The remaining information can then be compressed via a variety of methods. When the output is decoded, the result may not be identical to the original input, but is expected to be close enough for the purpose of the application.
The most common form of lossy compression is a transform coding method, the discrete cosine transform (DCT), which was first published by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974. It underlies popular image compression formats, video coding standards and audio compression formats.
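The transform-then-quantize pipeline described above can be sketched in a few lines. This is a simplified illustration in pure Python, assuming a one-dimensional block of eight samples and a single uniform quantization step; real codecs use two-dimensional blocks, perceptually tuned quantization tables, and an entropy coder. The names `encode`, `decode`, and `step` are illustrative.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def encode(block, step):
    """Transform, then quantize: this rounding is where information is lost."""
    return [round(c / step) for c in dct(block)]

def decode(levels, step):
    """Rescale the quantized levels and transform back to samples."""
    return idct([q * step for q in levels])

samples = [52, 55, 61, 66, 70, 61, 64, 73]
levels = encode(samples, step=10)      # small integers, cheap to entropy code
restored = decode(levels, step=10)     # close to, but not equal to, samples
print(levels)
print([round(v, 1) for v in restored])
```

A larger `step` produces smaller integers (and more zeros) at the cost of a larger reconstruction error, which is the basic rate versus quality trade-off of a transform codec.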
In the case of audio data, a popular form of transform coding is perceptual coding, which transforms the raw data to a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more accurately to human audio perception. While data reduction is a main goal of transform coding, it also allows other goals. One may represent data more accurately for the original amount of space: in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given size should provide a better representation than raw uncompressed audio in a WAV or AIFF file of the same size. This is because uncompressed audio can only reduce file size by lowering the sampling rate or bit depth, whereas compressed audio can reduce size while maintaining both. The compression thus becomes a selective loss of the least significant data, rather than a loss of data across the board. Further, transform coding may provide a better domain for manipulating or otherwise editing the data; for example, equalization of audio is most naturally expressed in the frequency domain rather than in the raw time domain.
From this point of view, perceptual encoding is not essentially about discarding data, but rather about a better representation of data. Another use is for backward compatibility and graceful degradation: in color television, encoding color via a luminance-chrominance transform domain means that black-and-white sets display the luminance while ignoring the color information. Another example is chroma subsampling: the use of color spaces such as YIQ, used in NTSC, allows one to reduce the resolution of the components to accord with human perception. Humans have the highest resolution for black-and-white, lower resolution for mid-spectrum colors like yellow and green, and the lowest for reds and blues; thus NTSC displays approximately 350 pixels of luma per scanline, 150 pixels of yellow vs. green, and 50 pixels of blue vs. red, which are proportional to human sensitivity to each component.
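Chroma subsampling as described above can be sketched with the YIQ conversion in Python's standard `colorsys` module. This is a simplified illustration on a single scanline: luma is kept at full resolution while the two chroma components are averaged over groups of neighbouring pixels. The subsampling factors (2 and 4) and the function names are assumptions chosen for the example, not NTSC's actual bandwidths.

```python
import colorsys

def subsample(values, factor):
    """Replace each group of `factor` neighbours by their average."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(g) / len(g) for g in groups]

def upsample(values, factor, length):
    """Repeat each stored value `factor` times, trimmed to `length`."""
    return [v for v in values for _ in range(factor)][:length]

def encode_scanline(rgb_pixels, i_factor=2, q_factor=4):
    """Convert RGB (floats in 0..1) to YIQ and reduce chroma resolution."""
    yiq = [colorsys.rgb_to_yiq(r, g, b) for r, g, b in rgb_pixels]
    y = [p[0] for p in yiq]                        # luma: full resolution
    i = subsample([p[1] for p in yiq], i_factor)   # chroma: reduced
    q = subsample([p[2] for p in yiq], q_factor)   # chroma: reduced further
    return y, i, q

def decode_scanline(y, i, q, i_factor=2, q_factor=4):
    """Rebuild RGB pixels from full-resolution luma and upsampled chroma."""
    n = len(y)
    i_full = upsample(i, i_factor, n)
    q_full = upsample(q, q_factor, n)
    return [colorsys.yiq_to_rgb(*p) for p in zip(y, i_full, q_full)]

# Eight pixels: 24 RGB values in, 8 + 4 + 2 = 14 stored values out.
line = [(0.9, 0.2, 0.1)] * 4 + [(0.1, 0.2, 0.9)] * 4
y, i, q = encode_scanline(line)
print(len(y) + len(i) + len(q))
print(decode_scanline(y, i, q)[0])
```

Because brightness detail survives untouched, the reconstructed line keeps its edges even though the color information is coarser.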
Information loss
Lossy compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression, where data will not be lost via the use of such a procedure. Information-theoretical foundations for lossy data compression are provided by rate-distortion theory. Much like the use of probability in optimal coding theory, rate-distortion theory heavily draws on Bayesian estimation and decision theory in order to model perceptual distortion and even aesthetic judgment.
There are two basic lossy compression schemes:
- In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded.
- In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.
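The second scheme can be illustrated with a minimal differential pulse-code modulation (DPCM) sketch. It assumes the simplest possible predictor (the previous decoded sample) and a uniform quantizer; the names `dpcm_encode`, `dpcm_decode`, and `step` are illustrative. The encoder predicts from the values the decoder will reconstruct, not from the original samples, so quantization error does not accumulate.

```python
def dpcm_encode(samples, step):
    """Quantize the error between each sample and its prediction."""
    codes = []
    prediction = 0                       # the decoder starts from the same value
    for x in samples:
        code = round((x - prediction) / step)   # quantized prediction error
        codes.append(code)
        prediction += code * step        # mirror what the decoder will rebuild
    return codes

def dpcm_decode(codes, step):
    """Rebuild samples by adding each dequantized error to the prediction."""
    out = []
    prediction = 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

samples = [100, 102, 105, 109, 108, 104, 101, 100]
codes = dpcm_encode(samples, step=4)
decoded = dpcm_decode(codes, step=4)
print(codes)    # [25, 0, 1, 1, 0, -1, -1, 0]
print(decoded)  # [100, 100, 104, 108, 108, 104, 100, 100]
```

After the first sample the codes are small integers, which an entropy coder can store compactly, and every decoded sample is within half a quantization step of the original.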
Comparison
The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any lossless method, while still meeting the requirements of the application. Lossy methods are most often used for compressing sound, images or videos. This is because these types of data are intended for human interpretation, where the mind can easily "fill in the blanks" or see past very minor errors or inconsistencies. Ideally, lossy compression is transparent, which can be verified via an ABX test. Data files using lossy compression are smaller in size and thus cost less to store and to transmit over the Internet, a crucial consideration for streaming video services such as Netflix and streaming audio services such as Spotify.
Emotional effects
A study conducted by the Audio Engineering Library concluded that lossy compression formats such as MP3s have distinct effects on timbral and emotional characteristics, tending to strengthen negative emotional qualities and weaken positive ones. The study further noted that the trumpet is the instrument most affected by compression, while the horn is least.
Transparency
When a user acquires a lossily compressed file, the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many compression methods focus on the idiosyncrasies of human physiology, taking into account, for instance, that the human eye can see only certain wavelengths of light. The psychoacoustic model describes how sound can be highly compressed without degrading perceived quality. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artifacts.
Compression ratio
The compression ratio of lossy video codecs is nearly always far superior to that of the audio and still-image equivalents.
- Video can be compressed immensely with little visible quality loss
- Audio can often be compressed at 10:1 with almost imperceptible loss of quality
- Still images are often lossily compressed at 10:1, as with audio, but the quality loss is more noticeable, especially on closer inspection.
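As a worked example of what a ratio of roughly 10:1 means for audio, the sketch below compares the bit rate of uncompressed PCM audio with that of a lossy stream. The specific figures (CD-quality audio at 44.1 kHz, 16 bits, two channels, and a 128 kbit/s lossy target) are assumptions chosen for illustration and are not taken from the text above.

```python
def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bit_depth * channels

def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return original_bps / compressed_bps

cd_quality = pcm_bitrate(44_100, 16, 2)            # 1,411,200 bit/s
ratio = compression_ratio(cd_quality, 128_000)     # about 11:1
print(f"{cd_quality} bit/s -> ratio {ratio:.1f}:1")
```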
Transcoding and editing
Editing of lossy files
By modifying the compressed data directly without decoding and re-encoding, some editing of lossily compressed files without degradation of quality is possible. Editing which reduces the file size as if it had been compressed to a greater degree, but without more loss than this, is sometimes also possible.
JPEG
The primary programs for lossless editing of JPEGs are jpegtran and the derived exiftran. These allow the image to be rotated or flipped, and jpegtran can also crop it, without re-encoding.
While unwanted information is destroyed, the quality of the remaining portion is unchanged.
Some other transforms are possible to some extent, such as joining images with the same encoding or pasting images onto existing images, or scaling.
Some changes can be made to the compression without re-encoding:
- optimizing the compression
- converting between progressive and non-progressive encoding.
IrfanView offers some lossless JPEG operations through its JPG_TRANSFORM plugin.
Metadata
Metadata, such as ID3 tags, Vorbis comments, or Exif information, can usually be modified or removed without modifying the underlying data.
Downsampling/compressed representation scalability
One may wish to downsample or otherwise decrease the resolution of the represented source signal and the quantity of data used for its compressed representation without re-encoding, as in bitrate peeling, but this functionality is not supported in all designs, as not all codecs encode data in a form that allows less important detail to simply be dropped. Some well-known designs that have this capability include JPEG 2000 for still images and H.264/MPEG-4 AVC based Scalable Video Coding for video. Such schemes have also been standardized for older designs, such as JPEG images with progressive encoding, and MPEG-2 and MPEG-4 Part 2 video, although those prior schemes had limited success in terms of adoption into real-world common usage. Without this capability, which is often the case in practice, producing a representation with lower resolution or lower fidelity than a given one requires either starting from the original source signal and encoding it, or decompressing a compressed representation and re-encoding it, though the latter tends to cause digital generation loss.
Another approach is to encode the original signal at several different bitrates, and then either choose which to use, or broadcast several, where the best that is successfully received is used, as in various implementations of hierarchical modulation. Similar techniques are used in mipmaps, pyramid representations, and more sophisticated scale space methods. Some audio formats feature a combination of a lossy format and a lossless correction which when combined reproduce the original signal; the correction can be stripped, leaving a smaller, lossily compressed, file. Such formats include MPEG-4 SLS, WavPack, OptimFROG DualStream, and DTS-HD Master Audio in lossless mode.
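The combination of a lossy layer and a lossless correction can be sketched as follows. This is only a structural illustration that assumes integer samples and a plain uniform quantizer as the lossy layer; the formats named above use far more elaborate coding, and the names `split_layers`, `decode_lossy`, and `decode_lossless` are hypothetical.

```python
def split_layers(samples, step):
    """Return (lossy core, lossless correction) for integer samples."""
    core = [round(x / step) for x in samples]                    # coarse layer
    correction = [x - q * step for x, q in zip(samples, core)]   # what the core missed
    return core, correction

def decode_lossy(core, step):
    """Decode using the core alone: smaller, but only approximate."""
    return [q * step for q in core]

def decode_lossless(core, correction, step):
    """Decode using both layers: reproduces the original exactly."""
    return [q * step + r for q, r in zip(core, correction)]

samples = [1003, 998, 1010, 1021, 1017]
core, correction = split_layers(samples, step=8)

assert decode_lossless(core, correction, step=8) == samples
approx = decode_lossy(core, step=8)          # correction stripped
print(approx)
print(correction)
```

Stripping `correction` leaves a file that can still be decoded on its own, with each sample within half a quantization step of the original.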
Methods
Graphics
Image
- Discrete cosine transform
- * JPEG
- * WebP
- * High Efficiency Image Format
- * Better Portable Graphics
- * JPEG XR, a successor of JPEG with support for high dynamic range, wide gamut pixel formats
- Wavelet compression
- * JPEG 2000, JPEG's successor format that uses wavelets
- * DjVu
- * ICER, used by the Mars Rovers, related to JPEG 2000 in its use of wavelets
- * PGF, Progressive Graphics File
- Cartesian Perceptual Compression, also known as CPC
- Fractal compression
- JBIG2
- S3TC texture compression for 3D computer graphics hardware
3D computer graphics
- glTF
Video
- Discrete cosine transform
- * H.261
- * Motion JPEG
- * MPEG-1 Part 2
- * MPEG-2 Part 2
- * MPEG-4 Part 2
- * Advanced Video Coding
- * High Efficiency Video Coding
- * Ogg Theora
- * VC-1
- Wavelet compression
- * Motion JPEG 2000
- * Dirac
- Sorenson video codec
Audio
General
- Modified discrete cosine transform
- * Dolby Digital
- * Adaptive Transform Acoustic Coding
- * MPEG Layer III
- * Advanced Audio Coding
- * Vorbis
- * Windows Media Audio
- * LDAC
- * Opus
- Adaptive differential pulse-code modulation
- * Master Quality Authenticated
- MPEG-1 Audio Layer II
- Musepack
- aptX / aptX-HD
Speech
- Linear predictive coding
- * Adaptive predictive coding
- * Code-excited linear prediction
- * Algebraic code-excited linear prediction
- * Relaxed code-excited linear prediction
- * Low-delay CELP
- * Adaptive Multi-Rate
- * Codec2
- * Speex
- Modified discrete cosine transform
- * AAC-LD
- * Constrained Energy Lapped Transform
- * Opus
Other data