Data Compression

Hello! This website is dedicated to the world of data compression, specifically how compression methods can reduce storage and speed up processing when storing data. The main reason that technology incorporates compression is due to redundancies – information that is often repeated in a series of data. In some occasions, characters that occur more frequently than others are represented with shorter codes while characters that occur less frequently are represented with longer codes. Compression techniques have the ability to remove those redundancies for easier delivery.

The two types of data compression are lossless compression and lossy compression. Lossless compression can be described as data that is never lost. Lossless compression is necessary in certain situations, like numerical data, textual data and financial documents. Lossy compression, on the other hand, can be described as some data that is lost. Lossy compression is appropriate for image and audio files. Because the human senses cannot notice the most subtle changes, there is a higher tolerance level even when some information disappears.

Both lossy and lossless compression comprise of methods that help send, receive and store data in a small number of bits. For lossless compressions, it uses techniques such as run-length codinghuffman coding and morse code. For lossy compressions, it uses techniques like JPEG, MPEG and MP3 to compress image files. Below, it gives more information on some of the methods in detail.

Huffman Coding

One of the methods used in lossless compressions is huffman coding. The huffman coding procedure finds the variable length code associated with a series of events, given their frequencies. The huffman coding is represented as a binary code tree – the nodes that are connected by branches, the root and and the two branches from each node. The steps include:

  1. Sort characters from smallest frequency to largest frequency (left to right)
  2. Connect the two smallest nodes together
  3. Label left branch 0 and right branch 1
  4. Create Huffman codebook by starting at the root and following the branches that connect to the character

Run-Length Coding

Another method used in lossless compression is run-length coding. When dividing the image into pixels, we evaluate the group of identical pixels, runs, to assign shorter codes to them if the pixel colors were the same. Run-length coding helps replace repeated data by encoding each symbol as a pair. The pair consists of the value of the run and the length of the run.


JPEG Encoding Algorithm

JPEG (Joint Photographic Experts Group) is a common form of image compression in lossy compression. JPEG consists of a group of researchers who have designed the standard industry for image compression. JPEG is able to change the image into a linear set of numbers to expose the redundancies present in the data. The redundancies are then removed by using one of the lossless compression methods.

For more information: