Unit 1.3.1 Compression, Encryption and Hashing
Compression
[edit | edit source]Lossy Compression
[edit | edit source]Lossy compression reduces the size of a file by removing certain data. The original file cannot be recovered from the compressed version as the removed data is lost. It is typically used for sound and video files, where the change will not necessarily be noticed. For example, in a sound file the compressor could clip off high inaudible frequencies, or a video/image file could have is resolution lowered or its colour range reduced.
Lossless Compression
[edit | edit source]Lossless compression shortens redundant, repeated data present in files to reduce their size. The original file is retained and can always be restored exactly
Explain how lossy and lossless compression is used in data transmission in a peer-to-peer network. [5]
Answer:
Allow any valid explanations of lossy (3) and lossless compression (2), With examples of their use in peer-to-peer networks to reduce file sizes and data transmission times Allow I mark for each point with valid example. (2)
There are two main methods:
Dictionary Encoding
[edit | edit source]The best known dictionary encoder is the Lemple-Ziv-Welch (LZW) encoder. It works by examining the file and creating a dictionary, for example:
"Tom Jones survived the jungle. Tom is in the jungle."
Each word in the sentence can be given a number:
Word | ID Number |
---|---|
Tom | 1 |
Jones | 2 |
Survived | 3 |
The | 4 |
Jungle | 5 |
Is | 6 |
In | 7 |
This long sentence can then be converted to: "1234516745" Where each number corresponds to a word in the sentence.
Run Length Encoding
[edit | edit source]This form of encoding replaces large runs of the same pieces of data with a single data value/quantity. For example, if there are 100 red pixels in a row in an image, this could be encoded as R100. The technique is found in TIFF and BMP files and works best if the data is in long sequences.
Lossless compression is not often as effective as lossy compression when trying to reduce file sizes. Lossless is the only option for files such as text files or executable programs, where any loss of data would destroy their functionality.
Encryption
[edit | edit source]Encryption is the process of scrambling data in such a way that only the intended recipient can decrypt it. There are two main types, symmetric and asymmetric:
Symmetric Encryption
[edit | edit source]In symmetric encryption, the same key used to encrypt the message is used to decrypt it. This requires both the sender and recipient to store the key safely. It is very quick to encode and decode, however it is easier to crack as if the key is shared it is likely to be easier to obtain.
Asymmetric Encryption
[edit | edit source]This method uses two separate keys - one key encrypts the message while the other decrypts it. A single public key is known by both, but when the encryption algorithm is implemented a second compatible private key is also used. To decrypt the message, the public key is applied alongside the private key again, which allows the message to be revealed. This approach is far safer, however does require more processing power. The keys used are typically very large random numbers, which are unlikely to be guessed.
Asymmetric encryption uses the principle that some mathematical operations are relatively easy to perform, however very difficult to undo. Examples of this are finding the modulus of an exponential and multiplying together two prime numbers.
Hashing Algorithms
[edit | edit source]These algorithms have two uses - they can provide a quick way to generate memory addresses for storing records in databases, and also used to store and check passwords. They are commonly used in network and online transactions. Hashing algorithms are one-way functions so it is very easy to convert a plain-text value into a hash but very difficult to convert a hash back to plain-text. This makes it useful for storing passwords, but not encrypting data.
Once a user enters a password, it is hashed and stored on a server. Each time after this, when the user attempts to login, the hashing algorithm is applied to the entered password and the result is compared to the stored hash. If the hashes are the same the correct password has been entered, otherwise the login is rejected. The hashes cannot easily be converted back to their original passwords, but brute-force attack techniques could be employed by hackers in an attempt to guess the original password.
// e.g. User A enters: password123
sha256($password); // SHA-256 is a one-way hashing function
// ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
To secure the password even further, a salt can be added during the hash. A salt is a random string that is appended to the password before hashing when it is first stored. This string is then stored so that when the user enters their password again, the same salt can be appended to the password and the check is performed similarly to above. This is advantageous because many people use common passwords like 'password' and by adding a salt, these will not all end up with the same hash. If someone were to gain access to your hashes, seeing that many were the same would narrow down the passwords they'd have to try for those users to just common passwords, allowing them to brute force their way past the password.
// e.g. User B enters: password123
// Random salt: wuphAT=5renE
sha256(sha256($password).$salt);
// a3a0f0c8ce9208fa3d122778b437e0287c70fd829ad07f6672143d114cb244c2