Fundamentals of Data Representation

Encoding & Error Correcting

Encoding
Computers can only store binary: 1's and 0's. They can't store anything else. So how can pictures, sounds, video, and more be stored? We use encoding.
Encoding: converting information into a particular form.

To turn binary into other things, we use encoding schemes. For example, a certain pattern of bits could represent a character. Scientists come up with these encoding schemes and then program them into computers.
We're looking at two encoding schemes for characters (text): ASCII and Unicode.

Encoding Characters: ASCII
ASCII stands for the American Standard Code for Information Interchange. It's a table that shows which binary numbers represent which characters. This system turns a single byte into a character. Note that characters include letters, numbers, symbols, and device codes (like new-line and delete).

As ASCII uses a single byte, values can range from 0 to 255.
In standard ASCII, values stop at 127, and the most-significant bit is used for error checking.
In extended ASCII, values stop at 255, and the most-significant bit is used for more characters.
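As a rough illustration of the ASCII table, the sketch below uses Python's built-in `ord()` and `chr()` to move between characters and their numeric codes. The helper names `to_binary` and `is_standard_ascii` are made up for this example.

```python
# Built-in ord() gives a character's numeric code; chr() goes the other way.

def to_binary(ch: str, width: int = 8) -> str:
    """Return the character's code as a fixed-width binary string."""
    return format(ord(ch), f"0{width}b")


def is_standard_ascii(ch: str) -> bool:
    """True if the code fits in 7 bits (0 to 127), i.e. standard ASCII."""
    return ord(ch) <= 127


for ch in "T1-#":
    print(ch, ord(ch), to_binary(ch))

# Code 10 is the new-line device code.
print(repr(chr(10)))
```

Every character in the loop has a code of 127 or below, so the most-significant bit of each 8-bit pattern is 0.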

These characters are stored as binary numbers. That means we can perform the usual arithmetic on them, e.g. add 'a' and 'b'. A good example is subtracting two characters to find the 'distance' between them. This is useful for the Caesar Cipher, where a shift is used to encrypt characters.
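The character arithmetic described above could be sketched like this. The function names `distance` and `caesar_shift` are illustrative, and the cipher is assumed to handle only the uppercase letters A to Z.

```python
def distance(a: str, b: str) -> int:
    """How many places b sits after a in the character table."""
    return ord(b) - ord(a)


def caesar_shift(text: str, shift: int) -> str:
    """Shift uppercase letters A-Z by `shift` places, wrapping around.

    Anything that is not an uppercase letter is left unchanged.
    """
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            offset = (ord(ch) - ord("A") + shift) % 26
            result.append(chr(ord("A") + offset))
        else:
            result.append(ch)
    return "".join(result)


print(distance("a", "d"))
print(caesar_shift("HELLO", 3))
```

Shifting by the negative of the original shift undoes the encryption.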

Encoding Characters: Unicode
Unicode works in much the same way: it uses binary to represent characters. One big difference is that, in its original fixed-width form, it uses two bytes instead of one, which allows 2^16 = 65,536 different characters.
Unicode has lots of different implementations: UTF-8, UCS-2, and UTF-16. Most documents use UTF-8 by default (which matches ASCII for the first 128 characters, but allows for non-Latin characters too by using up to four bytes per character).
ASCII was made to handle purely English characters and symbols. Other languages require characters and symbols that cannot be represented using ASCII (like Mandarin characters or accented letters). So, Unicode was made.
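To see how the implementations differ in size, the sketch below encodes a few characters with Python's built-in `str.encode`. The helper name `byte_lengths` is hypothetical, and `utf-16-be` is chosen here only so that no byte-order mark is added to the output.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (bytes in UTF-8, bytes in UTF-16) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


samples = {
    "T": "T",            # plain ASCII letter
    "A-tilde": "\u00c3",  # accented Latin letter
    "sum": "\u2211",      # summation sign
}

for name, ch in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(name, hex(ord(ch)), utf8_len, utf16_len)
```

The ASCII letter takes a single byte in UTF-8, while the other two characters need more. All three fit in two bytes in UTF-16.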

State which encoding scheme (ASCII or Unicode) a computer would use for the following characters:
T, 1, -, ∑, "New Line", Ã, , #
ANSWERS (in the same order): ASCII, ASCII, ASCII, Unicode, ASCII, Unicode, Unicode, ASCII

Error Checking
There's one big problem with storing/sending data: sometimes the computer gets it wrong. This could be because of faulty hardware, a break in a line between two computers, or a 'corruption' of the data in transit. In each case, an error could be introduced to the data.
Example: We want to send the letter 'T'. In ASCII, this is 84 in decimal, or 01010100 in binary. In transit, this turns into a 'U', which is 85 in decimal, or 01010101 in binary. The least-significant bit changed! This is an error.
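The 'T' to 'U' error can be reproduced with a single XOR. This is only a small sketch, and `flip_bit` is a made-up helper name.

```python
def flip_bit(value: int, position: int) -> int:
    """Flip one bit of value; position 0 is the least-significant bit."""
    return value ^ (1 << position)


sent = ord("T")                 # 84
received = flip_bit(sent, 0)    # least-significant bit changes in transit

print(format(sent, "08b"), chr(sent))
print(format(received, "08b"), chr(received))
```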

We've come up with ways of checking for errors (not fixing them). The methods we'll be looking at are the parity bit, checksums, and the check digit. Some errors can be corrected, and we'll look at one way of doing that: majority voting.
What does parity mean? It usually refers to the condition of equality in a state. It is also the fact of a number being even or odd (i.e. the parity of a number). This is how we'll look at it for this course: in the parity section, we mean something like "the parity of this number is odd".

Parity Bit
Parity bits are single bits added to data, usually at the most-significant bit. For example, a parity bit could be the 8th bit in a byte. Like normal bits, they only store 0 or 1. However, the 0 and 1 mean something, and what they mean depends on the scheme being used.
Even parity: the data has to have an even number of 1's.
Odd parity: the data has to have an odd number of 1's.

Odd parity examples: if the data we are sending already has an odd number of 1's, the parity bit is set to 0. If it doesn't have an odd number of 1's, the parity bit is set to 1.
Even parity examples: if the data we are sending doesn't have an even number of 1's, the parity bit is set to 1. If it already has an even number of 1's, the parity bit is set to 0.
In both parity systems, we aim to set the data to have an odd number of 1's, or an even number of 1's: even parity needs an even number of 1's after adding the parity bit, and odd parity needs an odd number of 1's after adding the parity bit.
The reason for using this scheme is that we can tell if a bit has changed on the receiving end of the data. If we are using the even parity system and the receiver sees that the data has an odd number of 1's, then something has happened to the data during transit.
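A minimal sketch of both parity schemes follows. It assumes 7 data bits held as a string of '0' and '1' characters, with the parity bit placed in front as the most-significant bit. The function names are illustrative.

```python
def parity_bit(data: str, scheme: str = "even") -> str:
    """Return the parity bit ('0' or '1') for a string of data bits."""
    ones_are_odd = data.count("1") % 2 == 1
    if scheme == "even":
        return "1" if ones_are_odd else "0"
    if scheme == "odd":
        return "0" if ones_are_odd else "1"
    raise ValueError("scheme must be 'even' or 'odd'")


def add_parity(data: str, scheme: str = "even") -> str:
    """Put the parity bit in front of the data (most-significant position)."""
    return parity_bit(data, scheme) + data


def check_parity(word: str, scheme: str = "even") -> bool:
    """True if the received word still satisfies the parity scheme."""
    ones_are_odd = word.count("1") % 2 == 1
    return ones_are_odd == (scheme == "odd")


data = "1010100"  # 7-bit ASCII for 'T', which has three 1's
print(add_parity(data, "even"))
print(add_parity(data, "odd"))
```

On the receiving end, `check_parity` returns `False` when a single bit has flipped in transit.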

For the given parity scheme (even or odd), add the correct parity bit to the data.

Checksum
There are two problems with using a parity bit. It doesn't catch the following errors:
A 1 being moved to where a 0 was (two bits swapping places), because the number of 1's stays the same.
An even number of changes (0 to 1, or 1 to 0), because the number of 1's stays even or stays odd.
We can make this better by using 2D checking. A checksum is a way of doing that.
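The two blind spots can be demonstrated with a quick check. The bit patterns below are made-up examples, and even parity is assumed.

```python
def has_even_parity(word: str) -> bool:
    """True if the word contains an even number of 1's."""
    return word.count("1") % 2 == 0


sent      = "11010100"  # valid under even parity (four 1's)
one_flip  = "11010101"  # last bit flipped: caught
two_flips = "11010111"  # last two bits flipped: missed
swapped   = "11010010"  # a neighbouring 1 and 0 swapped: missed

for label, word in [("sent", sent), ("one flip", one_flip),
                    ("two flips", two_flips), ("swapped", swapped)]:
    print(label, word, "passes" if has_even_parity(word) else "fails")
```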

Checksums still follow parity rules: we try to end up with an even/odd number of 1's. However, this is done across two dimensions. It works on data that can be split into a 'block' of rows with an equal number of bits.
Example: We are sending the message "HELLO". Using 7-bit ASCII, these characters are stored as 1001000 1000101 1001100 1001100 1001111. (This large binary number was found by taking the binary value of each character in ASCII and writing them from left to right.) We can write that in a 'block', like so:
H  1001000
E  1000101
L  1001100
L  1001100
O  1001111

We do the same as before and add a parity bit for each word, but we do it for each row and each column!
Example: We are going to use even parity. So we write 1 or 0 at the end of each row and column to create a line with an even number of 1's.

We finish by making sure the parity for both the parity row and the parity column is the same. The sender will send the message and the parity row/column. We can now detect the errors that single parity bits cannot.
Example: We work out the parity bit for the parity row and for the parity column. They should end up as the same thing (which they do here).
The 'detecting the errors that parity bits cannot' part is to do with the fact that a change in a bit now affects both its row and its column. Even if an even number of changes happen on a single row, the changed bits sit in different columns, and each of those columns will have a single change in it, which will be detected.
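The row-and-column scheme described above could be sketched as follows. This assumes 7-bit ASCII rows and even parity. The names `parity_bit` and `two_d_parity` are chosen for this example only.

```python
def parity_bit(bits: str, scheme: str = "even") -> str:
    """Return the parity bit ('0' or '1') for a string of bits."""
    ones_are_odd = bits.count("1") % 2 == 1
    if scheme == "even":
        return "1" if ones_are_odd else "0"
    return "0" if ones_are_odd else "1"


def two_d_parity(rows: list[str], scheme: str = "even") -> tuple[str, str]:
    """Return (parity column, parity row) for a block of equal-length rows."""
    row_parity = "".join(parity_bit(row, scheme) for row in rows)
    col_parity = "".join(parity_bit("".join(col), scheme) for col in zip(*rows))
    return row_parity, col_parity


rows = [format(ord(ch), "07b") for ch in "HELLO"]
row_parity, col_parity = two_d_parity(rows)

for ch, row, p in zip("HELLO", rows, row_parity):
    print(ch, row, p)
print(" ", col_parity, parity_bit(col_parity))

# Two flips in the same row: the row parity is unchanged,
# but two column parities no longer match.
damaged = ["0101000"] + rows[1:]
damaged_row_parity, damaged_col_parity = two_d_parity(damaged)
print(damaged_row_parity == row_parity, damaged_col_parity == col_parity)
```

The last line shows why the second dimension helps: the damaged block passes the row check but fails the column check.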

For an odd parity scheme, calculate the checksum for the word "TWO". This word uses the ASCII scheme for encoding. Write your checksum in a table (like the previous example).
ANSWER: 0 | 0 | 0 | 0 |

Check Digit
The final error detection technique is the check digit. This technique works best with numerical data. It performs arithmetic on the digits in the data, eventually reducing them down to a single digit. This digit is appended on to the data.
There are different check digit schemes, with examples including UPC (Universal Product Code) and ISBN (International Standard Book Number).

We can use any scheme we want. We'll use simple addition as an example to start with.
Example: A product has an ID number of 2534. It needs a check digit. We can do that by adding together all of the digits: 2 + 5 + 3 + 4 = 14. We then do the same with the result (adding the digits) and repeat this process until we have a single digit: 1 + 4 = 5. So, 2534 has a check digit of 5, which we can add to the end of the ID: 25345.
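The repeated digit addition can be written as a short loop. This is a sketch of the simple addition scheme from the example, with IDs held as strings so that leading zeros survive. The function names are hypothetical.

```python
def simple_check_digit(id_number: str) -> int:
    """Add the digits together, repeating until one digit is left."""
    total = sum(int(d) for d in id_number)
    while total >= 10:
        total = sum(int(d) for d in str(total))
    return total


def with_check_digit(id_number: str) -> str:
    """Append the check digit to the end of the ID."""
    return id_number + str(simple_check_digit(id_number))


print(simple_check_digit("2534"))
print(with_check_digit("2534"))
```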

Using the simple addition scheme, calculate the check digit for the following IDs: 1678, 5987, 0340, 8617.
ANSWERS (in the same order): 4, 2, 7, 4

You may have noticed two of the IDs in the exercise had the same check digit: 1678 and 8617. The digits are the same but in a different order, so simple addition can't tell them apart. That makes simple addition a bad choice for a check digit scheme. Let's look at a different one: modulus 11.

The modulus 11 technique runs fairly similarly to before, but with a bit more setup. We assign each digit a value (a weight), working from the least-significant digit to the most-significant digit. Values go from 2 to 7. When 7 is reached, the next digit goes back to 2.
Example: An employee has ID 2534. We can apply values like so:
Digit: 2 5 3 4
Value: 5 4 3 2

We then multiply each digit by its value, sum the results, and divide the sum by 11. We get a remainder from this.
Example: For the ID 2534 we have (2 x 5) + (5 x 4) + (3 x 3) + (4 x 2) = 10 + 20 + 9 + 8 = 47. The remainder of 47 when divided by 11 is 3.

We take that remainder and subtract it from 11. This gives us a single digit, which is the check digit. Two notes about this: if the remainder is 0, the check digit is 0. If the remainder is 1, the result would be 10, which is not a single digit, so we can't use that as a valid piece of data.
Example: The remainder of 47 when divided by 11 is 3, so the check digit is 11 - 3 = 8. That means we would need to include 8 somewhere in the ID (or sent data), for example at the end: 25348.
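Putting the modulus 11 steps together gives something like the sketch below. It follows the weights described here (2 to 7, starting from the least-significant digit). Returning `None` for a remainder of 1 is a choice made for this example, not part of the scheme itself.

```python
from itertools import cycle
from typing import Optional


def mod11_check_digit(id_number: str) -> Optional[int]:
    """Return the modulus 11 check digit, or None if there is no valid one."""
    weights = cycle(range(2, 8))  # 2, 3, 4, 5, 6, 7, then back to 2
    total = sum(int(d) * w for d, w in zip(reversed(id_number), weights))
    remainder = total % 11
    if remainder == 0:
        return 0
    if remainder == 1:
        return None  # 11 - 1 = 10 is not a single digit
    return 11 - remainder


print(mod11_check_digit("2534"))
print(mod11_check_digit("1678"), mod11_check_digit("8617"))
```

Unlike simple addition, this scheme gives 1678 and 8617 different check digits, because each digit's contribution depends on its position.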

Using the modulus 11 scheme, calculate the check digit for the following IDs: 1678, 5987, 0340, 8617.
ANSWERS (in the same order): 0, 0, 9, 7

Majority Voting
The last thing to look at is how we can correct errors. Everything we've seen so far (parity bit, checksum, check digit) only lets us detect them. There is a way to correct errors, so that the received data is the same as the sent data.

That's by using majority voting. In this scheme, we send the data out repeatedly. For each bit of data, we may send it 3, 5, 7, etc. times. Any odd amount works, because it makes sure there is a clear answer when voting. We then look at each group of bits and decide what the value is based on the majority.

The result is the corrected data, without any of the repeats.
Example: We are trying to send the ASCII word 'HI', which is 01001000 01001001 in 8-bit ASCII. We're going to use majority voting to send this over. Start by repeating each bit 3 (or any larger odd number) times. This data is then sent over to another device. In transit, some of the bits change (from 0 to 1 and vice versa). After doing the majority vote on each group of bits, the receiving device recovers the original data: 'HI'.
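A small sketch of majority voting with three repeats per bit is given below. The helper names (`encode`, `decode`, `flip`, `bits_to_text`) and the positions of the flipped bits are invented for illustration.

```python
def encode(bits: str, repeats: int = 3) -> str:
    """Repeat every bit `repeats` times."""
    return "".join(b * repeats for b in bits)


def decode(received: str, repeats: int = 3) -> str:
    """Take a majority vote over each group of `repeats` bits."""
    out = []
    for i in range(0, len(received), repeats):
        group = received[i:i + repeats]
        out.append("1" if group.count("1") > repeats // 2 else "0")
    return "".join(out)


def flip(bits: str, positions: list[int]) -> str:
    """Simulate transit errors by flipping the bits at the given positions."""
    out = list(bits)
    for p in positions:
        out[p] = "1" if out[p] == "0" else "0"
    return "".join(out)


def bits_to_text(bits: str) -> str:
    """Turn a string of 8-bit groups back into characters."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


message = "".join(format(ord(ch), "08b") for ch in "HI")
sent = encode(message)
received = flip(sent, [0, 5, 47])  # one error in three different groups
print(bits_to_text(decode(received)))
```

Voting only works while fewer than half of the copies in a group are damaged. Flipping two of the three copies of the same bit would make the vote pick the wrong value.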
