Encoding, Validation and Verification Chapter 1
Introduction This presentation covers the following: – Data encoding – Data validation – Data verification
Data encoding This is a method of changing the way we represent data. We do this to standardise the data we are dealing with. The original data is not stored...only the representation of it.
Data encoding Some codes are easier to work out than others: MON, TUE, WED, JAN, FEB, MAR. For some, you will need a key: VXCORBLA, FDFOCGRE. – VX = Vauxhall – FD = Ford – COR = Corsa – FOC = Focus – BLA = Black – GRE = Green
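A keyed code like this can be sketched in Python as a set of lookup tables. This is only an illustration: the dictionary names and the `decode_car` helper are made up for this example, and it assumes every code is a 2-letter make followed by a 3-letter model and a 3-letter colour, as in the two codes on the slide.

```python
# The key from the slide, stored as lookup tables (names are illustrative).
MAKES = {"VX": "Vauxhall", "FD": "Ford"}
MODELS = {"COR": "Corsa", "FOC": "Focus"}
COLOURS = {"BLA": "Black", "GRE": "Green"}

def decode_car(code):
    """Split an 8-character code into make, model and colour, then look each part up."""
    make, model, colour = code[:2], code[2:5], code[5:8]
    return MAKES[make], MODELS[model], COLOURS[colour]

print(decode_car("VXCORBLA"))  # ('Vauxhall', 'Corsa', 'Black')
print(decode_car("FDFOCGRE"))  # ('Ford', 'Focus', 'Green')
```

Without the key, the stored codes cannot be turned back into their meaning, which is why the key must be shared with anyone who uses the data.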
Take note: Create your own encoded data with a key so someone else will understand how it works. Looking at someone else’s, are there any limitations with their code? Could it cause any problems or confusion? Explain to that person why you think it is either fine or needs some improvement.
Problems with encoding If you encode data it may become less accurate. You may end up limiting the possible number of data entries. For example, cars come in lots of different colours, but if you limit the choices to Red, Blue, Black, Silver, etc, you prevent the actual colours being entered. – Star Silver and Lightning Silver are different...but encoding may regard them both as silver. – This would be inaccurate and the validity of your data could be questioned.
Problems with encoding Asking people questions often returns different responses. – “Did you enjoy the race?” – “It was good” – “It was alright...got a bit boring in places” – “Fantastic...I am glad he won”. Responses can be similar but not always the same. This means that we sometimes have to apply a judgement on how best to record the response. If we had a scale from 1-4 (1=good, 4=rubbish) then where would we put the comments? Again, if more than one person is collecting the data, will their judgements be the same?
Problems with encoding Another problem occurs when you come across some data that won’t fit in with your encoding system. This means that you have to re-encode your data, which takes time and can also lead to mistakes being made. If inaccuracies do occur, how do you know which data is incorrect? People might still assume the data is fine, which could lead to more problems!
Encoding = Good Stuff! Computers have a limited storage capacity. If you encode data you can reduce the amount of storage space needed. When you are dealing with thousands of records the space saved is huge! Also, it can be quicker to enter coded data. It doesn’t have to be less accurate either. – For example, M = Male, F = Female. A computer can also carry out validation checks on the encoded data to make sure it is valid. – For example, if it is not M or F then there must be a mistake.
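The M/F check described above could look something like this in Python. It is a minimal sketch: the names `VALID_CODES` and `is_valid_code` are invented for the example.

```python
# The only codes the system accepts (illustrative name).
VALID_CODES = {"M", "F"}

def is_valid_code(value):
    """Return True only if the value is one of the allowed codes."""
    return value in VALID_CODES

print(is_valid_code("M"))  # True
print(is_valid_code("Q"))  # False: not M or F, so there must be a mistake
```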
Take note: What is meant by encoding data? Describe three advantages of encoding data. Describe three disadvantages of encoding data. Give an example of how data can be encoded. Give two situations where the encoding of data is appropriate. For each situation, explain why data needs to be encoded.
Validation Validating data can be done using the following methods: – Range check – Type check – Presence check – Length check – Lookup check – Picture check – Check digit
Range Check A range check is very simple. It sets a lower and an upper boundary between which an entered value must fall. For instance, the number 50 would be accepted as it falls within the boundaries, but the number 101 would exceed the upper boundary and thus be rejected.
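A range check can be sketched in a few lines of Python. The slide does not state the boundaries, so the values 1 and 100 used here are an assumption chosen to match the 50 and 101 example; `range_check` is an illustrative name.

```python
def range_check(value, lower=1, upper=100):
    """Return True if the value lies between the two boundaries (inclusive).

    The defaults of 1 and 100 are assumed for this example only.
    """
    return lower <= value <= upper

print(range_check(50))   # True: inside the boundaries
print(range_check(101))  # False: above the upper boundary
```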
Type Check This check prevents data of the wrong type from being submitted. For example, entering the word “two” into a field which was expecting a numerical value would return an error as “two” is in text format.
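One possible way to write a type check in Python is to try converting the typed text to a whole number and see whether that works. This sketch assumes the input always arrives as text, as it would from a form; `is_whole_number` is an illustrative name.

```python
def is_whole_number(text):
    """Return True if the typed text can be read as a whole number."""
    try:
        int(text)
        return True
    except ValueError:
        # int() raises ValueError when the text is not a number, e.g. "two".
        return False

print(is_whole_number("2"))    # True
print(is_whole_number("two"))  # False
```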
Presence Check You come across these all the time on websites which ask for certain information to be included. The system will insist that you enter these pieces of data before proceeding to the next section.
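A presence check might be sketched like this in Python. The form is modelled as a dictionary, and the field names ("name", "address") are hypothetical examples rather than anything from a real website.

```python
def missing_fields(form, required):
    """Return the names of required fields that were left empty."""
    return [name for name in required if form.get(name, "") == ""]

# Hypothetical form where the address has been left blank.
form = {"name": "Ann", "address": ""}
print(missing_fields(form, ["name", "address"]))  # ['address']
```

The system would only move on to the next section when the returned list is empty.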
Length Check Length checks prevent more characters being entered than is allowed. The word “shoe” has a length of 4. If we set the limit to 4 then “shoes” wouldn’t be allowed.
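The "shoe" example can be sketched in Python as follows; `length_check` is an illustrative name and the limit of 4 comes from the example above.

```python
def length_check(text, max_length=4):
    """Return True if the text is no longer than the allowed length."""
    return len(text) <= max_length

print(length_check("shoe"))   # True: 4 characters
print(length_check("shoes"))  # False: 5 characters
```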
Lookup Check A lookup check takes a value and compares it to a set of values in another table. If a match is made then a result is returned. If no match is made then an error is returned. An example of this would be entering a student’s test score into a field and the system returning the student’s grade.
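The test score example could be sketched in Python like this. The grade boundaries in `GRADE_TABLE` are invented for illustration and are not taken from any real mark scheme.

```python
# Hypothetical lookup table: (minimum score, grade), highest boundary first.
GRADE_TABLE = [(70, "A"), (60, "B"), (50, "C"), (40, "D")]

def lookup_grade(score):
    """Return the grade for a score, or raise an error if no row matches."""
    for minimum, grade in GRADE_TABLE:
        if score >= minimum:
            return grade
    raise LookupError(f"No grade found for score {score}")

print(lookup_grade(65))  # B
```

A score below every boundary in the table, such as 10, finds no match and so produces an error instead of a grade.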
Picture Check Also known as an Input Mask or Format Check. This type of check ensures data is entered in a predefined way. A good example of this is when dealing with dates. There are many ways to submit a date: – 01/Jan/2008 – 01/01/2008 – 1/1/08 – Etc A Picture check will define how the date must be entered.
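One way to sketch a picture check in Python is with a pattern that describes the required layout. This example assumes the chosen format is two digits, a slash, two digits, a slash and four digits (as in 01/01/2008); the names `DATE_PICTURE` and `picture_check` are illustrative.

```python
import re

# Two digits / two digits / four digits, e.g. 01/01/2008.
DATE_PICTURE = re.compile(r"\d{2}/\d{2}/\d{4}")

def picture_check(text):
    """Return True if the text matches the required layout exactly."""
    return DATE_PICTURE.fullmatch(text) is not None

print(picture_check("01/01/2008"))   # True
print(picture_check("1/1/08"))       # False
print(picture_check("01/Jan/2008"))  # False
```

This only checks the layout of the characters. A value such as 99/99/9999 has the right layout and would still pass, so a picture check is often combined with other checks.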
Check Digit A check digit is a value which is worked out by performing a calculation on a number and is then added to the end of that number. ISBN numbers have check digits. The ISBN for the text book is: – The check digit is 5. Before 2007, when ISBN numbers had 10 digits, the check digit was calculated using Modulus-11. The check digit for new 13-digit ISBN numbers is calculated using the Modulus-10 method.
Modulus-11 (10-digit ISBN) Remove the check digit. Then write out the numbers in a table like this. The code starts at 2, and increments by 1, going from right to left. Multiply each number by the code below it. Add up all the results = 192. Divide the total by 11 = 17 remainder 5. Take the remainder from 11. Check Digit = 11 - 5 = 6. If the remainder is 0 the check digit is 0. If the remainder is 1 then the check digit is X.
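The divide-by-11 steps above can be sketched in Python as follows. The function name `isbn10_check_digit` is illustrative, and the ISBN used (0-306-40615-2) is a commonly quoted example, not the text book's ISBN.

```python
def isbn10_check_digit(first_nine):
    """Work out the Modulus-11 check digit from the first 9 digits of an ISBN."""
    digits = [int(ch) for ch in first_nine]
    # Weights start at 2 on the right-hand digit and go up by 1 moving left.
    total = sum(d * w for d, w in zip(reversed(digits), range(2, 11)))
    remainder = total % 11
    if remainder == 0:
        return "0"
    if remainder == 1:
        return "X"
    # Otherwise take the remainder away from 11.
    return str(11 - remainder)

# Total is 130, 130 / 11 = 11 remainder 9, and 11 - 9 = 2.
print(isbn10_check_digit("030640615"))  # 2
```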
Modulus-10 (13-digit ISBN) Remove the check digit. Then write out the numbers in a table like this. From right to left, alternate the weighting code between 3 and 1. Multiply each number by the code below it. Add up all the results = 135. Divide the total by 10 = 13 remainder 5. Take the remainder from 10. Check Digit = 10 - 5 = 5. If the remainder is 0 the check digit is 0. Unlike Modulus-11, the check digit is never X.
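The divide-by-10 steps above can be sketched in Python as follows. The function name `isbn13_check_digit` is illustrative, and the ISBN used (978-0-306-40615-7) is a commonly quoted example, not the text book's ISBN. Note that with Modulus-10 the result is always a digit from 0 to 9, so a remainder of 1 gives a check digit of 9.

```python
def isbn13_check_digit(first_twelve):
    """Work out the Modulus-10 check digit from the first 12 digits of an ISBN."""
    digits = [int(ch) for ch in first_twelve]
    # From right to left the weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits)))
    remainder = total % 10
    if remainder == 0:
        return "0"
    # Otherwise take the remainder away from 10.
    return str(10 - remainder)

# Total is 93, 93 / 10 = 9 remainder 3, and 10 - 3 = 7.
print(isbn13_check_digit("978030640615"))  # 7
```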
Take note: In a spreadsheet, try creating a working Check Digit Checker. The spreadsheet should be able to calculate a check digit using the ISBN number and then compare the result with the actual check digit. It should say whether it is valid or not. To work out a remainder use the =MOD() function.
Take note: Use modulus-11 on these ISBN numbers. For numbers with incorrect digits replace them with correct ones. – – X – – X –
Verification Verification is not making sure that data is correct, but rather making sure data hasn’t been changed in any way. There are two ways of carrying out verification checks: – Double Entry – Manual verification
Double Entry Basically, entering data twice. For example, some websites ask you to type in your address twice. This lowers the risk of entering an address incorrectly. If the two entries do not match, the website will ask you to check them. However, if you enter the address incorrectly both times and make the same mistake, then the website will miss the mistake!
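A double entry check simply compares the two entries. The sketch below uses a made-up address and an illustrative function name, `double_entry_check`, and also shows the weakness described above.

```python
def double_entry_check(first, second):
    """Return True if both entries are exactly the same."""
    return first == second

# A typing slip in one entry is caught because the entries differ.
print(double_entry_check("12 High Street", "12 Hihg Street"))  # False

# The same slip made both times is NOT caught.
print(double_entry_check("12 Hihg Street", "12 Hihg Street"))  # True
```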
Manual verification This is like proof reading. A person may read data from a paper source and then type it into a computer system. Humans aren’t very reliable and often make mistakes. Common mistakes include: – Transcription errors – Transposition errors
Transcription Errors This may involve pressing the wrong key accidentally. For example: – Surname: Mouse entered as Mowse or Mouce
Transposition Errors This is where two characters have been accidentally swapped. For example: – Surname: Mouse entered as Muose or Moues
Accuracy Just because we make use of validation and verification checks doesn’t mean data is accurate. For example, a wrong number could still pass a range check if it falls within the boundaries, or a presence check could be passed because someone pressed the space bar in the field.
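Both problems can be shown with a short Python sketch. The function names, the boundaries of 1 and 100, and the values 35 and 53 are all invented for this illustration.

```python
def presence_check(value):
    """A simple presence check: passes if anything at all was typed."""
    return value != ""

def range_check(value, lower, upper):
    """Passes if the value lies between the two boundaries (inclusive)."""
    return lower <= value <= upper

# A single press of the space bar counts as "something typed".
print(presence_check(" "))      # True

# 35 is in range, so it passes even if the real value was 53.
print(range_check(35, 1, 100))  # True

# Removing spaces before checking catches the space bar case.
print(presence_check(" ".strip()))  # False
```

Stripping spaces closes one gap, but no check can tell that 35 should have been 53. Valid data is not the same as accurate data.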
Take note: Describe two methods of verification. Give two disadvantages of double entry verification. Give one advantage of manual verification. Explain why verification and validation cannot ensure that data is entered accurately, and explain why they are useful despite these problems.