Programming Techniques :: Defensive Design jamie@drfrostmaths.com www.drfrostmaths.com @DrFrostMaths Last modified: 4th July 2019
www.drfrostmaths.com
Everything is completely free. Why not register?
Registering on the DrFrostMaths platform allows you to save all the code and progress in the various Computer Science mini-tasks, so your code on any mini-task will be preserved. It also gives you access to the maths platform, allowing you to practise GCSE and A Level questions from Edexcel, OCR and AQA.
Note: The Tiffin/DFM Computer Science course uses JavaScript as its core language. Most code examples are therefore in JavaScript.
Using these slides: Green question boxes can be clicked while in Presentation mode to reveal. Slides are intentionally designed to double up as revision notes for students, while being optimised for classroom usage. The Mini-Tasks on the DFM platform are purposely ordered to correspond to these slides, giving you flexibility over your lesson structure.
What problems might occur in our program?
Suggest problems that might occur either in the development of code or when the code is running.
Dodgy data input from the user, e.g.:
- Unexpected characters.
- Data not in the expected format.
- Inserting JavaScript code that subsequently runs on the page!
- Hijacking SQL queries (we’ll see this in a sec).
Badly structured/unreadable code:
- Fellow programmers unable to maintain your code.
- Lack of ‘commenting’.
- Repeated code, meaning that changes to one instance might not be replicated in the other.
Runtime errors:
- Errors can potentially bring down the entire system (and subsequently require restarting)!
- Often a result of an unexpected situation.
Defensive design in general therefore attempts to:
- Stop users unintentionally or intentionally exploiting your system.
- Keep code well-maintained.
- Minimise bugs in your code.
Input Validation vs Sanitisation
Often we need to check whether the input from a user is what we expect.
Input Validation: Checking that the data meets certain criteria.
Input Sanitisation: Cleaning up the data to remove any unwanted characters.
Example: the student registration page for DrFrostMaths.
- Data Validation: First name and surname must be entered.
- Data Validation: Email must be in a valid format, e.g. have characters before and after the @ and at least one dot after the @.
- Data Validation: The two passwords must match.
- Data Sanitisation: I remove any HTML students attempt to insert into their name.
- Data Sanitisation: I also get rid of any space characters before or after the name, as these otherwise cause problems later when trying to search for that person.
Types of Input Validation
- Range check: e.g. between 1 and 100.
- Presence check: was the value entered?
- Check digit: (we will see this when we cover ‘bits’).
- Format check: e.g. a valid email or date.
- Lookup table: one of a restricted set of values.
- Length check: e.g. between 5 and 15 characters.
Note: to be safe, data should be checked both at the client end (where the data is entered) and at the server end (where the data might be entered into a database). Without the latter check, it’s possible that a ‘rogue client’ might be able to send invalid data directly to the server end, bypassing your client-end validation checks.
The importance of sanitising database inputs
(Cartoon source: www.xkcd.com)
How could this break your system? Suppose your user entered their name and you wanted to insert it as a row in your “Students” table by appending strings together:
    "INSERT INTO Students VALUES ('" + name + "')"
But suppose the student entered their name as:
    Robert'); DROP TABLE Students; --
The resulting SQL query would be:
    INSERT INTO Students VALUES ('Robert'); DROP TABLE Students; --')
This dodgy value finishes the INSERT query, then deletes the entire Students table. The -- comments out any SQL that might have appeared after.
This kind of attack is known as SQL injection. The solution is to use parameterised queries: we instead write “INSERT INTO Students VALUES (?)”, replacing our values with ?, and then use proper database library methods to safely insert our values into the query.
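The difference between the two approaches can be sketched without a real database. This is only an illustration: the function names buildUnsafeQuery and buildParameterisedQuery are made up for this example, and a real database library would supply its own method that accepts the query text and the values separately.

```javascript
// Vulnerable: the user's input becomes part of the SQL text itself.
function buildUnsafeQuery(name) {
  return "INSERT INTO Students VALUES ('" + name + "')";
}

// Safer: the SQL text is fixed and the values travel separately.
// (Hypothetical shape: a database library would receive both parts
// and bind each value to its ? placeholder.)
function buildParameterisedQuery(name) {
  return { sql: "INSERT INTO Students VALUES (?)", params: [name] };
}

var studentName = "Robert'); DROP TABLE Students; --";

// The dodgy name has changed the structure of the query:
console.log(buildUnsafeQuery(studentName));

// Here the query text is unchanged whatever the name is:
console.log(buildParameterisedQuery(studentName).sql);
```

Because the query text never contains the user's input, the dodgy name is simply stored as an odd-looking name rather than being run as SQL.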
The importance of sanitising web comments
When user input is displayed on a webpage, a common hijacking technique is to put JavaScript code in your comment. This means that when your comment gets displayed on the page, the JavaScript code runs!
A simple solution is to pass the input through a function such as stripHTML(str), which removes any HTML and JavaScript from it. But while I had anticipated this for the comment itself, I forgot to strip it from the commenter’s name…
One commenter used:
    <script>document.body.innerHTML = "";</script>
as their name on an Edexcel GCSE Predicted Paper resource. This completely blanked out the page, preventing anyone accessing the resource until I spotted it an hour later!
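One way to make such input harmless is sketched below. Note that this is a different technique from stripping: instead of removing the HTML, it escapes the special characters so the browser displays them as text. The function name escapeHTML is an assumption for this example and is not part of JavaScript itself.

```javascript
// Replaces the characters that have a special meaning in HTML with
// their entity codes, so the browser shows them instead of running them.
function escapeHTML(str) {
  return String(str)
    .replace(/&/g, "&amp;")   // must come first, or we would re-escape our own entities
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

var commenterName = '<script>document.body.innerHTML = "";</script>';
console.log(escapeHTML(commenterName));
```

The same function would need to be applied to every piece of user input that ends up on the page: the comment and the commenter's name alike.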
Input Validation in JavaScript
How might we do each of the following checks in JavaScript, on an input x?
    var x = prompt("Enter a value");
Was a value entered?
    if(!x) alert("No input entered");
Note that empty strings (""), when cast to a Boolean, give a value of false.
Was “bob”, “mike” or “dave” entered?
    if(["bob","mike","dave"].indexOf(x) == -1) alert("…");
Was the number in the range 100 to 200?
    var n = Number(x);
    if(isNaN(n) || n < 100 || n > 200) alert("Number not in range 100-200");
The isNaN check catches non-numeric input: Number(x) would then give NaN, for which both comparisons are false.
Was the length of the input at most 10 characters?
    if(x.length > 10) alert("Value too long");
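The checks above could also be wrapped up as small functions that return true or false, which makes them easy to reuse and test without a prompt. This is a sketch only: the function names and the ALLOWED_NAMES list are made up for this example.

```javascript
// The restricted set of values for the lookup check.
var ALLOWED_NAMES = ["bob", "mike", "dave"];

// Presence check: was anything entered at all?
function presenceCheck(x) {
  return x !== null && x !== undefined && x !== "";
}

// Lookup check: is the value one of the allowed names?
function lookupCheck(x) {
  return ALLOWED_NAMES.indexOf(x) !== -1;
}

// Range check: is the value a number between 100 and 200 inclusive?
function rangeCheck(x) {
  var n = Number(x);
  return presenceCheck(x) && !isNaN(n) && n >= 100 && n <= 200;
}

// Length check: is the value a string of at most 10 characters?
function lengthCheck(x) {
  return typeof x === "string" && x.length <= 10;
}

console.log(rangeCheck("150"));   // true
console.log(rangeCheck("abc"));   // false
console.log(lookupCheck("mike")); // true
```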
Harder Ones :: Input Validation in JavaScript
Here are some further ones you would not be expected to reproduce in an exam!
    var x = prompt("Enter a value");
Was the value in a valid date format?
    if(!/^\d\d\/\d\d\/\d\d\d\d$/.test(x)) alert("Invalid date format");
Regular expressions are an advanced way of matching strings which fit a specified pattern. In JavaScript we put these between forward slashes: / … /
\d means ‘any single digit’. We want \d\d/\d\d/\d\d\d\d. But because “/” denotes the end of the regular expression, we need to ‘escape’ the /s by adding a backslash in front of them. This tells the code that the character immediately after is a symbol we want to use rather than a special character. The ^ and $ mark the start and end of the string, so that nothing else is allowed before or after the date. The test function sees whether x matches the regular expression.
Valid email address format?
    var re = /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
    if(!re.test(x)) alert("Invalid email");
This is a much more complicated regular expression. It is a standard pattern that can be easily found on the internet.
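A format check only confirms that the input looks like a date; something like 31/02/2019 has the right shape but does not exist. A possible extension is sketched below, assuming the day/month/year order. The function name isRealDate is made up for this example.

```javascript
// Returns true only if text has the form dd/mm/yyyy AND is a real calendar date.
function isRealDate(text) {
  // Brackets in the pattern capture the day, month and year separately.
  var match = /^(\d\d)\/(\d\d)\/(\d\d\d\d)$/.exec(text);
  if (!match) {
    return false; // wrong format
  }
  var day = Number(match[1]);
  var month = Number(match[2]);
  var year = Number(match[3]);

  // JavaScript's Date counts months from 0, and silently 'rolls over'
  // impossible dates (e.g. 31st February becomes a day in March),
  // so we check that the date we get back is the one we asked for.
  var date = new Date(year, month - 1, day);
  return date.getFullYear() === year &&
         date.getMonth() === month - 1 &&
         date.getDate() === day;
}

console.log(isRealDate("04/07/2019")); // true
console.log(isRealDate("31/02/2019")); // false
```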
Input Sanitisation in JavaScript
And how about cleaning up data by removing unwanted characters?
Remove any whitespace characters (e.g. spaces) from the ends of the string (e.g. so “Jamie ” becomes “Jamie”):
    var x = prompt("Enter a value");
    x = x.trim();
Filenames are not allowed to contain an @ symbol, so remove them all:
    var filename = prompt("Enter a filename to save as.");
    filename = filename.replace(/@/g, "");
Note that giving replace a string as its first argument, as in replace("@", ""), would only remove the first @.
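As a sketch, the two clean-up steps can be written as reusable functions. The names sanitiseName and sanitiseFilename are made up for this example. The last lines show why a regular expression with the g (‘global’) flag matters when removing a character.

```javascript
// Removes whitespace from both ends of a name.
function sanitiseName(name) {
  return name.trim();
}

// Removes every @ symbol from a filename.
function sanitiseFilename(filename) {
  return filename.replace(/@/g, "");
}

console.log(sanitiseName("  Jamie "));             // "Jamie"
console.log(sanitiseFilename("my@file@name.txt")); // "myfilename.txt"

// With a plain string as the first argument, only the first match is replaced:
console.log("my@file@name.txt".replace("@", ""));  // "myfile@name.txt"
```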
Authentication
Authentication is the process of confirming the identity of the user. This is typically done with a username and an associated password.
Passwords on DrFrostMaths are not stored in their original form, but in hashed form, just in case, hypothetically, anyone were to break into the database.
One popular hashing function is md5, which puts the string into a coded form. It’s intended to be a one-way process, so that it’s difficult to recover the original string (unlike encryption, which is designed to be reversed using a key). For example:
    md5("Pythagoras") gives "d9af1fd83c9a1c30a7cc38c59acb31d7"
Try it here: www.md5online.org
When someone logs in, the input password is hashed in the same way and checked against the hashed password in the database:
    Password from registration, put through md5
    Password when logging in, put through md5
    Are the two results equal?
Note that md5 is now considered too weak for protecting real passwords; slower, ‘salted’ functions designed specifically for passwords are preferred.
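The register-then-compare process can be sketched in Node.js using its built-in crypto module. This is an illustration only, not the DrFrostMaths implementation: the storedHashes object stands in for a database table, and the function names register and logIn are made up. md5 is used only to mirror the example in these slides.

```javascript
var crypto = require("crypto");

// Puts a string through the md5 function, giving 32 hexadecimal characters.
function md5(text) {
  return crypto.createHash("md5").update(text).digest("hex");
}

// Stand-in for the database: maps each username to the HASH of their password.
var storedHashes = {};

// On registration, only the hash is stored, never the password itself.
function register(username, password) {
  storedHashes[username] = md5(password);
}

// On login, hash the attempt in the same way and compare the two hashes.
function logIn(username, password) {
  return storedHashes.hasOwnProperty(username) &&
         storedHashes[username] === md5(password);
}

register("jamie", "Pythagoras");
console.log(logIn("jamie", "Pythagoras")); // true
console.log(logIn("jamie", "Euclid"));     // false
```

Anyone who broke into storedHashes would see only the coded form of each password, not the password that was typed.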
Authentication
It’s possible to password protect online directories, such that the user is directly prompted for a username and password by the browser rather than through a webpage. There are a few sensitive pages on DFM which use such authentication. To set this up, just put a file with the filename “.htaccess” within a directory on your web server. There are various online tutorials which outline what to put in this file.
Verifying an email
If a user supplies an email address when they register, there are generally two options:
- Require that the user clicks an ‘activation URL’ within an automated email in order to make their account active. Because the activation URL contains a passcode that is only communicated via email, it ensures that the email address registered with belongs to the user.
- Allow the user to start using the site straight away after registering, as users often want to be able to access a service immediately without requiring further activation (i.e. authentication can be a deterrent). Sometimes functionality is limited until their account is activated, even if they’re already logged in.
An early security flaw in the DrFrostMaths registration process for teachers was that accounts approved (by myself) didn’t subsequently have to be activated by email. A student then exploited the process as follows:
#1 Student registered as a teacher, using a made-up teacher-looking email address.
#2 Student then waited a few days in the hope I’d approve the account.
#3 Student was then able to log in as a teacher at their school, despite their email address not existing!
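The activation idea, and the fix for the flaw described above, might look something like the following Node.js sketch. Everything here is hypothetical: the account fields, the function names and the example.com address are made up, and no email is actually sent.

```javascript
var crypto = require("crypto");

// A new account starts neither approved nor activated, and is given
// a random passcode that is only ever sent to the registered email address.
function createPendingAccount(email) {
  return {
    email: email,
    passcode: crypto.randomBytes(16).toString("hex"),
    approved: false,
    activated: false
  };
}

// The link that would be placed in the automated email (hypothetical address).
function activationUrl(account) {
  return "https://example.com/activate?code=" + account.passcode;
}

// Activation only succeeds if the passcode from the email is supplied.
function activate(account, suppliedPasscode) {
  if (suppliedPasscode === account.passcode) {
    account.activated = true;
  }
  return account.activated;
}

// Closing the loophole: approval alone is not enough to log in as a teacher.
function canLogInAsTeacher(account) {
  return account.approved && account.activated;
}

var account = createPendingAccount("madeup.teacher@example.com");
account.approved = true;                  // approved by hand...
console.log(canLogInAsTeacher(account));  // false: email never verified
```

A student using a made-up address never receives the email, so never learns the passcode and the account stays inactive.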
Writing Nice Code
Most of the code on DrFrostMaths is largely uncommented, because I’m the only person who has access to it! But were I to collaborate with anyone else, it would be essential to make it consistently clear to other coders what my code is doing, and to prevent them from accidentally doing something they shouldn’t.
Strategies:
Code comments: Can be used to explain functions, blocks of code, the purpose of variables/constants or even what individual lines of code do.
    // Mario's current position.
    var pos = {x: 30, y: 50};
Constants vs Variables: Use a constant when you don’t want another coder to accidentally change its value somewhere in the code.
Correct scope of variables: Recall that scope is the parts of your program where a declared variable can be used. You shouldn’t, for example, make a variable ‘global’ unless you need to make it accessible in every part of your program.
Indentation: Makes the flow of your program clear by using an appropriate amount of spacing at the start of each line.
    while(x > 4) {
        if(x%2==0) {
            x = x / 2;
        } else {
            x = x + 1;
        }
    }
Helpful naming of functions and variables: e.g. isPrime for a function name makes clear it returns a Boolean value.
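These strategies can be seen together in one short example. This is just one possible way of writing an isPrime function; the constant SMALLEST_PRIME and the variable names are choices made for this sketch.

```javascript
// A constant: const stops anyone accidentally reassigning it later.
const SMALLEST_PRIME = 2;

// Returns true if n is a prime number, and false otherwise.
// n: the whole number to test.
function isPrime(n) {
  // Anything that is not a whole number, or is below 2, cannot be prime.
  if (!Number.isInteger(n) || n < SMALLEST_PRIME) {
    return false;
  }
  // 'divisor' is declared with let, so its scope is this loop only.
  // We only need to try divisors up to the square root of n.
  for (let divisor = SMALLEST_PRIME; divisor * divisor <= n; divisor++) {
    if (n % divisor === 0) {
      return false; // found a factor, so n is not prime
    }
  }
  return true;
}

console.log(isPrime(97)); // true
console.log(isPrime(9));  // false
```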
Criticise Dr Frost’s Code
Here’s some code I adapted for my DFM algebra libraries. I’ll let you work out what it might do! But identify ways in which my code is helpful or unhelpful to other coders…
- You might be able to tell from the function name that it converts floats/reals to fractions, e.g. 0.4 ⇒ 2/5, but a comment before this line explaining its purpose, and what the inputs are, would have been helpful; why the ‘tolerance’?
- What on earth are these variables?! Unhelpful variable names!
- For loops, it would be helpful to describe overall what happens on each iteration of the loop. The code uses something called continued fractions (I have a poster on this here: https://www.drfrostmaths.com/resource.php?rid=293 ).
- We should make clear, via commenting, what we expect the function to output.
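For comparison, here is a sketch of how a float-to-fraction function might look with these criticisms addressed. This is not the DFM library code: it is a fresh illustration of the continued fractions idea, and the function name, parameter names and the limit of 50 steps are all assumptions made for this example.

```javascript
// Converts a decimal into a fraction using continued fractions.
// value:     the number to convert, e.g. 0.4
// tolerance: how close the fraction must be to value before we stop
//            (needed because decimals are not stored exactly).
// Returns an array [numerator, denominator], e.g. [2, 5].
function floatToFraction(value, tolerance) {
  var sign = value < 0 ? -1 : 1;
  var target = Math.abs(value);
  var remaining = target;

  // The two most recent approximations, numerators and denominators.
  var prevNum = 0, num = 1;
  var prevDen = 1, den = 0;

  // Each iteration takes off the whole-number part of what remains,
  // uses it to build a better approximation, then flips the leftover part.
  for (var step = 0; step < 50; step++) {
    var whole = Math.floor(remaining);
    var nextNum = whole * num + prevNum;
    var nextDen = whole * den + prevDen;
    prevNum = num; num = nextNum;
    prevDen = den; den = nextDen;

    if (Math.abs(target - num / den) <= tolerance) {
      break; // close enough
    }
    remaining = 1 / (remaining - whole);
  }
  return [sign * num, den];
}

console.log(floatToFraction(0.4, 1e-9));  // [2, 5]
console.log(floatToFraction(0.75, 1e-9)); // [3, 4]
```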
Coding Mini-Tasks Return to the DrFrostMaths site to complete the various mini-coding tasks on defensive design.