Lesson Objectives and Aims
From the spec, learners should apply their knowledge of backtracking, data mining, heuristics, performance modelling, pipelining, visualisation and thinking abstractly.
This lesson
This will take a couple of lessons.
It is exam question heavy.
You should be able to answer some of the questions from what you’ve done so far on computational thinking.
Backtracking
An algorithmic approach to problem solving.
A set of rules has been defined (the algorithm) and these form a path.
The path is followed; if a rule fails on a piece of data, the algorithm returns to the last known good point and tries a different option from there.
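A minimal sketch of backtracking, using the N-Queens puzzle as an illustration (the puzzle and the names `solve_queens`, `is_safe` and `place` are assumptions chosen for this example, not from the spec). The "rule" is that no two queens may attack each other; when no column works in a row, the code undoes the last placement and tries the next option.

```python
def solve_queens(n):
    """Place n queens on an n x n board so that none attack each other."""
    placed = []  # placed[row] = column of the queen in that row

    def is_safe(row, col):
        # The rule: no earlier queen shares this column or a diagonal.
        for earlier_row, earlier_col in enumerate(placed):
            if earlier_col == col or abs(earlier_col - col) == row - earlier_row:
                return False
        return True

    def place(row):
        if row == n:
            return True  # every row has a queen: solved
        for col in range(n):
            if is_safe(row, col):
                placed.append(col)      # follow the path
                if place(row + 1):
                    return True
                placed.pop()            # rule failed later: back to last good point
        return False                    # no column works: caller must backtrack

    return placed if place(0) else None


print(solve_queens(4))  # [1, 3, 0, 2]
print(solve_queens(3))  # None (no solution exists)
```

Each `placed.pop()` is the "return to the last known good point" step: the board goes back to the state it was in before the failed choice.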
Data Mining
“Big data” is a big problem and a massive opportunity at the same time.
Modern systems enable us to collect and analyse extraordinary amounts of data, usually from multiple sources.
Great examples are Google, Facebook and Twitter. Why?
Data mining can be used to find relationships between seemingly unrelated data.
It applies to data with dissimilar structures.
It may throw up unexpected correlations.
Applications
Supermarkets, insurance companies, business modelling and disease pattern modelling.
Methodology
The method of searching through seemingly unconnected data.
Methods include “pattern matching” and “anomaly detection”.
May include some method of correlation calculation.
Does not need pre-determined “matching criteria”: its job is to find what matches!
Only possible because of the processing power of modern high-speed machines (a brute force approach).
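A small sketch of the two ideas above, correlation calculation and anomaly detection. The data set `SALES`, the thresholds and the function names are all invented for illustration; real data mining works on vastly larger data and more sophisticated methods. Note that the code compares every pair of columns without being told which ones should match.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


def strongest_pairs(table, threshold=0.8):
    """Brute force: test every pair of columns, keep the strongly related ones."""
    names = list(table)
    found = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(table[first], table[second])
            if abs(r) >= threshold:
                found.append((first, second, round(r, 2)))
    return found


def anomalies(values, z_limit=2.0):
    """Values more than z_limit standard deviations away from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if s and abs(v - m) / s > z_limit]


# Invented weekly sales figures, for illustration only.
SALES = {
    "ice_cream_sales": [20, 35, 50, 65, 80],
    "sunscreen_sales": [8, 15, 22, 29, 36],
    "umbrella_sales": [30, 12, 25, 9, 28],
}

print(strongest_pairs(SALES))
print(anomalies([102, 98, 101, 99, 100, 250, 97, 103]))  # [250]
```

In this made-up data only ice cream and sunscreen sales are reported as related, and 250 stands out as the anomaly.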
Exam Questions – Data mining
Heuristics
Many problems in computer science are either unsolvable or would take too long to solve exactly.
Sat nav is a classic example: how do you know you’re taking the best, fastest or shortest route without analysing ALL roads?
Heuristics provide a solution by giving a “best fit” or “good enough” answer.
This solves problems much more quickly than an “exact” or brute force approach.
They can also find an approximate solution when there is no exact solution available.
Heuristics are a trade-off. They give up some optimality, completeness, accuracy or precision.
The outcomes of these trade-offs are speed and a solution that is “good enough”.
Solving a problem “heuristically”
Gather all relevant data, but not necessarily ALL data.
The data gathered should be the most likely to help in the given situation.
Make judgements based on rules or previous experience.
Example: Virus scanning
Needs to happen quickly.
Not all viruses are known about!
Looks for code or behaviour that is similar to known viruses (classes).
Has different rules depending on virus type or family.
If patterns match or are similar, it infers a virus.
Can work for new/unidentified viruses.
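To illustrate the trade-off, here is a sketch comparing a brute force route planner with a simple heuristic one. The "always go to the nearest unvisited stop" rule, the coordinates and the function names are assumptions picked for this example; a real sat nav uses far more elaborate methods.

```python
from itertools import permutations
from math import dist


def route_length(route):
    """Total straight-line distance travelled along a route."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))


def exact_route(start, stops):
    # Brute force: try every possible ordering and keep the shortest.
    # The number of orderings grows factorially with the number of stops.
    return min(((start,) + order for order in permutations(stops)), key=route_length)


def greedy_route(start, stops):
    # Heuristic rule: always drive to the nearest stop not yet visited.
    route, remaining = [start], list(stops)
    while remaining:
        nearest = min(remaining, key=lambda stop: dist(route[-1], stop))
        remaining.remove(nearest)
        route.append(nearest)
    return tuple(route)


start = (0, 0)
stops = [(1, 0), (-2, 0), (3, 0)]  # made-up coordinates

print(route_length(exact_route(start, stops)))   # 7.0 (best possible)
print(route_length(greedy_route(start, stops)))  # 8.0 (good enough, found quickly)
```

The heuristic answer is slightly worse, but it only looks at the remaining stops once per step instead of checking every ordering.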
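A toy sketch of the "similar to known viruses" idea. The family profiles, behaviour names, similarity measure (shared behaviours divided by all behaviours seen) and the 0.5 threshold are all hypothetical; real scanners are far more complex.

```python
# Hypothetical behaviour profiles for two made-up virus families.
KNOWN_FAMILIES = {
    "ransomware": {"encrypts_files", "deletes_backups", "contacts_remote_server"},
    "keylogger": {"hooks_keyboard", "hides_process", "contacts_remote_server"},
}


def similarity(observed, profile):
    """Fraction of behaviours shared, out of all behaviours in either set."""
    return len(observed & profile) / len(observed | profile)


def classify(observed, threshold=0.5):
    """Return the most similar family, or None if nothing is similar enough."""
    best_family, best_score = None, 0.0
    for family, profile in KNOWN_FAMILIES.items():
        score = similarity(observed, profile)
        if score > best_score:
            best_family, best_score = family, score
    return best_family if best_score >= threshold else None


# A "new" program that is not an exact match for any known profile.
print(classify({"encrypts_files", "deletes_backups", "modifies_registry"}))  # ransomware
print(classify({"opens_window", "reads_config"}))                            # None
```

The first program is flagged even though it is not identical to the known profile, which is how a heuristic scanner can catch new or unidentified viruses. The cost is that it can also be wrong.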
Exam Questions - Heuristics
9 Marks
Pipelining
We’ve seen this in CPUs.
The output of one task is the input of another.
Enables jobs to be queued and also run in parallel to increase productivity or throughput.
Used in: CPUs (RISC especially), command line systems (the Linux pipe, |) and task scheduling.
In real use…
To pipeline:
Identify processes that MUST run in sequence.
Identify processes that may run in parallel.
Identify when processes must converge (“obligatory sequence”): what must happen before another process can start?
E.g. underwear must be on before trousers.
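A sketch of "the output of one task is the input of another", written with Python generators in the style of a Linux pipe. The stage names are made up for this example. Note that generators pass items along one at a time but do not actually run in parallel; a CPU pipeline overlaps its stages in hardware.

```python
def read_lines(text):
    # Stage 1: produce raw lines one at a time.
    for line in text.splitlines():
        yield line


def drop_blank(lines):
    # Stage 2: takes stage 1's output, removes blank lines and padding.
    for line in lines:
        if line.strip():
            yield line.strip()


def to_upper(lines):
    # Stage 3: takes stage 2's output and converts it to upper case.
    for line in lines:
        yield line.upper()


def run_pipeline(text):
    # Chain the stages, like: read | drop_blank | to_upper
    return list(to_upper(drop_blank(read_lines(text))))


print(run_pipeline("fetch\n\n decode \nexecute"))  # ['FETCH', 'DECODE', 'EXECUTE']
```

Each line flows through all three stages before the next line is read, so no stage has to wait for the whole input to be processed.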
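The three "identify" steps can be sketched with Python's standard `graphlib` module (Python 3.9 or later). Only the underwear and trousers rule comes from the slide; the other clothes and the name `plan_stages` are invented to make the example fuller.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
GETTING_DRESSED = {
    "underwear": set(),
    "socks": set(),
    "shirt": set(),
    "trousers": {"underwear"},
    "shoes": {"socks", "trousers"},
}


def plan_stages(dependencies):
    """Group tasks into stages; tasks within one stage could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)                 # mark the whole stage as finished
    return stages


for number, stage in enumerate(plan_stages(GETTING_DRESSED), start=1):
    print(number, stage)
# 1 ['shirt', 'socks', 'underwear']
# 2 ['trousers']
# 3 ['shoes']
```

Tasks in the same stage may run in parallel, the stages themselves must run in sequence, and "shoes" is a point where two earlier paths converge.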
Exam Question - Pipelining
Answers
Performance Modelling
Modelling or simulating real world objects or situations.
The purposes are: finding out effectiveness; tweaking and changing parameters to observe their effects; safety; experimentation; finding optimal solutions.
Visualisation
Simply converting raw data into a visual form.
Examples: graphs, heat maps, mapping (in general) and touch maps (football?).
A much more powerful method of interpreting data than reading the raw figures.
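A very simple model to show "tweaking parameters to observe their effects". The scenario (jobs arriving at a group of servers), the numbers and the name `simulate_backlog` are all assumptions for illustration; a real model would include randomness and many more factors.

```python
def simulate_backlog(servers, arrivals_per_tick=10, jobs_per_server=3, ticks=60):
    """Return how many jobs are still waiting after the simulation ends."""
    backlog = 0
    for _ in range(ticks):
        backlog += arrivals_per_tick                      # new jobs arrive
        backlog -= min(backlog, servers * jobs_per_server)  # servers clear what they can
    return backlog


# Change one parameter (number of servers) and observe the effect.
for servers in range(1, 6):
    print(servers, "servers ->", simulate_backlog(servers), "jobs waiting")
# 1 servers -> 420 jobs waiting
# 2 servers -> 240 jobs waiting
# 3 servers -> 60 jobs waiting
# 4 servers -> 0 jobs waiting
# 5 servers -> 0 jobs waiting
```

The model suggests four servers is the smallest number that keeps up with demand, found without buying or testing any real hardware.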
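A minimal sketch of turning raw numbers into a visual form, using a text bar chart so it needs no extra libraries. The function name `bar_chart` and the sample figures are made up; the values are assumed to be positive.

```python
def bar_chart(data, width=20):
    """Draw each value as a row of # characters, scaled to the largest value."""
    largest = max(data.values())  # assumes at least one positive value
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Invented figures, for illustration only.
print(bar_chart({"Mon": 5, "Tue": 10, "Wed": 2}, width=10))
# Mon | ##### 5
# Tue | ########## 10
# Wed | ## 2
```

The busiest and quietest days are obvious at a glance from the bars, which is much harder to see in a long list of numbers.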
Question 2 – Comp. thinking
Answers
Exam Questions - Abstraction
Review/Success Criteria
You should know: lots of things.