CDT PROJECTS
John Keane, Software Systems Group

1. Data Analytics / Big Data
2. Parallel & Distributed Systems
3. Decision Support Systems

HAPPY TO DISCUSS
Big Data Analytics (IBM funded)
With Nenadic

CHALLENGE
Investigate:
- Applications: characteristics and predictability
- Data Analytic / Machine Learning Algorithms (relatively simple so far)
- Software: Map-Reduce, Hadoop
- Hardware: various platforms
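As a reminder of the Map-Reduce programming model listed under Software, here is a minimal single-process sketch in Python. It is not Hadoop code and says nothing about the project's actual workloads; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process."""
    emitted = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: each string stands for one record in a large dataset.
counts = word_count(["big data analytics", "big data platforms"])
```

In a real deployment the map and reduce calls run on different machines and the shuffle moves data across the network, which is where application characteristics and the choice of hardware platform start to matter.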
Bio-medical data analytics
With Nenadic, Zeng, Stivaros (Consultant, RMCH)

Adverse drug event detection (EU funded)
- Bayesian/Fuzzy association rules algorithms
CHALLENGE
- Compare/contrast accuracy of prediction

Clinical Outcome Mining (Christie Hospital)
- Data/text-based clinical records: better diagnose and predict
CHALLENGE
- Illness staging; multi-modal data; changes over time

Decision Support for Radiology (NIHR-funded)
- Decision aid to assist better description of scans
CHALLENGE
- Usability; integration with existing tools; link to literature
Itemset Mining Algorithms
{baby nappies} -> {beer}

Colossal itemsets:
- Very high dimensional datasets
- Run-time increases exponentially as average row length increases

Minimal unique itemsets (MUI)
SUDA: Special Uniques Detection Algorithm
- “Risky” records, those likely to be linked, e.g. 16 years old + widow
- Records of most concern have many, small MUIs
- SUDA s/w used by ONS, UK; licensed by Singaporean govt
- Algorithm used by UN/World Bank International Household Survey

CHALLENGES:
- Data structure to represent itemsets during search process
- Search space pruning
- Algorithm: bottom-up; top-down; hybrid
- Parallelism
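To make the MUI idea concrete, the following is a small brute-force sketch in Python. It is not the SUDA implementation: the function name `minimal_unique_itemsets`, the toy records and the attribute names are all hypothetical, and the search is a plain bottom-up enumeration with only the simplest pruning rule (skip any superset of an MUI already found). It treats an itemset as unique when no other record contains it, and minimal when none of its proper subsets is unique.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]  # (attribute, value)


def minimal_unique_itemsets(
    records: List[Dict[str, str]], index: int
) -> List[FrozenSet[Item]]:
    """Return the MUIs of records[index] by bottom-up brute-force search."""
    target = set(records[index].items())
    others = [set(r.items()) for i, r in enumerate(records) if i != index]
    found: List[FrozenSet[Item]] = []
    # Enumerate candidates smallest first, so any unique proper subset
    # has already been recorded (or covered) before its supersets are seen.
    for size in range(1, len(target) + 1):
        for combo in combinations(sorted(target), size):
            candidate = frozenset(combo)
            # Pruning: a superset of a known MUI is unique but not minimal.
            if any(mui <= candidate for mui in found):
                continue
            # Unique: no other record contains every item of the candidate.
            if not any(candidate <= other for other in others):
                found.append(candidate)
    return found


# Hypothetical toy microdata, echoing the "16 years old + widow" example.
records = [
    {"age": "16", "marital": "widow", "sex": "F"},
    {"age": "16", "marital": "single", "sex": "F"},
    {"age": "70", "marital": "widow", "sex": "F"},
    {"age": "70", "marital": "married", "sex": "M"},
]

muis_first = minimal_unique_itemsets(records, 0)
```

For the first record, neither age 16 nor widow is unique on its own, but the pair is, so it is that record's only MUI. The enumeration visits up to 2^n candidates for a record with n attributes, which is why the data structure, pruning and parallelism questions listed under CHALLENGES dominate at scale.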
Eco-service composition (EU funded)
with Mehandjiev, MBS

Aims to determine conditions for achieving eco-friendly, resilient and optimal service compositions on a distributed cloud infrastructure

Two service optimisation approaches deployed:
1. Global: analyses end-to-end interaction between services
2. Local: computes local optimisation by creating dynamic service chains between service provider/consumer

CHALLENGE
Energy-efficient load balancing and scheduling
HPC + Finance (EU funded, UK Government)

High Frequency Trading
- Flash crashes: dramatic sudden drop in share price; describe/predict
- Working paper: High Frequency Trading and Mini Flash Crashes

HPCFinance
- New models of risk analysis (diverse data integration)
- Role of HPC in Finance and comparison of technologies
- Trade-off: accuracy, speed, cost
- Comparison: Cloud; GPGPUs; FPGA (Maxeler box)

CHALLENGES:
Data engineering; Analytics; Algorithms; High performance
Preference Elicitation from Pairwise Comparison
with Mikhailov, MBS; Siraj, COMSATS IIT, Pakistan

Decision making is complex in the presence of uncertainty and insufficient knowledge.
Aim: estimate preferences using pairwise comparison (PC)
- PC is used when scores cannot be assigned directly to the available options
- Judgements provided may be inconsistent
Work has proposed consistency measures and prioritization measures for the case where revision of judgements is not allowed.
PriEsT tool now has sensitivity analysis -> best solution.

CHALLENGES
- Evolutionary approach to multi-criteria DSS
- Work on preference elicitation model and tool
- Group decision making
- Bridge PriEsT and R (popular data mining tool) via XMCDA
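For readers new to pairwise comparison, here is a minimal Python sketch of how a priority vector can be estimated from a matrix of judgements, and how inconsistency shows up. It uses the standard row geometric mean method and a plain transitivity check; these are swapped in for illustration and are not the consistency or prioritization measures proposed in this work, nor PriEsT's API. The function names and the example matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def geometric_mean_priorities(pc: Matrix) -> List[float]:
    """Estimate a normalised priority vector from the row geometric means."""
    n = len(pc)
    row_means = [math.prod(row) ** (1.0 / n) for row in pc]
    total = sum(row_means)
    return [m / total for m in row_means]


def is_cardinally_consistent(pc: Matrix, tol: float = 1e-9) -> bool:
    """Check a[i][k] == a[i][j] * a[j][k] for every triple (i, j, k)."""
    n = len(pc)
    return all(
        math.isclose(pc[i][k], pc[i][j] * pc[j][k], rel_tol=tol)
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Entry [i][j] says how many times option i is preferred over option j.
# A is judged 2x as good as B and 4x as good as C; B is 2x as good as C.
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
# Same judgements, except A is now judged only 3x as good as C.
inconsistent = [
    [1.0, 2.0, 3.0],
    [0.5, 1.0, 2.0],
    [1.0 / 3.0, 0.5, 1.0],
]

weights = geometric_mean_priorities(consistent)
```

With the consistent matrix the weights come out in the ratio 4 : 2 : 1, matching the judgements exactly. With the inconsistent matrix no weight vector can reproduce every judgement, so any prioritization method has to trade off which judgements to violate; that is the situation the consistency measures above are meant to quantify.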