IS6125/IS6145 Database Analysis and Design
Lecture 1: Introduction to IS6125/IS6145 and the changing nature of data
Rob Gleasure
R.Gleasure@ucc.ie
www.robgleasure.com
IS6145 Module objective
To provide students with the concepts and skills required to analyse organisational activities and information flows, and subsequently to create the data models required to support these activities.
Topics covered include:
Requirements analysis
Introduction to datafication and emerging data capabilities
Data modelling (ERDs and normalisation)
IS6145 Learning objectives
Analyse organisational activities to identify key data requirements
Generate ERDs to identify data sources and their relationships
Employ normalisation processes to help meet data integrity requirements
Identify and strategise around semi-structured and unstructured data capabilities
IS6145 Course assessment
Continuous assessment: 50 marks
In-class exam: 20 marks
Group report: 30 marks
Exam: 50 marks
Skills to add to your LinkedIn profile (or equivalent):
Data modelling
Entity Relationship Diagrams (ERDs)
Data strategy
Course structure, week by week
Week 1: Introduction
Week 2: Foundational Concepts of Data Modelling
Week 3: ER Modelling and Beyond the Presentation Layer
Week 4: Fine-Granular Design-Specific ER Modelling
Week 5: Enhanced Entity-Relationship (EER) Modelling
Week 6: Practice with ERDs
Week 7: In-Class Data Modelling Exam
Week 8: Data Normalisation
Week 9: NoSQL and Hadoop
Week 10: Blockchain
Week 11: The Data Value Map
Week 12: Revision
Data a few years ago Image from http://www.hotcleaner.com/web_storage.html
Data center in the cloud
The Web is overtaking (or has overtaken) the desktop
Mobile is replacing local
Utility-based computing is replacing once-off purchases
This makes resources seem effectively endless
It also lowers the risk attached to usage (pay as you go)
[Figure: resources vs. time, comparing demand and capacity for a static data center and for a data center in the cloud]
Slide credits: Berkeley RAD Lab
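As a rough illustration of the demand-versus-capacity comparison credited to the Berkeley RAD Lab, the sketch below compares a static data center provisioned for peak demand with pay-as-you-go utility computing. The hourly demand figures, the unit cost and the 50% cloud price premium are hypothetical assumptions chosen for illustration, not data from the slide.

```python
"""Sketch: static peak provisioning vs. pay-as-you-go (all numbers hypothetical)."""


def static_cost(demand, unit_cost):
    """Static data center: capacity is sized to the peak and paid for in every period."""
    return max(demand) * len(demand) * unit_cost


def utility_cost(demand, unit_cost, premium=1.0):
    """Pay as you go: pay only for units actually used, possibly at a per-unit premium."""
    return sum(demand) * unit_cost * premium


def utilisation(demand):
    """Share of the peak-provisioned static capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


# Hypothetical resource demand per hour (e.g. number of servers needed).
demand = [20, 30, 50, 100, 60, 40]

print("static:", static_cost(demand, unit_cost=1.0))
print("utility (50% premium):", utility_cost(demand, unit_cost=1.0, premium=1.5))
print("utilisation of static capacity:", utilisation(demand))
```

With these made-up figures, half of the static capacity sits idle, so paying per unit used remains cheaper even at a higher per-unit price; the risk of over- or under-provisioning shifts to the provider.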
The Cloud
The ‘Internet of Things’ was born in about 2009, when more devices became connected to the Web than there are people. Today there are far more.
(Image from http://computinged.com/edge/become-part-of-the-cloud-computing-revolution/)
The Cloud
This has meaningful implications for data in terms of:
Capacity
Measurement
Integration
Security
Privacy
Big data
All of this interaction with one linked information system means vast quantities of data can be captured throughout user interaction, often in real time. This is ‘big data’.
The idea is that vast amounts of interaction data allow for systems that are nuanced and responsive in ways that were previously not possible.
There is also a realisation that, if it can be analysed, this data is a hugely valuable commodity, making new business models possible.
Firms like Google, eBay, LinkedIn and Facebook are built on the principles of big data.
3 Vs of Big data: Volume
Facebook generates 10TB of new data daily; Twitter generates 7TB
A Boeing 737 generates 240 terabytes of flight data during a flight from one side of the US to the other
We can use all of this data to tell us something, if we know the right questions to ask
3 Vs of Big data: Volume
Traditional approach: analyse small subsets of data, so the analysed information is only a small part of all available information
Big data approach: analyse all data, so all available information is analysed
From http://www.slideshare.net/ibmcanada/big-dataturning-data-into-insights?qid=0b4c69bc-3db2-4e12-ae47-a362a25752eb&v=qf1&b=&from_search=3
3 Vs of Big data: Velocity
Clickstreams and asynchronous data transfer can capture what millions of users are doing right now.
Make a change, then watch the response. No guesswork is required up front as to what to gather; we can identify the interesting patterns inductively as we see them.
3 Vs of Big data: Velocity
Traditional approach (hypothesis, question, data, answer): start with a hypothesis and test it against selected data
Big data approach (data exploration, correlation, insight): explore all data and identify correlations
From http://www.slideshare.net/ibmcanada/big-dataturning-data-into-insights?qid=0b4c69bc-3db2-4e12-ae47-a362a25752eb&v=qf1&b=&from_search=3
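The "explore all data and identify correlations" approach can be sketched as a hypothesis-free scan: compute the Pearson correlation for every pair of columns and rank the pairs by strength. The column names and values below are hypothetical clickstream data invented for illustration, and `scan_correlations` is a helper defined here, not a library function. A strong correlation found this way is a lead worth investigating, not evidence of causation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def scan_correlations(columns):
    """Correlate every pair of columns; return (a, b, r) sorted by |r|, strongest first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical per-session clickstream data (one value per session).
clicks = {
    "page_views": [3, 8, 5, 12, 7, 10],
    "minutes_on_site": [4, 11, 6, 15, 9, 13],
    "purchases": [0, 1, 0, 2, 1, 1],
    "day_of_month": [1, 2, 3, 4, 5, 6],
}

for a, b, r in scan_correlations(clicks):
    print(f"{a} ~ {b}: r = {r:+.2f}")
```

No question is specified in advance; the ranking itself suggests which relationships might deserve a closer look.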
3 Vs of Big data: Variety
A move from structured to unstructured data, analysed with techniques such as image recognition and text mining
Data gathered from users, applications, systems and sensors
An increasingly comprehensive data view of our ecosystem
The Internet of Things
The Internet of Things From http://www.pcworld.com/article/2039413/new-intel-ceo-creates-mysterious-new-devices-division.html
The Internet of Things
RFID sensors, Bluetooth, microprocessors and Wi-Fi are all becoming easier to embed in ‘dumb’ devices.
The move to mobile also means more data streaming from us at all times, e.g. location, call activity and internet use.
The Internet of Things
Smart homes/smart cities: temperature, lighting, food stocks, energy, security
Smart cars: diagnostics, traffic suggestions, sensors, self-driving
Smart healthcare: worn and intravenous computing detects issues early and monitors care outcomes remotely
Smart factories and farms: machines coordinated efficiently, linked dynamically to consumption models
Big data success stories
Books
Barnes and Noble: discovered that readers often quit nonfiction books less than halfway through, and introduced a highly successful new series of short books on topical themes.
Amazon: originally used a panel of expert reviewers for books. A surplus of data allowed it to create increasingly predictive recommendations. The panel has since been disbanded, and a third of sales are now driven by the recommender system.
Big data and the Internet of Things
Success stories (continued)
Transport
Flyontime.us: used historical weather and flight delay information to predict the likelihood of flights being delayed.
Farecast: analysed historical ticket prices for specific flights, then advised users to buy or wait according to the predicted fare trajectory.
UPS: uses a range of traffic data and a complex algorithm to calculate the most time- and fuel-efficient routes.
Big data and the Internet of Things
Famous success stories (continued)
Healthcare
Modernizing Medicine: EMA dermatology system (https://www.youtube.com/watch?v=jMGaGtK9nzU)
Big data and the Internet of Things
Famous success stories (continued)
Social media
Google (data for information relevance)
Twitter (cf. #RescuePH)
Facebook (social data)
Big data and privacy
Morey et al. argue that people create roughly three types of data, of increasing sensitivity:
Self-reported data
Digital exhaust
Profiling data
Due to the growth in biotechnologies and sensors, there is an argument that ‘profiling data’ could be further broken down to differentiate between ‘digital profile’ data and ‘biometric’ data.
Data beneficiaries
Companies may then use the data in three different ways:
Improve a product or service
Facilitate targeted marketing
Sell data to third parties
Google search is an example of a digital business that combines all three:
Your search results become customised to your search behaviour
Ads are placed in front of you according to your history and location
Click-through behaviour and user overviews are provided to third parties
Issues with big data
Google Flu Trends
Life imitating data, imitating life?
No one is really of average height
Your Xbox knows you like that Katy Perry song
Also, Target called to say your teenage daughter is pregnant
Ice cream sales and shark attacks…
Ice cream sales and shark attacks, continued (correlation, not causation)
From http://xkcd.com/552/
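One common explanation for the ice cream and shark attack pattern is a confounding variable: hot weather drives both. The simulation below assumes exactly that. Temperature, ice cream sales and a hypothetical ‘shark attack index’ are all synthetic, generated so that neither outcome affects the other. It then compares the raw correlation with the partial correlation after removing the linear effect of temperature from both series.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=365, seed=42):
    """Synthetic daily data: temperature drives both series; they never affect each other."""
    rng = random.Random(seed)
    temps = [rng.uniform(10, 35) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
    sharks = [0.3 * t + rng.gauss(0, 1) for t in temps]
    return temps, ice_cream, sharks


temps, ice_cream, sharks = simulate()
raw = pearson(ice_cream, sharks)
# Partial correlation: correlate the parts of each series not explained by temperature.
partial = pearson(residuals(ice_cream, temps), residuals(sharks, temps))

print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {partial:.2f}")
```

Expect a strong raw correlation that largely disappears once temperature is controlled for. In real data the confounder is rarely this obvious, or even measured, which is why correlations found by exploration still need a causal explanation.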
Correlation, not causation (continued) From http://www.tylervigen.com/spurious-correlations
Descriptive vs prescriptive analysis Image from http://timoelliott.com/blog/2013/02/gartnerbi-emea-2013-part-1-analytics-moves-to-the-core.html
Target’s family monitoring continued
Assignment 1
UCC has established a business school in the last couple of years. This business school is now seeking accreditation from the Association to Advance Collegiate Schools of Business (AACSB). Among other things, this requires that data be gathered to quantify:
What academic staff actually do
How those activities relate to the quality of research
How those activities relate to the quality of teaching
How those activities relate to the quality of other responsibilities
Working in groups, each group will create a report that provides:
A data model (ERD) to capture and relate activities
A data strategy for capturing and analysing important data
Readings
Mayer-Schönberger, V. and Cukier, K. (2014). Big Data: A Revolution That Will Transform How We Live, Work, and Think. John Murray Publishers, UK.
Morey, T., Forbath, T. T. and Schoop, A. (2015). Customer Data: Designing for Transparency and Trust. Harvard Business Review, 93(5), 96-105.
On rescuers using Twitter to find disaster victims: http://nextcity.org/daily/entry/rescuers-use-social-media-twitter-to-find-disaster-victims
On Modernizing Medicine: https://www.modmed.com/
On Spotify: http://www.bigdata-startups.com/BigData-startup/big-data-enabled-spotify-change-music-industry/