David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:


Data Mining and Machine Learning
DM Lecture 1: Overview of DM, and overview of the DM part of the DM&ML module

Overview of My Lectures
Bookmark this:
Lecture 1: about data and data mining;
Lectures 2 and 3: basic and useful ways to process and understand data;
Lectures 4, 5, 6, 7, 8: details of useful algorithms for finding knowledge from data;
Lecture 9: overview of what else there is.

Module assessment
100% by coursework
Three main items of coursework: 40% (DM), 40% (DM), 20% (ML)
Two small items of coursework (A and B), worth nothing, but if you don’t do them adequately you fail the module.

This Semester
DWC lectures on Mondays (data mining)
PC lectures on Fridays (machine learning)
Thursday slot usually unused; we may use it, and will let you know in advance

Coursework submission
ALL coursework must be submitted as follows:
As PDF, by email to … (the c/w is an attachment)
Subject line: DMML Coursework A (… or B, 1, 2, 3)
Body of the email includes your Name and your Course (e.g. Joe Smith, BSc CS – Jill Brown, MSc AI)

DWC lectures and c/w, key dates

week  date              Monday 12:15 EM183  Thursday 13:15 EM183  Friday 12:15 EM183  coursework
1     w/b Mon 12th Sep  David               -                     Paul                DC Coursework A handout
2     w/b Mon 19th Sep  -                   -                     Paul                -
3     w/b Mon 26th Sep  David               -                     Paul                DC Coursework B handout
4     w/b Mon 3rd Oct   David               -                     Paul                DC Coursework 1 handout
5     w/b Mon 10th Oct  David               -                     Paul                DC Courseworks A and B handin; PC Coursework 2 handout?
6     w/b Mon 17th Oct  -                   -                     -                   -
7     w/b Mon 24th Oct  -                   -                     Paul                DC Coursework 1 handin
8     w/b Mon 31st Oct  David               -                     Paul                DC Coursework 3 handout
9     w/b Mon 7th Nov   David               -                     Paul                PC Coursework 2 handin
10    w/b Mon 14th Nov  David               -                     -                   -
11    w/b Mon 21st Nov  David               -                     -                   -
12    w/b Mon 28th Nov  David               -                     -                   DC Coursework 3 handin

At last, the lecture

What some people think can be done with data
Answer simple questions like:
How many female clients do we have?
How much paint did we sell in 2007?
Which is the most profitable branch of our supermarket?
Which postcodes suffered the most dropped calls in July?
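Questions of this kind reduce to filtering, counting and summing over stored records. The sketch below illustrates two of them in Python; the `sales` rows, field names and figures are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical sales records; every value here is invented for illustration.
sales = [
    {"branch": "North", "product": "paint", "year": 2007, "litres": 120, "profit": 300.0},
    {"branch": "South", "product": "paint", "year": 2007, "litres": 80, "profit": 210.0},
    {"branch": "North", "product": "paint", "year": 2008, "litres": 95, "profit": 240.0},
    {"branch": "South", "product": "brushes", "year": 2007, "litres": 0, "profit": 400.0},
]


def paint_sold(rows, year):
    """How much paint did we sell in a given year? A filter plus a sum."""
    return sum(r["litres"] for r in rows
               if r["product"] == "paint" and r["year"] == year)


def most_profitable_branch(rows):
    """Which branch is the most profitable? A group-by plus a max."""
    profit_by_branch = defaultdict(float)
    for r in rows:
        profit_by_branch[r["branch"]] += r["profit"]
    return max(profit_by_branch, key=profit_by_branch.get)


print(paint_sold(sales, 2007))
print(most_profitable_branch(sales))
```

Both answers are read straight off the data; nothing is learned or predicted, which is the contrast the following slides draw.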

that is so Boring

More interesting things that can be done with data
Answer difficult and valuable questions like:
How can we predict ovarian cancer early enough to treat it successfully?
How can I make significant profit on the stock market next month?
Two different authors claim to have written this story: how can we resolve the dispute?
How can we get our customers to spend more money in the store?
Is this loan applicant a good credit risk?
Is this sonar image a mine, or a rock?
What other websites will this browser be interested in?

Data Mining - Definition & Goal
Definition – Data Mining is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules
Goal – To permit some other goal to be achieved or performance to be improved through a better understanding of the data

Some examples of large databases
Retail basket data: much commercial DM is done with this. In one store, 18,000 baskets per month. Tesco has >500 stores. Per year, 100,000,000 baskets?
The Internet: >15,000,000,000 pages
Lots of datasets: UCI Machine Learning repository
How can we begin to understand and exploit such datasets? Especially the big ones?

Like this …

and this …

and this …

or this … see ndemo/html/root.html

Data Mining - Basics
Data Mining is the process of discovering patterns and inferring associations in raw data
Data Mining is a collection of techniques intended to analyse small or large amounts of data
Data Mining can employ a range of techniques, either individually or in combination with each other

Data Mining – Why is it important?
Data are being generated in enormous quantities
Data are being collected over long periods of time
Data are being kept for long periods of time
Computing power is formidable and cheap
A variety of Data Mining software is available
All of these data contain 'hidden knowledge': facts, rules, patterns that can be usefully exploited if we can find them.

Data Mining – History
The approach has roots going back over 40 years
In the early 1960s Data Mining was called statistical analysis, and the pioneers were statistical software companies such as SPSS
By the late 1980s these traditional techniques had been augmented by new methods such as machine induction, artificial neural networks, evolutionary computing, etc.


Some basic terminology

Gender  weight  height  Age in mths  100m time
Male    52kg    1.71m   …            … s
Male    89kg    1.92m   …            … s
Female  48kg    1.67m   …            … s
Male    86kg    1.96m   274          … s
Male    80kg    1.88m   …            … s
etc …

Each row of the table is called a data instance or a record or just a line of data

Each column of the table is called a field or an attribute; the value of the Age field in the 4th record is 274

Usually we are interested in predicting the value of a particular field, given the values of the other fields. What we want to predict is called the class field, or the target class
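The terminology can be made concrete with a small Python sketch. It uses only the Gender, weight and height values that survive in the table; the key names (`gender`, `weight_kg`, `height_m`) and the two helper functions are hypothetical, and treating Gender as the class field is an assumption made purely for illustration.

```python
# One dict per record (data instance); each key is a field (attribute).
# Age and 100m time are left out because their values are not recoverable.
records = [
    {"gender": "Male", "weight_kg": 52, "height_m": 1.71},
    {"gender": "Male", "weight_kg": 89, "height_m": 1.92},
    {"gender": "Female", "weight_kg": 48, "height_m": 1.67},
    {"gender": "Male", "weight_kg": 86, "height_m": 1.96},
    {"gender": "Male", "weight_kg": 80, "height_m": 1.88},
]


def field_values(rows, field):
    """Return one field (a column) across all records."""
    return [row[field] for row in rows]


def split_class_field(rows, class_field):
    """Separate the class field (what we want to predict) from the other fields."""
    inputs = [{k: v for k, v in row.items() if k != class_field} for row in rows]
    targets = [row[class_field] for row in rows]
    return inputs, targets


fourth_record = records[3]                  # a single data instance
heights = field_values(records, "height_m")  # a single field
inputs, targets = split_class_field(records, "gender")
```

A learning algorithm would then be given `inputs` and asked to reproduce `targets`.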

Some data-mining related projects that I am currently working on (either myself, or with a PhD student or RA)
Predicting whether or not two textures will be considered similar by humans.
Predicting which of two or more writers is the author of a given piece of text (you will do some work on this)
Discovering which subsets of many thousands of genes play a role in specific diseases (cancer, diabetes, etc) (you may do a little work on this too)
Discovering technical trading rules for stock market trading (you may do a little work on this too)

Which pair of textures is most similar?
A line of data: … ,000 features for texture1; 5,000 features for texture2; %age of people who think they are similar

Who wrote text chunk 4?
Each line of data: word usage 'fingerprint' of a 1,000 word chunk of text, followed by the author
… AuthorA
… AuthorA
… AuthorB
… ?
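The slides do not say how the word usage fingerprint is computed. One simple possibility, sketched below in Python as an assumption, is the relative frequency of a fixed list of common words, with the unknown chunk assigned to the author of the closest labelled chunk. The word list, function names and toy texts are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical vocabulary; a real study would choose this list with care.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in")


def fingerprint(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word in a chunk of text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on an empty chunk
    return [counts[w] / total for w in vocabulary]


def distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


def nearest_author(unknown_text, labelled_chunks):
    """Return the author whose labelled chunk is closest to the unknown chunk."""
    unknown = fingerprint(unknown_text)
    _, author = min(labelled_chunks,
                    key=lambda item: distance(unknown, fingerprint(item[0])))
    return author


labelled = [
    ("the the the cat and dog", "AuthorA"),
    ("to a to a in in", "AuthorB"),
]
print(nearest_author("the dog and the cat", labelled))
```

Nearest-neighbour matching is only one way to use such fingerprints; any classifier that accepts numeric fields could take its place.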

Did the Dow Jones go up or down in the following week?

Down

Will the Dow Jones go up or down tomorrow?

Data Mining – Two Major Types
Directed (Farming) – Attempts to explain or categorise some particular target field such as income, medical disorder, genetic characteristic, etc.
Undirected (Exploring) – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes
Compare with Supervised and Unsupervised systems in machine learning
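The difference between the two types can be illustrated with a short Python sketch. It reuses the Gender and height values from the terminology table; the two techniques chosen here (nearest-neighbour lookup for the directed case, a one-dimensional two-group split for the undirected case) are illustrative assumptions, not methods named in the slides.

```python
records = [
    {"gender": "Male", "height_m": 1.71},
    {"gender": "Male", "height_m": 1.92},
    {"gender": "Female", "height_m": 1.67},
    {"gender": "Male", "height_m": 1.96},
    {"gender": "Male", "height_m": 1.88},
]


def predict_directed(rows, target_field, input_field, value):
    """Directed: use a known target field; copy it from the nearest record."""
    nearest = min(rows, key=lambda r: abs(r[input_field] - value))
    return nearest[target_field]


def group_undirected(values, iterations=10):
    """Undirected: split values into two groups with no target field at all.

    Starts from the smallest and largest value as group centres and
    repeatedly reassigns each value to its nearer centre.
    """
    low_centre, high_centre = min(values), max(values)
    low_group, high_group = list(values), []
    for _ in range(iterations):
        low_group = [v for v in values
                     if abs(v - low_centre) <= abs(v - high_centre)]
        high_group = [v for v in values
                      if abs(v - low_centre) > abs(v - high_centre)]
        if not high_group:  # all values identical: nothing to separate
            break
        low_centre = sum(low_group) / len(low_group)
        high_centre = sum(high_group) / len(high_group)
    return low_group, high_group


print(predict_directed(records, "gender", "height_m", 1.68))
print(group_undirected([r["height_m"] for r in records]))
```

The directed call needs the `gender` labels to answer; the undirected call never looks at them and simply reports which heights fall together.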

Data Mining – Tasks
Classification - Example: high risk for cancer or not
Estimation - Example: household income
Prediction - Example: credit card balance transfer average amount
Affinity Grouping - Example: people who buy X, often also buy Y with a probability of Z
Clustering - similar to classification but no predefined classes
Description and Profiling – Identifying characteristics which explain behaviour - Example: “More men watch football on TV than women”
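For the affinity grouping example, the probability Z can be estimated from basket data as the fraction of baskets containing X that also contain Y. The Python sketch below assumes each basket is a set of item names; the baskets and the function name `probability_also_buys` are hypothetical.

```python
def probability_also_buys(baskets, x, y):
    """Estimate Z = P(basket contains y | basket contains x)."""
    baskets_with_x = [b for b in baskets if x in b]
    if not baskets_with_x:
        return 0.0  # x was never bought, so there is nothing to estimate from
    both = sum(1 for b in baskets_with_x if y in b)
    return both / len(baskets_with_x)


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]

z = probability_also_buys(baskets, "bread", "butter")
print(f"People who buy bread also buy butter with probability {z:.2f}")
```

In association rule mining this quantity is usually called the confidence of the rule X → Y.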

Data Warehousing
Note that Data Mining is very generic and can be used for detecting patterns in almost any data
– Retail data
– Genomes
– Climate data
– Etc.
Data Warehousing, on the other hand, is almost exclusively used to describe the storage of data in the commercial sector

What you should do this week
Browse the UCI Machine Learning repository datasets and associated information; get acquainted with data
Browse the statlib datasets archive, get acquainted with that too.
Browse the website - to give you some idea of how hot data mining is http://
And then …

Coursework A (0 marks, but you fail if you don’t submit an adequate attempt)
Find three other dataset repositories as follows:
1. One that specialises in financial data
2. One that specialises in time series data
3. One that specialises in anything else.
For each of these three, tell me the URL, and write one paragraph, ~100 words, in your own words, describing the contents of this repository.
Submit on or before Friday October 14th

Au revoir

If time available …
Some slides about data warehousing; I don’t consider this an essential part of this module, but in case you want to know what data warehousing is …

Data Warehousing - Definitions
“A subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision making process”
W. H. Inmon, "What is a Data Warehouse?" Prism Tech Topic, Vol. 1, No. 1, a very influential definition.
“A copy of transaction data, specifically structured for query and analysis”
Ralph Kimball, from his 2000 book, “The Data Warehouse Toolkit”

Data Warehouse – why?
For organisational learning to take place, data from many sources must be gathered together over time and organised in a consistent and useful way
Data Warehousing allows an organisation to remember its data and what it has learned about its data
Data Mining techniques make use of the data in a Data Warehouse and subsequently add their results to it


Data Warehouse - Contents
A Data Warehouse is a copy of transaction data specifically structured for querying, analysis and reporting
The data will normally have been transformed when it was copied into the Data Warehouse
The contents of a Data Warehouse, once acquired, are fixed and cannot be updated or changed later by the transaction system - but they can be added to of course
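These three properties (a copy, transformed on the way in, fixed once acquired but open to additions) can be sketched as a tiny append-only store in Python. The `Warehouse` class, its method names and the pence-to-pounds transform are hypothetical; real warehouses are database systems, and this is only an illustration of the behaviour described.

```python
from types import MappingProxyType


class Warehouse:
    """A toy append-only store: rows can be added but never changed."""

    def __init__(self):
        self._rows = []

    def load(self, transactions, transform):
        """Copy transactions in, applying a transform to each one."""
        for transaction in transactions:
            copied = dict(transform(transaction))  # a copy, not a reference
            self._rows.append(MappingProxyType(copied))  # read-only view

    def rows(self):
        """Return the stored rows as an immutable tuple."""
        return tuple(self._rows)


def to_pounds(transaction):
    # Hypothetical transform: convert pence to pounds on the way in.
    return {"customer": transaction["customer"],
            "amount_gbp": transaction["amount_pence"] / 100}


warehouse = Warehouse()
live_transactions = [{"customer": "c1", "amount_pence": 250}]
warehouse.load(live_transactions, to_pounds)

# Later changes in the transaction system do not reach the warehouse copy.
live_transactions[0]["amount_pence"] = 999
print(warehouse.rows()[0]["amount_gbp"])

# New data can still be added.
warehouse.load([{"customer": "c2", "amount_pence": 100}], to_pounds)
print(len(warehouse.rows()))
```

Trying to assign to a stored row, for example `warehouse.rows()[0]["amount_gbp"] = 0`, raises `TypeError` because each row is held behind a read-only `MappingProxyType` view.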

Data Marts
A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse
A Data Mart will normally reflect the business rules of a specific business unit within an enterprise – identifying data relevant to that unit’s activities

From Data Warehousing to Machine Learning, via Data Marts

The Big Challenge for Data Mining
The largest challenge that a Data Miner may face is the sheer volume of data in the Data Warehouse
It is very important, then, that summary data also be available to get the analysis started
The sheer volume of data may mask the important relationships in which the Data Miner is interested
Being able to overcome the volume and interpret the data is essential to successful Data Mining
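Summary data can be as simple as a count, range and mean per field. The Python sketch below shows one way to compute it, using the weight values from the terminology table; the `summarise` function and the `weight_kg` key are hypothetical names chosen for this example.

```python
import statistics


def summarise(rows, field):
    """Return count, min, max and mean of one numeric field."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


# Weights (kg) from the terminology table earlier in the lecture.
rows = [{"weight_kg": w} for w in (52, 89, 48, 86, 80)]
print(summarise(rows, "weight_kg"))
```

On a real warehouse the same summaries would normally be computed by the database itself rather than by loading every row into memory.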

What happens in practice …
Data Miners, both “farmers” and “explorers”, are expected to utilise Data Warehouses to give guidance and answer a limitless variety of questions
The value of a Data Warehouse and Data Mining lies in a new and changed appreciation of the meaning of the data
There are limitations though - A Data Warehouse cannot correct problems with its data, although it may help to identify them more clearly