Published by Elvin Henry. Modified over 9 years ago.
1
CIS4930 Introduction to Data Mining
Introduction
Peixiang Zhao
Tallahassee, Florida, 2016
2
Welcome to CIS4930
Course Website:
– http://www.cs.fsu.edu/~zhao/cis4930/main.html
– Everything about the course can be found here: syllabus, announcements, policies, schedules, slides, assignments, resources, …
– Make sure you check the course website periodically
Please read the class syllabus, policies, and lecture schedule; ask now if you have questions
3
Teaching Staff
Instructor: Peixiang Zhao
– Research interest
    Generally, data and information science, including database systems and data mining
    Specifically, graph data, information network analysis, large-scale data-intensive computation and analytics
– Brief history
    Illinois (Ph.D. from UIUC)
    Florida (Assistant Professor at FSU since Aug. 2012)
TA:
– Yongjiang Liang (liang@cs.fsu.edu)
– Office hours: Tuesday 10am – 11am
4
Prerequisites
You must know how to program and have a background in data structures and algorithms
– COP3330: Object-oriented Programming
– COP4530: Data structures and algorithms
– Knowledge of probability theory, statistics, and linear algebra
5
Textbook
Data Mining: Concepts and Techniques, 3rd edition
– Jiawei Han, Micheline Kamber, Jian Pei
References
– Introduction to Data Mining
– Data Mining: The Textbook
– The Elements of Statistical Learning
– Pattern Recognition and Machine Learning
6
Course Format
Two 75-min lectures/week
– Lecture slides complement the lectures; they do not substitute for the textbook
Four homework assignments (40%)
– Written assignments and machine problems (datasets or software might be provided)
– Individual work
– Due right before class starts on the due date
– No late homework will be accepted
One midterm (15%) and one final (40%)
– Check the dates and make sure you have no conflicts!
Quizzes (5%)
7
You Tell Me -- Why Are You Taking this Course?
– https://www.youtube.com/watch?v=vbb-AjiXyh0
– https://www.youtube.com/watch?v=1i6uESo98Yo
– Data mining tops LinkedIn’s list of the “hottest skills of 2014”
– Data scientist: the sexiest job of the 21st century (Harvard Business Review)
– Data scientist: 2015’s hottest profession (Mashable)
8
Why Data Mining?
Big Data
However, we are drowning in data but starving for knowledge!
– There is often information “hidden” in the data that is not readily evident
– Human analysts may take weeks to discover useful information
– Much of the data is never analyzed at all
9
What is Data Mining?
Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
– a.k.a. KDD (knowledge discovery in databases)
– Data to be mined: relational databases and data warehouses; data streams and sensor data; time-series, temporal, and sequence data; graphs, social networks, and multi-linked data; spatial and spatiotemporal data; multimedia data; text data; WWW data
– Knowledge to be obtained: characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis
10
The Goal: Decision Support
Typical procedure
– Data → Knowledge → Action/Decision → Goal
Examples
– Netflix collects user ratings of movies → What types of movies will you like? → Recommend new movies to you → Users stay with Netflix
– Gene sequences of cancer patients → Which genes lead to cancer? → Appropriate treatment → Save lives
– Road traffic → Which road is likely to be congested? → Suggest better routes to drivers → Save time and energy
11
Example: Association Rule Mining
Data
– A set of transactions, each of which consists of a set of items
Association rules
– A set of rules that characterize associations between items
[Table: market-basket transactions]
Rules discovered:
– {Milk} --> {Coke}
– {Diaper, Milk} --> {Beer}
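The market-basket table from the slide did not survive extraction, so the sketch below uses a small hypothetical set of transactions (an assumption, not the slide's data) to show how the two rules above could be scored. The helper names `count`, `support`, and `confidence` are illustrative; support and confidence are the standard measures for association rules, though the slide itself does not name them.

```python
# Hypothetical market-basket transactions (NOT the slide's original table).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(transactions, itemset):
    """Fraction of transactions that contain the whole itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    lhs_count = count(transactions, lhs)
    if lhs_count == 0:
        return 0.0  # rule never applies; report zero confidence
    return count(transactions, set(lhs) | set(rhs)) / lhs_count


if __name__ == "__main__":
    for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
        s = support(transactions, lhs | rhs)
        c = confidence(transactions, lhs, rhs)
        print(sorted(lhs), "-->", sorted(rhs),
              "support=%.2f confidence=%.2f" % (s, c))
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only scores rules that are already given.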
12
Example: Classification
Process
– Construct models (functions) based on training data with known class labels
– Describe and distinguish classes or concepts for future prediction
– Predict testing data with unknown class labels
Applications
– Spam identification
– Treatment prediction
– Document categorization
– …
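The process above (learn from labeled training data, then predict labels for unseen data) can be sketched with a k-nearest-neighbor classifier, one of the simplest classification methods. The spam features and data points below are hypothetical, chosen only for illustration, and `knn_predict` is an illustrative name rather than a library function.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (number of links, number of exclamation marks).
train_x = [(0, 0), (1, 0), (0, 1), (5, 4), (6, 5), (4, 6)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

if __name__ == "__main__":
    # Testing data with unknown class labels.
    for message in [(5, 5), (0.5, 0.5)]:
        print(message, "->", knn_predict(train_x, train_y, message))
```

Here the "model" is just the stored training set; other classifiers build an explicit function f(x) = y during training.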
13
Ads Targeting
[Figure: training and testing data with features and class labels; a classifier f(x) = y maps features to class labels]
14
Fraud Detection
[Figure: a training set with categorical and continuous attributes and a class label is used to learn a classifier (model), which is then applied to a test set]
15
Example: Clustering
Goal
– Find groups of objects such that the objects in a group are similar to one another and different from the objects in other groups
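One way to pursue this goal is k-means, sketched minimally below. The slide does not prescribe an algorithm; the 2-D points, the choice k = 2, and the use of the first k points as initial centroids are all assumptions made to keep the example small and deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means: returns (labels, centroids).

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic; practical implementations choose them randomly.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two visually separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]

if __name__ == "__main__":
    labels, centroids = kmeans(points, k=2)
    print(labels)
    print(centroids)
```

Points in the same group end up with the same label; "similar" here means close in Euclidean distance, which is only one of many possible similarity measures.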
16
Example: Outlier Detection
Outliers (anomalies)
– Global: observations inconsistent with the rest of the dataset
– Local: observations inconsistent with their neighborhoods (a local instability or discontinuity)
Applications
– Fraud/intrusion detection
– Customized marketing
– Weather prediction
“One person’s noise could be another person’s signal.” - Edward Ng
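A global outlier can be illustrated with a simple z-score test, sketched below. This is only one possible technique and is not taken from the slide; the readings and the threshold of 2.0 standard deviations are assumptions. Local outliers need neighborhood-based methods that this sketch does not cover.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean.

    Assumption: a fixed threshold of 2.0 is used purely for illustration.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

if __name__ == "__main__":
    print(zscore_outliers(readings))
```

Because the mean and standard deviation are themselves pulled toward extreme values, robust variants based on the median are often preferred in practice.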
17
Data Mining Tasks
Prediction methods: use some variables to predict unknown or future values of other variables
– Classification
– Regression
– Outlier detection
Description methods: find human-interpretable patterns that describe the data
– Clustering
– Association rule mining
18
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the confluence of machine learning, statistics, applications, algorithms, pattern recognition, high-performance computing, visualization, and database technology]
19
The Top 10 Data Mining Algorithms
1. C4.5: classification
2. K-Means: clustering
3. SVM: classification
4. Apriori: association analysis
5. EM: statistical learning
6. PageRank: link mining
7. AdaBoost: bagging and boosting
8. kNN: classification
9. Naive Bayes: classification
10. CART: classification
20
Questions
Any questions? Please feel free to raise your hands.