CSE591: Data Mining by H. Liu

Slides:



Advertisements
Similar presentations
Web Mining.
Advertisements

CS583 – Data Mining and Text Mining
Data Mining Sangeeta Devadiga CS 157B, Spring 2007.
CS583 – Data Mining and Text Mining
Week 9 Data Mining System (Knowledge Data Discovery)
Data Mining – Intro.
Data mining By Aung Oo.
CS 5941 CS583 – Data Mining and Text Mining Course Web Page 05/cs583.html.
CS583 – Data Mining and Text Mining
Business Intelligence
CIT 858: Data Mining and Data Warehousing Course Instructor: Bajuna Salehe Web:
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Techniques As Tools for Analysis of Customer Behavior
Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.
Chapter 1 Introduction to Data Mining
1 1 Slide Introduction to Data Mining and Business Intelligence.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data MINING Data mining is the process of extracting previously unknown, valid and actionable information from large data and then using the information.
Principles of Data Mining. Introduction: Topics 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure Models and Patterns 4. Data.
Copyright © 2004 Pearson Education, Inc.. Chapter 27 Data Mining Concepts.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
MIS2502: Data Analytics Advanced Analytics - Introduction.
Academic Year 2014 Spring Academic Year 2014 Spring.
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
DATA MINING and VISUALIZATION Instructor: Dr. Matthew Iklé, Adams State University Remote Instructor: Dr. Hong Liu, Embry-Riddle Aeronautical University.
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
Data Mining Concept Submitted TO: Mrs. MONIKA SUBMITTED BY: SHALU 4717.
CS583 – Data Mining and Text Mining
Data Mining – Intro.
Presented by Khawar Shakeel
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
CS583 – Data Mining and Text Mining
Statistics 202: Statistical Aspects of Data Mining
CS583 – Data Mining and Text Mining
MIS 451 Building Business Intelligence Systems
Data Mining 101 with Scikit-Learn
Introduction C.Eng 714 Spring 2010.
Adrian Tuhtan CS157A Section1
Data Mining: Concepts and Techniques Course Outline
CS583 – Data Mining and Text Mining
Data Mining Modified from
Sangeeta Devadiga CS 157B, Spring 2007
Data Warehousing and Data Mining
Data Mining: Concepts and Techniques
CS583 – Data Mining and Text Mining
Data Mining: Concepts and Techniques
Data Mining Concepts and Techniques
Web Mining Department of Computer Science and Engg.
Data Mining: Introduction
CS583 – Data Mining and Text Mining
Data Warehousing Data Mining Privacy
CSE/CBS 572: Data Mining by Huan Liu
Data Mining: Concepts and Techniques
Welcome! Knowledge Discovery and Data Mining
CS583 – Data Mining and Text Mining
CSE591: Data Mining by H. Liu
CSE572: Data Mining by H. Liu
CSE572: Data Mining by H. Liu
Presentation transcript:

CSE591: Data Mining by H. Liu Huan Liu, CSE, CEAS, ASU http://www.public.asu.edu/~huanliu/DM04S/cse591.html Agenda Self introduction of all What’s the course Questions/Answers Organization 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Contents Classification, Clustering, Association, and Applications Format - A seminar course with a lot of assignments and Work Paper reading, discussion, project, presentation Assessment Class participation, assignments, project proposal, presentations, exam(s) 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu You TA: Jigar Mody, jigar.mody@asu.edu Me: Huan Liu, huanliu@asu.edu Where: Brickyard 566 When: Right after class, other times by appointment MyASU will be used, so make sure your email address is correct & won’t miss important announcement 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Course Format An experiment since Fall 2000 about effective teaching of graduate data mining Research papers - the main categories to be found on the course web site You can choose one of the textbooks listed. A reference list is an entering point for you to access related subjects Everyone is expected to read the papers and participate in class discussion Presenters will be evaluated on the spot 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Projects (25%, 10%) Exam(s) (40%) Assignment, quizzes and class participation (25%) Late penalty, YES. Academic integrity (http://www.public.asu.edu/~huanliu/conduct.html) 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Paper presentation Each student will be responsible for one topic. All are expected to search for and read the selected material(s) before the presentation. What is it about? What are points to discuss and improve? What can we do with it? Each presentation is about 30 minutes including discussion, question & answer 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Project Proposal Proposal presentation, discussion, revision A project should be completed in a semester Project Presentation and demo Report 11/28/2018 CSE591: Data Mining by H. Liu

Topic Distribution (tentative) 11/28/2018 CSE591: Data Mining by H. Liu

Categories of interests (including design and implementation) Data and application security Data mining and privacy Data reduction and selection Streaming data reduction Dealing with large data (column- & row-wise) Selection bias Learning algorithms Ensemble methods Incremental learning Active learning and co-training Bioinformatics for CBS 591 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Your first assignment Think about what you want to accomplish. List 2 your areas of interests (don’t be restricted by the previous list). Pick an area of interest and choose a general topic for paper presentation. Complete the above and submit it in the 2nd class. 11/28/2018 CSE591: Data Mining by H. Liu

2nd Assignment due in two weeks (2/5/04) – due date revised Choose your category of interest Find at least 2 quality papers in that category TA will help you and compile a list of all papers at the end Write a (< 1 page) summary for each paper What is it about Why is it significant and relevant Where is it published and when 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Introduction The need for data mining Data mining Web mining Applications 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu What is data mining Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web, image. the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Patterns (1) Patterns are the relationships and summaries derived through a data mining exercise. Patterns must be: valid novel potentially useful understandable 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Patterns (2) Patterns are used for prediction or classification describing the existing data segmenting the data (e.g., the market) profiling the data (e.g., your customers) etc. 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Data mining typically deals with data that have already been collected for some purpose other than data mining. Data miners usually have no influence on data collection strategies. Large bodies of data cause new problems: representation, storage, retrieval, analysis, ... 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Even with a very large data set, we are usually faced with just a sample from the population. Data exist in many types (continuous, nominal) and forms (credit card usage records, supermarket transactions, government statistics, text, images, medical records, human genome databases, molecular databases). 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Some DM tasks Classification: mining patterns that can classify future data into known classes. Association rule mining mining any rule of the form X  Y, where X and Y are sets of data items. Clustering identifying a set of similarity groups in the data 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Sequential pattern mining: A sequential rule: A B, says that event A will be immediately followed by event B with a certain confidence Deviation detection: discovering the most significant changes in data Data visualization: using graphical methods to show patterns in data. 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Why data mining Rapid computerization of businesses produces huge amounts of data How to make best use of data? A growing realization: knowledge discovered from data can be used for competitive advantage. 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Make use of your data assets Many interesting things you want to find cannot be found using database queries “find me people likely to buy my products” “Who are likely to respond to my promotion” Fast identify underlying relationships and respond to emerging opportunities 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Why now The data is abundant. The data is being warehoused. The computing power is affordable. The competitive pressure is strong. Data mining tools have become available. 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu DM fields Data mining is an emerging multi-disciplinary field: Statistics Machine learning Databases Visualization OLAP and data warehousing ... 11/28/2018 CSE591: Data Mining by H. Liu

CSE591: Data Mining by H. Liu Summary What is data mining? KDD - knowledge discovery in databases: non-trivial extraction of implicit, previously unknown and potentially useful information Why do we need data mining? Wide use of computer systems - data explosion - knowledge is power - but we’re data rich, knowledge lean - actionability ... 11/28/2018 CSE591: Data Mining by H. Liu

An Overview of KDD Process (Guess which is which) Various databases Integration Datawarehouse Preprocessing Data mining Post processing Knolwedge 11/28/2018 CSE591: Data Mining by H. Liu

Web mining – an application The Web is a massive database Semi-structured data XML and RDF Web mining Content Structure Usage 11/28/2018 CSE591: Data Mining by H. Liu