1 CSE591 (575) Data Mining 1/21/2003 - 5/6/2003 Computer Science & Engineering ASU.



2 Introduction
- Introduction to this Course
- Introduction to Data Mining

3 Introduction to the Course
- First, about you: why take this course?
  - Your background and strengths: AI, DBMS, statistics, biology, …
  - Your interests and requests
- What is this course about?
  - Problem solving
  - Handling data: transform data into workable data
  - Mining data: turn data into knowledge
  - Validation and presentation of knowledge

4 This course
- What can you expect from this course?
  - Knowledge and experience about DM
  - Problem solving and solution presentation
- How is this course conducted?
  - Presentations
  - Individual projects
- Course format
  - Individual projects: 40%
  - Exams and/or quizzes: 40%
  - Class participation: 20%
  - Off-campus students?

5 Projects - Start NOW!
- How to start?
  - Projects should be sufficiently challenging but reasonable, suitable for one semester
- How to choose your individual project
  - Real-world problems
  - Problems that might make a difference
- Two types of projects
  - Available projects
  - Self-proposed projects (approval needed)

6 Some project ideas
- Dealing with high-dimensional data
  - Data for supervised and unsupervised learning
- Image mining
  - Feature extraction, clustering of images
- Active sampling
- Various data structures (kd-trees, R-trees, multidimensional scaling)
- Meta data (RDF, namespace) for mining
- Ensemble learning
- Sequence mining (HMM learning)
- Bioinformatics and applications (feature selection)
- Intelligent driving data analysis
- Data integration, data reduction (random projection)

7 How is a project evaluated?
- It depends on
  - What you want to achieve
  - Its impact
  - Your effort
- The sooner you start, the better
- The beginning is not easy

8 Course Web Site html
- My office and office hours
  - GWC 342
  - T 10: :30am and Th 4:00-5:00pm
- My slides and relevant information will be made available at the course web site

9 Any questions and suggestions?
- Your feedback is most welcome!
  - I need it to adapt the course to your needs.
  - Please feel free to provide yours anytime.
- Share your questions and concerns with the class; very likely others have the same ones.
- No pain, no gain: there is no magic in data mining.
  - The more you put in, the more you get.
  - Your grades are proportional to your efforts.

10 Introduction to Data Mining
- Definitions
- Motivations of DM
- Interdisciplinary Links of DM

11 What is DM?
- Or, more precisely, KDD (knowledge discovery from databases)?
- Many definitions
- A process, not plug-and-play
  - raw data → transformed data → preprocessed data → data mining → post-processing → knowledge
- One definition: a non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in data
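
The process chain on this slide can be sketched as a sequence of small functions. This is only an illustration under stated assumptions: the toy purchase records, the stage function names, and the 0.5 support threshold are all hypothetical and do not come from the course.

```python
from collections import Counter

# Hypothetical raw data: one free-text purchase record per string.
RAW = ["milk, bread", "Bread,milk ", "", "milk, eggs", "bread , milk"]

def transform(raw):
    """Raw data -> transformed data: split each record into a list of items."""
    return [[item.strip().lower() for item in rec.split(",")] for rec in raw]

def preprocess(records):
    """Transformed data -> preprocessed data: drop empty items and empty records."""
    cleaned = [sorted({item for item in rec if item}) for rec in records]
    return [rec for rec in cleaned if rec]

def mine(records):
    """Data mining: count how often each pair of items occurs together."""
    pairs = Counter()
    for rec in records:
        for idx, first in enumerate(rec):
            for second in rec[idx + 1:]:
                pairs[(first, second)] += 1
    return pairs

def postprocess(pairs, n_records, min_support=0.5):
    """Post-processing: keep only pairs whose support reaches the threshold."""
    return {pair: count / n_records
            for pair, count in pairs.items()
            if count / n_records >= min_support}

def kdd(raw):
    """Run the whole chain: raw data in, filtered patterns (knowledge) out."""
    records = preprocess(transform(raw))
    return postprocess(mine(records), len(records))

knowledge = kdd(RAW)
```

Each stage consumes the output of the previous one, which is the sense in which KDD is a process rather than a single plug-and-play step.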

12 Need for Data Mining
- Data accumulate and double every 9 months
- There is a big gap between stored data and knowledge, and the transition won’t occur automatically.
- Manual data analysis is not new, but it is a bottleneck
- Fast-developing computer science and engineering generates new demands
- Seeking knowledge from massive data
- Any personal experience?
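
To make the doubling claim concrete, here is a small arithmetic sketch. It takes the 9-month doubling period from the slide at face value and assumes smooth exponential growth; the function name is illustrative.

```python
def growth_factor(months, doubling_period=9.0):
    """Factor by which data volume grows after `months`, assuming it
    doubles every `doubling_period` months (smooth exponential growth)."""
    return 2.0 ** (months / doubling_period)

one_year = growth_factor(12)     # 2 ** (4/3), roughly 2.5 times
three_years = growth_factor(36)  # 2 ** 4, i.e. 16 times
```

Under that assumption, stored data grows about 2.5-fold per year and 16-fold over three years, which is why manual analysis cannot keep pace.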

13 When is DM useful?
- Data rich
  - Two invited talks so far have convincingly demonstrated it
- Large data (dimensionality and size)
  - Image data (size)
  - Gene data (dimensionality)
- Little knowledge about data (exploratory data analysis)
  - What if we have some knowledge?

14 DM perspectives
- Prediction, description, explanation, optimization, and exploration
- Completion of knowledge (patterns vs. models)
- Understandability and representation of knowledge
- Some applications
  - Business intelligence (CRM)
  - Security (info, computer systems, networks, data, privacy)
  - Scientific discovery (bioinformatics)

15 Challenges
- Increasing data dimensionality and data size
- Various data forms
- New data types
  - Streaming data, multimedia data
- Efficient search and data access
- Intelligent update and integration

16 Interdisciplinary Links of DM
- Statistics
- Databases
- AI
- Machine Learning
- Visualization
- High Performance Computing
  - supercomputers, distributed/parallel/cluster computing

17 Statistics
- Discovery of structures or patterns in data sets
  - hypothesis testing, parameter estimation
- Optimal strategies for collecting data → efficient search of large databases
- Static data → constantly evolving data
- Models play a central role → algorithms are of major concern → patterns are sought
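
As a minimal sketch of the two statistical tasks named on this slide, the following estimates a population mean with its standard error (parameter estimation) and computes a one-sample t statistic (the core quantity of a hypothesis test). The sample values and function names are hypothetical; turning the statistic into a p-value would need a t-distribution table or a statistics library, which is left out here.

```python
import math
import statistics

def estimate_mean(sample):
    """Parameter estimation: sample mean and its standard error."""
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, std_err

def t_statistic(sample, mu0):
    """Hypothesis testing: one-sample t statistic for H0 'true mean == mu0'."""
    mean, std_err = estimate_mean(sample)
    return (mean - mu0) / std_err

# Hypothetical measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std_err = estimate_mean(sample)
t_value = t_statistic(sample, mu0=3)
```

A large absolute t value suggests the data are hard to reconcile with the hypothesized mean.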

18 Relational Databases
- A relational database can contain several tables
  - Tables and schemas
- The goal in data organization is to maintain data and quickly locate the requested data
  - Queries and index structures
- Query execution and optimization
  - Query optimization finds the best possible evaluation method for a given query
- Providing fast, reliable access to data for data mining
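
The ideas on this slide (several tables, a schema, an index, and a query whose evaluation the engine plans) can be tried with Python's built-in `sqlite3` module. The table names, columns, and rows below are hypothetical, and SQLite is simply a convenient stand-in for any relational DBMS.

```python
import sqlite3

# In-memory database with two related tables and one index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Tempe"), (2, "Bob", "Mesa")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 7.5)],
)

def total_per_customer(connection):
    """Join the two tables and sum order amounts per customer."""
    rows = connection.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """)
    return rows.fetchall()

def query_plan(connection, customer_id):
    """Ask SQLite how it intends to evaluate a lookup on orders."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT amount FROM orders WHERE customer_id = ?",
        (customer_id,),
    )
    return [row[-1] for row in rows]  # last column holds the description

totals = total_per_customer(conn)
plan = query_plan(conn, 1)
```

Printing `plan` shows the evaluation method the optimizer picked; on typical SQLite builds it mentions `idx_orders_customer`, though the exact wording varies by version.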

19 AI
- Intelligent agents
  - Perception-Action-Goal-Environment
- Search
  - uniform cost and informed search algorithms
- Knowledge representation
  - FOL, production rules, frames with semantic networks
- Knowledge acquisition
- Knowledge maintenance and application
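
Uniform cost search, mentioned on this slide, always expands the frontier node with the lowest path cost so far. Below is a minimal sketch using a priority queue; the example graph and its edge costs are hypothetical.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None.

    `graph` maps each node to a dict {neighbor: non-negative step cost}.
    """
    frontier = [(0, start, [start])]  # priority queue ordered by path cost
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue  # a cheaper route to this node was already expanded
        explored.add(node)
        for neighbor, step in graph.get(node, {}).items():
            if neighbor not in explored:
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

# Hypothetical weighted graph.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
result = uniform_cost_search(GRAPH, "A", "D")
```

Informed search algorithms such as A* follow the same loop but order the frontier by path cost plus a heuristic estimate of the remaining cost.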

20 Machine Learning
- Focusing on complex representations, data-intensive problems, and search-based methods
- Flexibility with prior knowledge and collected data
- Generalization from data and empirical validation
  - statistical soundness and computational efficiency
  - constrained by finite computing and data resources
- Challenges from KDD
  - scaling up, cost info, auto data preprocessing
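
Generalization and empirical validation can be illustrated with a holdout test: learn from one part of the data and measure accuracy on examples the learner never saw. The sketch below uses a 1-nearest-neighbor classifier on hypothetical one-dimensional data; the learner, the data, and the every-third-example split are assumptions made for illustration.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training example whose feature is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def holdout_accuracy(data, test_every=3):
    """Hold out every `test_every`-th example for testing, train on the rest."""
    test = data[::test_every]
    train = [pair for idx, pair in enumerate(data) if idx % test_every != 0]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical labeled data: feature value x, label "low" or "high".
DATA = [(x, "low" if x < 5 else "high") for x in range(10)]
accuracy = holdout_accuracy(DATA)
```

Accuracy on the held-out examples, not on the training examples, is the empirical evidence that the learned model generalizes.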

21 Visualization
- Producing a visual display with insights into the structure of the data, with interactive means
  - zooming in/out, rotating, displaying detailed info
- Various branches of visualization methods
  - show summary properties and explore relationships between variables
  - investigate large databases and convey lots of information
  - analyze data with geographic/spatial location
- A pre- and post-processing tool for KDD

22 Bibliography
- W. Klösgen & J.M. Żytkow (eds.), 2001, Handbook of Data Mining and Knowledge Discovery.