9/03 Data Mining – Introduction G Dong (WSU)1 CS499/699-10 Data Mining Fall 2003 Professor Guozhu Dong Computer Science & Engineering WSU.

Presentation transcript:


Introduction
- Introduction to this Course
- Introduction to Data Mining

Introduction to the Course
- First, about you: why take this course?
  - Your background and strengths: AI, DBMS, Statistics, Biology, Business, …
  - Your interests and requests
- What is this course about? Problem solving
  - Handling data: transform data into workable data
  - Mining data: turn data into knowledge
  - Validation and presentation of knowledge

This course
- What can you expect from this course?
  - Knowledge and experience about DM
  - Problem solving skills
- How is this course conducted?
  - Homework, projects, exams, classes
- Course format
  - Individual projects: 30%
  - Exams and/or quizzes: 60%
  - Homework: 10%

Course Web Site
- cs.wright.edu/~gdong/mining03/WSUCS499DataMining.htm
- My office and office hours: RC 430, 4:30-5:30, T Th
- My slides and relevant information will be made available at the course web site

Any questions and suggestions?
- Your feedback is most welcome!
  - I need it to adapt the course to your needs.
  - Please feel free to provide yours anytime.
- Share your questions and concerns with the class; very likely others have the same ones.
- No pain, no gain: there is no magic in data mining.
  - The more you put in, the more you get.
  - Your grades are proportional to your efforts.

Introduction to Data Mining
- Definitions
- Motivations of DM
- Interdisciplinary Links of DM

What is DM?
- Or, more precisely, KDD (knowledge discovery from databases)?
- Many definitions
- An iterative process, not plug-and-play
  - raw data → transformed data → preprocessed data → data mining → post-processing → knowledge
- One definition: a non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data
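The chain of stages on this slide can be sketched as a small pipeline. This is only an illustration and not part of the original slides: the function names, the comma-separated shopping-basket input, and the use of item frequency counting as the "data mining" step are all assumptions made for the example.

```python
from collections import Counter

def transform(raw_rows):
    """Raw text rows -> transformed records (lists of item strings)."""
    return [row.split(",") for row in raw_rows]

def preprocess(records):
    """Clean each record: strip whitespace, lowercase, drop empty items."""
    cleaned = []
    for record in records:
        items = {item.strip().lower() for item in record if item.strip()}
        if items:  # skip records that end up empty
            cleaned.append(items)
    return cleaned

def mine(transactions):
    """Toy mining step: count how many transactions contain each item."""
    counts = Counter()
    for items in transactions:
        counts.update(items)
    return counts

def postprocess(counts, n_transactions, min_support=0.5):
    """Keep only items whose relative support reaches min_support."""
    return {item: count / n_transactions
            for item, count in counts.items()
            if count / n_transactions >= min_support}

def kdd(raw_rows, min_support=0.5):
    """raw data -> transformed -> preprocessed -> mining -> post-processing."""
    transactions = preprocess(transform(raw_rows))
    return postprocess(mine(transactions), len(transactions), min_support)

# Hypothetical raw data with inconsistent case, spacing, and an empty row.
raw = ["Milk, Bread", "bread,butter ", "milk,bread,,Eggs", ""]
knowledge = kdd(raw)
print(knowledge)
```

In practice the process is iterative, as the slide says: an unsatisfying result typically sends the analyst back to adjust the preprocessing or the support threshold and run the chain again.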

Need for Data Mining
- Data accumulate and double every 9 months
- There is a big gap between stored data and knowledge, and the transition won't occur automatically
- Manual data analysis is not new, but it is a bottleneck
- Fast-developing Computer Science and Engineering generates new demands
- Seeking knowledge from massive data
- Any personal experience?
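To get a feel for the doubling claim, the growth factor after m months is 2^(m/9). A minimal sketch, taking the slide's 9-month doubling period at face value (the helper name growth_factor is made up for this example):

```python
def growth_factor(months, doubling_months=9):
    """How many times larger the data volume is after `months` months,
    assuming it doubles every `doubling_months` months."""
    return 2 ** (months / doubling_months)

for years in (1, 3, 5):
    print(f"after {years} year(s): x{growth_factor(12 * years):.1f}")
```

Three years is exactly four doubling periods, so under this assumption the stored data would be 16 times larger.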

When is DM useful?
- Data-rich world
- Large data (dimensionality and size)
  - Image data (size)
  - Gene chip data (dimensionality)
- Little knowledge about data (exploratory data analysis)
- What if we have some knowledge?

DM perspectives
- KDD "goals": prediction, description, explanation, optimization, and exploration
- Knowledge forms: patterns vs. models
- Understandability and representation of knowledge
- Some applications
  - Business intelligence (CRM)
  - Security (Info, Comp Systems, Networks, Data, Privacy)
  - Scientific discovery (bioinformatics, medicine)

Challenges
- Increasing data dimensionality and data size
- Various data forms
- New data types
  - Streaming data, multimedia data
- Efficient search and access to data/knowledge
- Intelligent update and integration

Interdisciplinary Links of DM
- Statistics
- Databases
- AI
- Machine Learning
- Visualization
- High Performance Computing
  - Supercomputers, distributed/parallel/cluster computing

Statistics
- Discovery of structures or patterns in data sets
  - Hypothesis testing, parameter estimation
- Optimal strategies for collecting data → efficient search of large databases
- Static data → constantly evolving data
- Models play a central role → algorithms are of major concern → patterns are sought

Relational Databases
- A relational database can contain several tables
  - Tables and schemas
- The goal in data organization is to maintain data and quickly locate the requested data
  - Queries and index structures
- Query execution and optimization
  - Query optimization finds the "best" possible evaluation method for a given query
- Providing fast, reliable access to data for data mining
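The role of index structures in query evaluation can be seen with Python's built-in sqlite3 module. This is a sketch only: the sales table, its columns, and the index name are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
# 1000 hypothetical rows spread over 100 customers.
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(1000)],
)

query = "SELECT SUM(amount) FROM sales WHERE customer = ?"

def plan(sql, params):
    """Return the optimizer's chosen evaluation plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

plan_before = plan(query, ("c7",))   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer)")
plan_after = plan(query, ("c7",))    # the optimizer can now use the index

total = conn.execute(query, ("c7",)).fetchone()[0]
print("before:", plan_before)
print("after: ", plan_after)
print("total: ", total)
```

The query text and its result are the same in both cases; only the evaluation method chosen by the optimizer changes, which is the point of the slide's remark on query optimization.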

AI
- Intelligent agents
  - Perception-Action-Goal-Environment
- Search
  - Uniform cost and informed search algorithms
- Knowledge representation
  - FOL, production rules, frames with semantic networks
- Knowledge acquisition
- Knowledge maintenance and application
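As a reminder of what uniform cost search does, here is a minimal sketch using a priority queue. The graph, its edge costs, and the function name are made up for illustration; the slide only names the technique.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Expand the cheapest frontier node first.

    `graph` maps each node to a dict of {neighbor: non-negative step cost}.
    Returns (total_cost, path) for a cheapest path, or None if unreachable.
    """
    frontier = [(0, start, [start])]   # (cost so far, node, path taken)
    best = {start: 0}                  # cheapest known cost to each node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                   # stale entry; a cheaper route was found
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical weighted graph.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
result = uniform_cost_search(graph, "A", "D")
print(result)
```

An informed search such as A* would use the same loop but order the frontier by cost so far plus a heuristic estimate of the remaining cost.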

Machine Learning
- Focusing on complex representations, data-intensive problems, and search-based methods
- Flexibility with prior knowledge and collected data
- Generalization from data and empirical validation
  - Statistical soundness and computational efficiency
  - Constrained by finite computing and data resources
- Challenges from KDD
  - Scaling up, cost info, auto data preprocessing, more knowledge types
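"Generalization from data and empirical validation" usually means fitting a model on one part of the data and measuring it on a held-out part. The sketch below assumes synthetic one-dimensional data and a nearest-centroid classifier, both chosen only to keep the example short; neither comes from the slides.

```python
import random

def make_data(n, seed=0):
    """Synthetic 1-D points: class 0 centred at 0.0, class 1 centred at 3.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 if label == 1 else 0.0
        data.append((rng.gauss(center, 1.0), label))
    return data

def fit_centroids(train):
    """Learn one centroid (mean value) per class from the training rows."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == label for x, label in rows)
    return hits / len(rows)

data = make_data(200)
train, holdout = data[:140], data[140:]   # fit on 70%, validate on 30%
model = fit_centroids(train)
train_acc = accuracy(model, train)
holdout_acc = accuracy(model, holdout)
print(f"train accuracy:   {train_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
```

The holdout figure is the empirical estimate of how well the model generalizes; the training figure alone would say nothing about unseen data.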

Visualization
- Producing a visual display that gives insight into the structure of the data, with interactive means
  - Zoom in/out, rotating, displaying detailed info
- Various types of visualization methods
  - Show summary properties and explore relationships between variables
  - Investigate large DBs and convey lots of information
  - Analyze data with geographic/spatial location
- A pre- and post-processing tool for KDD

Bibliography
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann.
- D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press.
- W. Klosgen and J. M. Zytkow (eds.). Handbook of Data Mining and Knowledge Discovery. 2001.