Civil and Environmental Engineering Carnegie Mellon University Sensors & Knowledge Discovery (a.k.a. Data Mining) H. Scott Matthews April 14, 2003.

Presentation transcript:

Recap of Last Week
- Sensors: what are they?
- Sensor networks: how they help us
- Sensor signal acquisition and use
- Effects of digital/analog conversions
- Range, power, frequency, other constraints
- Next: how to use the data!

Life Cycles of Sensor Networks
- Currently, sensors and sensor systems are fairly proprietary
  - e.g. a ‘Johnson Controls’ HVAC sensor system uses only their equipment
- Need to design more robust networks that are standards-driven and open

Life Cycles (2)
- In addition, sensor networks tend to have very short ‘lifetimes’
  - i.e. we build one, use it for a few years, and then replace it with a newer/better one
- Need to plan for, and design architectures for, sensor networks that will last the life of the infrastructure we are monitoring
  - e.g. years for bridges (to manage LCC)

A Knowledge Discovery Framework for Civil Infrastructure Contexts
Rebecca Buchheit
Department of Civil and Environmental Engineering
Carnegie Mellon University

Motivation
- condition and usage patterns of critical infrastructure attracting increased attention
- deteriorating infrastructure + cheap data collection methods = health monitoring, transportation management, other data-intensive civil infrastructure techniques

Motivation (continued)
- amount of data, relationships between attributes, context-sensitivity, observational collection methods => data mining and knowledge discovery in databases (KDD) process
- our ability to collect data far outstrips our ability to analyze and understand the data at a high level of abstraction

Databases + Statistics + Machine Learning = Data Mining
[Diagram: databases, statistics, and machine learning combine to form data mining]

Definitions
- Data Mining: algorithms to extract patterns from large data sets
- Knowledge Discovery in Databases: “... the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.” [Fayyad, et al]
- Uses observational, not controlled, data

Knowledge Discovery Process Steps
- domain understanding
- data understanding
- data preparation
- data modeling (a.k.a. “data mining”)
- results evaluation
- deployment

CRISP-DM
- CRoss-Industry Standard Process for Data Mining
- high-level, hierarchical, iterative process model for KDD
- provides framework for applying KDD consistently

Domain Understanding
- evaluate fit between KDD and the problem
- how much data?
- what type of data?
- perceived quality of data?
- what is being measured?
- right data to answer the question?
- organizational support?

Data Understanding
- summary statistics
- plotting and visualization
- missing values
  - randomly missing
  - influenced by a measured factor
  - influenced by an unmeasured factor
- evaluate quality of existing data
  - what is “good” data?
  - what do we do with “bad” data?
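
As a minimal sketch of the data understanding step, the snippet below computes summary statistics for one attribute and counts its missing values. The readings, the use of `None` to mark a missing value, and the function name `summarize` are illustrative assumptions, not part of the original slides.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Summary statistics for one attribute, ignoring missing (None) readings."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "stdev": stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


# Hypothetical outside-temperature readings; None marks a missing value.
readings = [61.0, 63.0, None, 65.0, 64.0, None, 62.0]
stats = summarize(readings)
print(stats)
```

Counting missing values only says how many are absent. Deciding whether they are randomly missing or tied to a measured or unmeasured factor still requires domain knowledge.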

Data Preparation
- most time-consuming part of KDD
- data selection
  - which records (“rows”) to use
  - which attributes (“columns”) to use
- data cleaning
  - do something to bad and missing data
- integrate data from different sources
- transform data
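
The selection and cleaning steps might be sketched as follows. The slides do not say what to "do" with bad or missing data; treating out-of-range readings as missing and filling gaps with the column mean is one common choice, assumed here purely for illustration. The record layout, column names, and valid range are hypothetical.

```python
def prepare(records, columns, valid_ranges):
    """Select attributes, flag out-of-range values, and fill gaps with the column mean."""
    # Data selection: keep only the chosen attributes ("columns").
    selected = [{c: r.get(c) for c in columns} for r in records]

    # Data cleaning, step 1: treat out-of-range ("bad") readings as missing.
    for row in selected:
        for c, (lo, hi) in valid_ranges.items():
            v = row[c]
            if v is not None and not (lo <= v <= hi):
                row[c] = None

    # Data cleaning, step 2: fill missing values with the column mean
    # (assumes every column has at least one valid value).
    for c in columns:
        present = [row[c] for row in selected if row[c] is not None]
        fill = sum(present) / len(present)
        for row in selected:
            if row[c] is None:
                row[c] = fill
    return selected


# Hypothetical sensor records.
records = [
    {"temp_f": 60.0, "wind_mph": 5.0, "sensor_id": "a"},
    {"temp_f": 999.0, "wind_mph": 7.0, "sensor_id": "a"},  # implausible reading
    {"temp_f": 64.0, "wind_mph": None, "sensor_id": "b"},  # missing reading
]
clean = prepare(records, ["temp_f", "wind_mph"], {"temp_f": (-40.0, 130.0)})
print(clean)
```

Mean imputation is only defensible when values are missing at random; the other missing-value cases named under data understanding call for different handling.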

Data Modeling/Data Mining
- choose an algorithm
- choose parameters for that algorithm
- apply algorithm to data
- evaluate results
  - predictive accuracy
  - descriptive coverage
- repeat as necessary
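
The choose, apply, evaluate, repeat loop can be illustrated with a deliberately simple model. The threshold classifier, the candidate parameters, and the data below are all hypothetical; the point is only that the parameter is chosen on training data and predictive accuracy is then measured on data held out from that choice.

```python
def accuracy(predict, examples):
    """Fraction of (value, label) examples that the model labels correctly."""
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)


def make_threshold_model(threshold):
    """A one-parameter model: values at or above the threshold are 'high'."""
    return lambda x: "high" if x >= threshold else "low"


# Hypothetical labeled data, split into training and held-out sets.
train = [(20, "low"), (35, "low"), (50, "high"), (65, "high"), (40, "low"), (55, "high")]
holdout = [(30, "low"), (48, "low"), (60, "high"), (38, "low")]

# Choose parameters: try each candidate and keep the best on the training data.
candidates = [25, 45, 60]
best = max(candidates, key=lambda t: accuracy(make_threshold_model(t), train))

# Evaluate results: predictive accuracy on data not used for the choice.
holdout_accuracy = accuracy(make_threshold_model(best), holdout)
print(best, holdout_accuracy)
```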

Data Mining Goals
- Prediction: predict the value of one or more variables based on the values of other variables
- Description: describe the data set in a compact, human-understandable form

Data Mining Tasks
- Classification
- Regression
- Clustering
- Deviation detection
- Summarization
- Dependency modeling

Classification
- learn how to classify data items into predefined groups
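
One of the simplest ways to classify items into predefined groups is a nearest-centroid classifier, sketched below. The slides do not name a particular algorithm, so this choice, the feature layout (age in years, crack count), and the labels are assumptions made for illustration.

```python
def train_centroids(examples):
    """Compute the mean feature vector (centroid) of each predefined class."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: (age in years, crack count) -> condition label.
examples = [((5, 1), "good"), ((8, 2), "good"), ((40, 12), "poor"), ((45, 15), "poor")]
centroids = train_centroids(examples)
print(classify(centroids, (10, 3)))   # a young structure with few cracks
print(classify(centroids, (38, 10)))  # an old structure with many cracks
```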

Regression
- learn a function that maps one or more independent variables to a real-valued dependent variable
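
For a single independent variable, regression reduces to fitting a line by ordinary least squares. The sketch below uses the closed-form slope and intercept; the data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable xs, real-valued dependent variable ys.
xs = [20, 30, 40, 50]
ys = [130, 110, 90, 70]
slope, intercept = fit_line(xs, ys)
predicted = slope * 35 + intercept  # prediction for an unseen x
print(slope, intercept, predicted)
```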

Clustering
- learn “natural” classes or clusters of data
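
Unlike classification, clustering starts without predefined groups. A minimal one-dimensional k-means sketch is shown below; k-means is an assumed choice of algorithm (the slides name none), and the values and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-averaging."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical measurements that fall into two natural groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

The result depends on the number of clusters and the starting centers, both of which the analyst has to choose.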

Deviation Detection
- detect changes or deviations from “normal” or baseline state
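
A simple way to flag deviations is to compare new readings against the mean and standard deviation of a baseline period. The three-standard-deviation threshold, the readings, and the function name below are illustrative assumptions rather than the method used in the talk.

```python
from statistics import mean, stdev


def find_deviations(baseline, new_readings, threshold=3.0):
    """Return readings more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [x for x in new_readings if abs(x - mu) > threshold * sigma]


# Hypothetical sensor readings from a period taken to be "normal".
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
new_readings = [10.1, 10.4, 11.0, 9.2]
deviations = find_deviations(baseline, new_readings)
print(deviations)
```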

Summarization
- summarize subsets of data set
  - computer industry mean salary = $65k
  - service industry mean salary = $20k
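
The salary example amounts to a grouped mean. The sketch below uses made-up individual salaries, chosen only so that the group means match the figures on the slide; the record layout and function name are likewise assumptions.

```python
from collections import defaultdict


def mean_by_group(records, group_key, value_key):
    """Mean of `value_key` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}


# Hypothetical salary records, in thousands of dollars.
records = [
    {"industry": "computer", "salary_k": 60},
    {"industry": "computer", "salary_k": 70},
    {"industry": "computer", "salary_k": 65},
    {"industry": "service", "salary_k": 18},
    {"industry": "service", "salary_k": 22},
    {"industry": "service", "salary_k": 20},
]
summary = mean_by_group(records, "industry", "salary_k")
print(summary)
```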

Dependency Modeling
- learn relationships between attributes or between items in the data set
  - pattern recognition
  - time series analysis
  - association rules
- In 80% of the cases, an engineer with a PE and 10 years of experience is a project manager.
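
The "80% of the cases" figure is what association-rule mining calls the confidence of a rule. The sketch below computes support and confidence for the rule "PE and at least 10 years of experience implies project manager" on made-up records, constructed so that the confidence comes out at 80%; the field names and the reading of "10 years" as "at least 10 years" are assumptions.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule: antecedent => consequent."""
    matches = [r for r in records if antecedent(r)]
    both = [r for r in matches if consequent(r)]
    support = len(both) / len(records)  # share of all records satisfying both sides
    confidence = len(both) / len(matches) if matches else 0.0
    return support, confidence


# Hypothetical engineer records.
engineers = [
    {"pe": True, "years": 12, "role": "project manager"},
    {"pe": True, "years": 15, "role": "project manager"},
    {"pe": True, "years": 10, "role": "project manager"},
    {"pe": True, "years": 20, "role": "project manager"},
    {"pe": True, "years": 11, "role": "designer"},
    {"pe": False, "years": 14, "role": "designer"},
    {"pe": True, "years": 4, "role": "designer"},
    {"pe": False, "years": 2, "role": "inspector"},
    {"pe": False, "years": 25, "role": "project manager"},
    {"pe": True, "years": 6, "role": "inspector"},
]
support, confidence = rule_stats(
    engineers,
    antecedent=lambda r: r["pe"] and r["years"] >= 10,
    consequent=lambda r: r["role"] == "project manager",
)
print(support, confidence)
```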

Data Mining in the Intelligent Workplace (IW)
- concept description using classification
- environmental conditions affect hot water energy consumption
- used outside temperature, solar radiation and wind speed
- solar radiation and wind speed not significant above 80°F and below 50°F
- IF temperature between 20°F and 30°F THEN energy usage between 47,393 kJ and 131,875 kJ
- describes >50% of instances in energy usage range

Results Evaluation
- do results meet client’s criteria?
- novel?
- understandable?
- valid (modeling phase)?
- useful?

Results Deployment
- explain results to client
- improvements to data collection?
- ongoing process applied to new data?

Benefits of KDD
- Intelligent Workplace
  - confirmation that system is (not) working
  - continue to monitor control system
  - in future, predict missing values to complete energy studies

Apply Data Mining to Civil Infrastructure?
- civil infrastructure meets guidelines for selecting potential data mining problems
  - significant impact
  - no good alternatives exist
  - prior/domain knowledge
  - effects of noisy data are mitigated
  - sufficient data
  - relevant attributes are being measured

Background
- sporadic use of KDD techniques in civil infrastructure
- relative youth of data mining research
- difficult to systematically apply KDD process
- KDD process tools (CRISP-DM) still under development
- KDD process highly domain dependent
- time-consuming to teach data mining analysts domain knowledge

Research Objectives
- develop a framework for systematically applying KDD process to civil infrastructure data analysis needs
  - set of guidelines for inexperienced analysts
  - checklist for more experienced analysts
- describe intersection of KDD process characteristics and civil infrastructure
  - what problems are well-suited to KDD?
  - what characteristics are unique to infrastructure?

Summary
- increased data collection => increased need to intelligently analyze data
- KDD process as a “power tool” for analyzing data for high-level knowledge
- civil infrastructure problems are well-suited to data mining but will need to apply entire KDD process to get good results
- proposed framework will help researchers to systematically apply KDD process to their data analysis problems