Copyright © 2001, SAS Institute Inc. All rights reserved. Data Mining Methods: Applications, Problems and Opportunities in the Public Sector John Stultz,

Presentation transcript:

Data Mining Methods: Applications, Problems and Opportunities in the Public Sector
John Stultz, MPH, SAS
October 18, 2002
Disease and Adverse Event Reporting, Surveillance, and Analysis
DIMACS, October 16 – 18, 2002
Rutgers University, Piscataway, NJ

Outline
• Data Mining Methods Used in Surveillance
  • Classification & Prediction
  • Association
  • Clustering
  • Link Analysis
• Applications
• Problems
• Opportunities

What Is Data Mining?
SAS Institute defines data mining as the process of selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns of data for an information advantage.

What Is Data Mining?
“The nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It involves statistical and visualization techniques to discover and present knowledge in a form that may be easily comprehended.”

SAS Enterprise Miner

Classification and Prediction
• Classification and Regression Trees
• Logistic Regression
• Neural Networks…

Classification and Prediction
• Comparison
• Selection
• Tuning
• Final Assessment

Classification and Prediction
• Principal Components/Dmneural Network
The Princomp/Dmneural node enables users to fit an additive nonlinear model that uses bucketed principal components as inputs to predict a binary or an interval target variable. The node can also perform a principal components analysis, and then pass the scored principal components to successor nodes for further analysis.

Classification and Prediction
• User Defined Model
You can use the User Defined Model node to import and assess one or more models that were not created with one of the Enterprise Miner modeling nodes.

Classification and Prediction
• Ensemble Models
The Ensemble node enables users to combine the results from multiple models to create a single, integrated model for their data. This node performs:
  • stratified modeling
  • bagging
  • boosting
  • combined modeling

Classification and Prediction
• Stratified Models
When you have a stratification variable (for example, a group variable such as GENDER or REGION) defined in a Group Processing node, the modeling node creates a separate model for each level of the stratification variable.
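
Illustrative sketch (not part of the original slides): the idea of one model per level of a stratification variable can be shown in a few lines of Python. The "model" here is deliberately trivial (the mean of the target within each stratum), and the function names, the `visits` target, and the sample rows are all hypothetical. This is not how Enterprise Miner implements the Group Processing node.

```python
from collections import defaultdict
from statistics import mean


def fit_stratified(rows, strat_key, target_key):
    """Fit one trivial model (the mean of the target) per level of strat_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[strat_key]].append(row[target_key])
    # One fitted "model" per stratum, keyed by the stratum level.
    return {level: mean(values) for level, values in groups.items()}


def predict(models, row, strat_key):
    """Score a row with the model that belongs to its stratum."""
    return models[row[strat_key]]


# Hypothetical data: weekly clinic visits by region.
rows = [
    {"REGION": "north", "visits": 10.0},
    {"REGION": "north", "visits": 14.0},
    {"REGION": "south", "visits": 3.0},
    {"REGION": "south", "visits": 5.0},
]
models = fit_stratified(rows, "REGION", "visits")
```

A new observation is scored only by the model for its own level, for example `predict(models, {"REGION": "south"}, "REGION")`.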

Classification and Prediction
• Bagging and Boosting
Bagging and boosting models are created by resampling the training data and fitting a separate model for each sample. The predicted values (for interval targets) or the posterior probabilities (for a class target) are then averaged to form the ensemble model.
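
Illustrative sketch (not part of the original slides): the resample, fit, and average steps described above can be written out for bagging with an interval target. The base model is assumed to be a one-split regression stump, chosen only to keep the example short; the function names and the toy data are hypothetical. Boosting, which concentrates later samples on poorly predicted cases, is not shown.

```python
import random
from statistics import mean


def fit_stump(sample):
    """Fit a one-split regression stump on (x, y) pairs, splitting at the mean x."""
    threshold = mean(x for x, _ in sample)
    left = [y for x, y in sample if x <= threshold]
    right = [y for x, y in sample if x > threshold]
    overall = mean(y for _, y in sample)
    # Fall back to the overall mean if one side of the split is empty.
    return (threshold,
            mean(left) if left else overall,
            mean(right) if right else overall)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def bagging_fit(data, n_models, seed=0):
    """Draw n_models bootstrap samples (with replacement) and fit a stump on each."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(stumps, x):
    """Average the predicted values of the individual models."""
    return mean(predict_stump(stump, x) for stump in stumps)


# Hypothetical interval target: 0 for small x, 10 for large x.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
stumps = bagging_fit(data, n_models=25)
low, high = bagging_predict(stumps, 1), bagging_predict(stumps, 8)
```

For a class target, the same averaging would be applied to each model's posterior probabilities instead of to predicted values.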

Classification and Prediction
• Combined Models

Classification and Prediction
• Two Stage Model

Classification and Prediction
• Memory Based Reasoning
Uses a k-nearest neighbor approach to categorize or predict observations. Search algorithms include scan and Reduced Dimensionality Tree.
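
Illustrative sketch (not part of the original slides): a k-nearest neighbor classifier using the simplest search strategy, a linear scan over every stored observation. The function name, the labels, and the sample points are hypothetical; the Reduced Dimensionality Tree search is not shown.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    The search is a plain scan: the distance to every training point is computed.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, label) pairs.
train = [
    ((1.0, 1.0), "no_alert"), ((1.2, 0.8), "no_alert"), ((0.9, 1.1), "no_alert"),
    ((5.0, 5.0), "alert"), ((5.2, 4.9), "alert"), ((4.8, 5.1), "alert"),
]
near_low = knn_classify(train, (1.1, 1.0))
near_high = knn_classify(train, (5.1, 5.0))
```

To predict an interval target instead of a class, the vote would be replaced by the mean of the neighbors' target values.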

Association
• Association Discovery: “If item A is part of an event, then item B is also part of the event X percent of the time.”
• Sequence Discovery: “If item A is part of an event, then item B occurs after event A occurs.”
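
Illustrative sketch (not part of the original slides): the "X percent of the time" in an association rule is usually called the rule's confidence, and the share of all events that contain both items is its support. The function name and the symptom data below are hypothetical.

```python
def rule_stats(events, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_a = sum(1 for event in events if antecedent in event)
    n_ab = sum(1 for event in events if antecedent in event and consequent in event)
    support = n_ab / len(events)                 # share of all events with A and B
    confidence = n_ab / n_a if n_a else 0.0      # share of A-events that also have B
    return support, confidence


# Hypothetical events: the set of symptoms recorded at each visit.
events = [
    {"fever", "cough"},
    {"fever", "cough", "rash"},
    {"fever"},
    {"cough"},
]
support, confidence = rule_stats(events, "fever", "cough")
```

Here "fever" appears in three events and is accompanied by "cough" in two of them, so the rule holds two times out of three.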

Clustering
Clustering places objects into groups or clusters suggested by the data. Methods perform disjoint cluster analysis on the basis of Euclidean distances computed from one or more quantitative variables and seeds that are generated and updated by the algorithm.
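
Illustrative sketch (not part of the original slides): disjoint clustering from Euclidean distances and iteratively updated seeds can be illustrated with a basic k-means loop. This is an assumed stand-in for the general technique, not the procedure Enterprise Miner runs; the function name, the fixed iteration count, and the sample points are hypothetical.

```python
import math


def k_means(points, seeds, iterations=10):
    """Assign each point to its nearest seed, then move each seed to its cluster mean."""
    seeds = [tuple(seed) for seed in seeds]
    for _ in range(iterations):
        clusters = [[] for _ in seeds]
        for point in points:
            nearest = min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))
            clusters[nearest].append(point)
        # Update each seed to the mean of its members; keep it if the cluster is empty.
        seeds = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else seed
            for members, seed in zip(clusters, seeds)
        ]
    assignments = [
        min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))
        for point in points
    ]
    return seeds, assignments


# Hypothetical data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
seeds, assignments = k_means(points, seeds=[points[0], points[3]])
```

Each point ends up in exactly one cluster, which is what "disjoint" means here; the returned cluster index plays the role of a cluster ID.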

Self Organizing Maps / Kohonen Vector Quantization
Kohonen Vector Quantization is a clustering method, whereas Self Organizing Maps (SOMs) are primarily dimension-reduction methods. As with Clustering, after the network maps have been created, the characteristics of the clusters can be profiled graphically and cluster IDs can be assigned to the data.

Link Analysis

Applications
Indian Health Service
• National Database for clinical data centralized from 42 out of 49 hospitals with web access

Applications
Health and Human Services, San Diego County
• Real-Time Emergency Medical Services Surveillance

Applications
CDC, California/Florida Departments of Health
• Aberration detection methods during short-term syndrome-based surveillance

Applications
District of Columbia Department of Health
• Trends in Syndromic Surveillance data for Washington DC

Applications
New York City Health Department
• Ambulance dispatch and ER data sent via FTP to health department database.

Problems
• Considerations for a Surveillance System
  • What are the objectives/purpose?
  • What are the data sources?
  • What information needs to be gathered?
  • Who are the data providers?
  • How is the data to be collected? How often? Voluntary or mandatory?
  • Who will collect data?
  • How should the data be processed, maintained and analyzed?
  • How will the data reach those who need to know in order that decisions/actions may be taken?

Opportunities
• Data Format: XML…
• Text Mining
• Modeling Format: Predictive Modeling Markup Language (PMML)
• Score Code: C Code
• Software: Java Based

Thank You!