Copyright © 2001, SAS Institute Inc. All rights reserved. Data Mining Methods: Applications, Problems and Opportunities in the Public Sector John Stultz,


1 Copyright © 2001, SAS Institute Inc. All rights reserved. Data Mining Methods: Applications, Problems and Opportunities in the Public Sector. John Stultz, MPH, SAS. October 18, 2002. Disease and Adverse Event Reporting, Surveillance, and Analysis. DIMACS, October 16–18, 2002, Rutgers University, Piscataway, NJ.

2 Outline: Data Mining Methods Used in Surveillance (Classification & Prediction, Association, Clustering, Link Analysis); Applications; Problems; Opportunities

3 What Is Data Mining? SAS Institute defines data mining as the process of selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns of data for an information advantage.

4 What Is Data Mining? “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It involves statistical and visualization techniques to discover and present knowledge in a form that may be easily comprehended.”

5 SAS Enterprise Miner

6 Classification and Prediction: Classification and Regression Trees; Logistic Regression; Neural Networks…

7 Classification and Prediction: Comparison; Selection; Tuning; Final Assessment

8 Classification and Prediction: Principal Components/Dmneural Network. The Princomp/Dmneural node enables users to fit an additive nonlinear model that uses bucketed principal components as inputs to predict a binary or an interval target variable. The node can also perform a principal components analysis, and then pass the scored principal components to successor nodes for further analysis.

9 Classification and Prediction: User Defined Model. You can use the User Defined Model node to import and assess one or more models that were not created with one of the Enterprise Miner modeling nodes.

10 Classification and Prediction: Ensemble Models. The Ensemble node enables users to combine the results from multiple models to create a single, integrated model for their data. This node performs stratified modeling, bagging, boosting, and combined modeling.

11 Classification and Prediction: Stratified Models. When you have a stratification variable (for example, a group variable such as GENDER or REGION) defined in a Group Processing node, the modeling node creates a separate model for each level of the stratification variable.
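The idea of one model per stratum can be sketched in a few lines of Python. This is only an illustration, not Enterprise Miner's implementation: the function `fit_stratified`, the `visits` column, and the choice of a per-group mean as the "model" are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def fit_stratified(rows, strat_key, target_key):
    """Fit one trivial model (the target mean) per level of the stratification variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[strat_key]].append(row[target_key])
    # One fitted "model" per level; here the model is just the group mean.
    return {level: mean(values) for level, values in groups.items()}


# Hypothetical data: REGION is the stratification variable, visits the target.
rows = [
    {"REGION": "north", "visits": 10},
    {"REGION": "north", "visits": 14},
    {"REGION": "south", "visits": 3},
    {"REGION": "south", "visits": 5},
]
models = fit_stratified(rows, "REGION", "visits")
```

A new observation is then scored with the model that matches its own level, for example `models["north"]`.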

12 Classification and Prediction: Bagging and Boosting. Bagging and boosting models are created by resampling the training data and fitting a separate model for each sample. The predicted values (for interval targets) or the posterior probabilities (for a class target) are then averaged to form the ensemble model.
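A minimal sketch of the resample, fit, and average loop for an interval target follows. It shows bagging only (uniform bootstrap resampling); the base model (a least-squares line), the function names, and the toy data are assumptions for illustration and are not taken from the Ensemble node.

```python
import random
from statistics import mean


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x equal): fall back to a constant model
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope


def bagged_predict(train, x_new, n_models=25, seed=0):
    """Average the predictions of models fitted to bootstrap resamples of `train`."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(train) points with replacement.
        sample = [rng.choice(train) for _ in train]
        intercept, slope = fit_line(sample)
        predictions.append(intercept + slope * x_new)
    return mean(predictions)


# Hypothetical noise-free training data on the line y = 2x + 1.
train = [(x, 2.0 * x + 1.0) for x in range(10)]
prediction = bagged_predict(train, 4.0)
```

For a class target, the same loop would average posterior probabilities instead of predicted values.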

13 Classification and Prediction: Combined Models

14 Classification and Prediction: Two Stage Model

15 Classification and Prediction: Memory Based Reasoning. Uses a k-nearest neighbor approach to categorize or predict observations. Search algorithms include scan and Reduced Dimensionality Tree.
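The k-nearest neighbor approach with a plain scan search might look like the sketch below. The function name `knn_classify`, the labels, and the sample points are hypothetical; the Reduced Dimensionality Tree search is not shown.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Uses a linear scan: every training point's Euclidean distance is computed.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations: (features, class label).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
    ((8.5, 9.0), "high"),
]
label = knn_classify(train, (7.0, 8.0))
```

For an interval target, the vote would be replaced by the average of the neighbors' target values.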

16 Association. Association Discovery: “If item A is part of an event, then item B is also part of the event X percent of the time.” Sequence Discovery: “If item A is part of an event, then item B occurs after event A occurs.”
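The "X percent of the time" in an association rule is usually called the rule's confidence: among events that contain item A, the share that also contain item B. A small sketch, with a hypothetical function name and made-up symptom data:

```python
def confidence(events, item_a, item_b):
    """Percent of events containing item_a that also contain item_b."""
    with_a = [event for event in events if item_a in event]
    if not with_a:
        return 0.0  # the rule is undefined when item_a never occurs; report 0 here
    with_both = sum(1 for event in with_a if item_b in event)
    return 100.0 * with_both / len(with_a)


# Hypothetical events: each set lists the items reported together.
events = [
    {"fever", "cough"},
    {"fever", "cough", "rash"},
    {"fever", "cough"},
    {"fever"},
    {"cough"},
]
x_percent = confidence(events, "fever", "cough")
```

Here four events contain "fever" and three of those also contain "cough", so the rule "fever, then cough" holds 75 percent of the time.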

17 Clustering. Clustering places objects into groups or clusters suggested by the data. Methods perform disjoint cluster analysis on the basis of Euclidean distances computed from one or more quantitative variables and seeds that are generated and updated by the algorithm.
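Disjoint clustering on Euclidean distance with seeds that are updated each pass can be illustrated with a k-means-style loop. This is a sketch under assumptions, not the Clustering node's algorithm: the function `k_means`, the caller-supplied initial seeds, and the sample points are all hypothetical.

```python
import math


def k_means(points, seeds, iterations=10):
    """Assign each point to its nearest seed, then move each seed to its cluster mean."""
    seeds = [tuple(seed) for seed in seeds]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest seed by Euclidean distance.
        assignments = [
            min(range(len(seeds)), key=lambda j: math.dist(point, seeds[j]))
            for point in points
        ]
        # Update step: each seed becomes the mean of the points assigned to it.
        new_seeds = []
        for j, seed in enumerate(seeds):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_seeds.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_seeds.append(seed)  # keep an empty cluster's seed where it is
        if new_seeds == seeds:  # converged: seeds stopped moving
            break
        seeds = new_seeds
    return seeds, assignments


# Hypothetical two-variable observations and initial seeds.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
seeds, assignments = k_means(points, seeds=[(1, 1), (8, 8)])
```

The returned `assignments` list plays the role of the cluster IDs that can be attached to the data for profiling.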

18 Self Organizing Maps and Kohonen Vector Quantization. Kohonen Vector Quantization is a clustering method, whereas Self Organizing Maps (SOMs) are primarily dimension-reduction methods. As with Clustering, after the network maps have been created, the characteristics of the clusters can be profiled graphically and cluster IDs can be assigned to the data.

19 Link Analysis

20 Applications: Indian Health Service. National database for clinical data centralized from 42 out of 49 hospitals with web access.

21 Applications: Health and Human Services, San Diego County. Real-Time Emergency Medical Services Surveillance.

22 Applications: CDC, California/Florida Departments of Health. Aberration detection methods during short-term syndrome-based surveillance.

23 Applications: District of Columbia Department of Health. Trends in Syndromic Surveillance data for Washington DC.

24 Applications: New York City Health Department. Ambulance dispatch and ER data sent via FTP to health department database.

25 Problems: Considerations for a Surveillance System. What are the objectives/purpose? What are the data sources? What information needs to be gathered? Who are the data providers? How is the data to be collected? How often? Voluntary or mandatory? Who will collect data? How should the data be processed, maintained and analyzed? How will the data reach those who need to know in order that decisions/actions may be taken?

26 Opportunities: Data Format: XML…; Text Mining; Modeling Format: Predictive Model Markup Language (PMML); Score Code: C Code; Software: Java Based

27 Thank You!

