DIMACS Working Group on Data Mining and Epidemiology.

What are the challenges for mathematical scientists in the defense against disease? This question led DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science, to launch a “special focus” on this topic.

DIMACS Special Focus on Computational and Mathematical Epidemiology
[Image: Anthrax]

Post-September 11 events soon led to an emphasis on bioterrorism.
[Image: Smallpox]

Working Groups

Working Groups (continued)
– Interdisciplinary, international groups of researchers.
– Come together at DIMACS.
– Informal presentations, lots of time for discussion.
– Emphasis on collaboration.
– Return as a full group or in subgroups to pursue problems/approaches identified in first meeting.
– By invitation; but contact the organizer.
– Junior researchers welcomed. Nominate them.

Working Groups

WG’s on Large Data Sets:
– Adverse Event/Disease Reporting, Surveillance & Analysis. Spin-off: Health Care Data Privacy and Confidentiality
– Data Mining and Epidemiology

WG’s on Analogies between Computers and Humans:
– Analogies between Computer Viruses/Immune Systems and Human Viruses/Immune Systems
– Distributed Computing, Social Networks, and Disease Spread Processes

WG’s on Methods/Tools of TCS:
– Phylogenetic Trees and Rapidly Evolving Diseases
– Order-Theoretic Aspects of Epidemiology

WG’s on Computational Methods for Analyzing Large Models for Spread/Control of Disease:
– Spatio-temporal and Network Modeling of Diseases
– Methodologies for Comparing Vaccination Strategies

WG’s on Mathematical Sciences Methodologies:
– Mathematical Models and Defense Against Bioterrorism
– Predictive Methodologies for Infectious Diseases
– Statistical, Mathematical, and Modeling Issues in the Analysis of Marine Diseases

Data Mining and Epidemiology
– Interest sparked in part by the availability of large and disparate computerized databases on subjects relating to disease.

Early warning is critical in public health.
– This is a crucial factor underlying the government’s plans to place networks of sensors/detectors to warn of a bioterrorist attack.
– Sensors will be a source of huge amounts of data.
[Image: The BASIS System]

The DIMACS Bioterrorism Sensor Location Project

Data Mining and Epidemiology: Some Research Issues:

1. Streaming Data Analysis: When you only have one shot at the data
– Widely used to detect trends and sound alarms in applications in telecommunications and finance.
– AT&T uses this to detect fraudulent use of credit cards or impending billing defaults.
– Columbia has developed methods for detecting fraudulent behavior in financial systems.
– Uses algorithms based in TCS.
– Needs modification to apply to disease detection.
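To make the "one shot at the data" constraint concrete, here is a minimal Python sketch of a one-pass alarm over a stream of daily case counts. It is an illustration under stated assumptions, not a method named in the slides: the class name `StreamingAlarm`, the exponentially weighted baseline, the parameters `alpha`, `threshold`, and `warmup`, and the sample counts are all hypothetical.

```python
import math


class StreamingAlarm:
    """One-pass detector: stores only a running mean and variance (hypothetical design)."""

    def __init__(self, alpha=0.2, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # alarm when the excess is above this many std devs
        self.warmup = warmup        # observations absorbed before any alarm is allowed
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, x):
        """Consume one observation and return True if it should raise an alarm."""
        self.seen += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        deviation = x - self.mean
        std = math.sqrt(self.var)
        # Only upward excursions count, since the concern is an unusual rise in cases.
        alarm = (self.seen > self.warmup and std > 0
                 and deviation > self.threshold * std)
        if not alarm:
            # Update the baseline with ordinary data only, so a spike does not
            # immediately become the new normal.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return alarm


# Made-up daily counts with one spike on day 8.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 40, 11]
detector = StreamingAlarm()
alarm_days = [day for day, c in enumerate(counts) if detector.update(c)]
print(alarm_days)
```

Each observation is examined once and then discarded; memory use does not grow with the length of the stream, which is the property that matters for sensor-scale data.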

Research Issues:
– Modify methods of data collection, transmission, processing, and visualization.
– Explore the use of decision trees, vector-space methods, and Bayesian and neural nets.
– How are the results of monitoring systems best reported and visualized?
– To what extent can they trigger fast and safe automated responses?
– How are relevant queries best expressed, giving users sufficient power while implicitly restraining them from incurring unwanted computational overhead?

2. Cluster Analysis
– Used to extract patterns from complex data.
– Application of traditional clustering algorithms is hindered by extreme heterogeneity of the data.
– Newer clustering methods based on TCS for clustering heterogeneous data need to be modified for infectious disease and bioterrorist applications.
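One way to see why heterogeneity is a problem: ordinary distance measures assume purely numeric fields. The Python sketch below uses a Gower-style dissimilarity, which mixes numeric and categorical fields, followed by simple single-link grouping. This is a stand-in chosen for illustration, not one of the TCS-based methods the slide refers to; the field names, the `cutoff` value, and the records are hypothetical.

```python
def gower(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] for records with mixed field types."""
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            # Numeric field: absolute difference scaled by the observed range.
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            # Categorical field: 0 if equal, 1 otherwise.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def single_link_clusters(records, numeric_fields, cutoff=0.4):
    """Group records whose dissimilarity is at most `cutoff`, transitively."""
    ranges = {}
    for field in numeric_fields:
        values = [r[field] for r in records]
        ranges[field] = (max(values) - min(values)) or 1.0

    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if gower(records[i], records[j], ranges) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up case reports mixing numeric and categorical fields.
records = [
    {"age": 30, "temp_c": 39.5, "symptom": "rash", "source": "ER"},
    {"age": 34, "temp_c": 39.0, "symptom": "rash", "source": "clinic"},
    {"age": 70, "temp_c": 37.0, "symptom": "cough", "source": "ER"},
    {"age": 68, "temp_c": 37.2, "symptom": "cough", "source": "ER"},
]
clusters = single_link_clusters(records, numeric_fields=["age", "temp_c"])
print(clusters)
```

The pairwise loop is quadratic in the number of records, so this sketch only shows the idea; data at surveillance scale would need the more scalable methods the slide calls for.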

3. Visualization Large data sets are sometimes best understood by visualizing them.

3. Visualization (continued)
– Sheer data sizes require new visualization regimes, which require suitable external memory data structures to reorganize tabular data to facilitate access, usage, and analysis.
– Visualization algorithms become harder when data arises from various sources and each source contains only partial information.

4. Data Cleaning
Disease detection problem: very “dirty” data.

4. Data Cleaning (continued)
Very “dirty” data due to
– manual entry
– lack of uniform standards for content and formats
– data duplication
– measurement errors
TCS-based methods of data cleaning
– duplicate removal
– “merge purge”
– automated detection
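The duplicate-removal and “merge purge” items can be sketched as follows. This is a simplified variant of the sorted-neighborhood idea (normalize, sort, compare only nearby records), written as an illustration rather than as the specific method the slide has in mind. The record layout, the `window` and `cutoff` parameters, and the sample patients are hypothetical, and the string comparison uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case and collapse punctuation/whitespace so format variants compare equal."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def similar(a, b, cutoff):
    """True when two normalized keys are near-identical strings."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def merge_purge(records, window=3, cutoff=0.85):
    """Keep one record per group of near-duplicates.

    Records are sorted by a normalized key, and each record is compared only
    with the last `window - 1` records already kept, instead of with every
    other record.
    """
    keyed = sorted((normalize(" ".join(rec)), rec) for rec in records)
    kept = []  # (key, record) pairs judged distinct so far
    for key, rec in keyed:
        recent = kept[-(window - 1):] if window > 1 else []
        if any(similar(key, other, cutoff) for other, _ in recent):
            continue  # treat as a duplicate of a record already kept
        kept.append((key, rec))
    return [rec for _, rec in kept]


# Made-up patient records: (name, date of birth, ZIP code).
patients = [
    ("Smith, John", "1970-01-02", "08854"),
    ("smith john", "1970-01-02", "08854"),   # format variant of the first
    ("Smyth, John", "1970-01-02", "08854"),  # probable manual-entry typo
    ("Doe, Jane", "1985-07-30", "10027"),
]
unique = merge_purge(patients)
print(unique)
```

A fuzzy cutoff trades missed duplicates against false merges: the "Smyth" record is merged here, which is right if it is a typo and wrong if it is a different person, so real systems usually pair such matching with review rules.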

5. Dealing with “Natural Language” Reports
– Devise effective methods for translating natural language input into formats suitable for analysis.
– Develop computationally efficient methods to provide automated responses consisting of follow-up questions.
– Develop semi-automatic systems to generate queries based on dynamically changing data.

6. Cryptography and Security
– Devise effective methods for protecting the privacy of individuals about whom data is provided to biosurveillance teams: data from emergency department visits, doctor visits, and prescriptions.
– Develop ways to share information between databases of intelligence agencies while protecting privacy.

6. Cryptography and Security (continued)
Specifically: How can we make a simultaneous query to two datasets without compromising information in those data sets? (E.g., is individual xx included in both sets?)
Issues include:
– ensuring accuracy and reliability of responses
– authentication of queries
– policies for access control and authorization
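The "is individual xx included in both sets?" question is often approached with private set intersection. The Python sketch below shows the commutative-exponentiation idea: each side blinds hashed identifiers with its own secret exponent, and values blinded by both sides match exactly when the underlying identifiers match. It is a toy illustration under loud assumptions, not a protocol taken from the slides: the modulus `P` is far too small and unstructured for real use, both parties are assumed to follow the steps honestly, guessable identifiers remain open to guessing attacks, and the class and identifier names are hypothetical.

```python
import hashlib
import secrets

# Toy prime modulus (2**127 - 1). ASSUMPTION: for illustration only, not secure.
P = 2**127 - 1


def to_group(identifier):
    """Hash an identifier to an integer in [2, P - 1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


class Party:
    """One data holder with a private exponent (hypothetical helper class)."""

    def __init__(self, identifiers):
        self.identifiers = list(identifiers)
        self.secret = secrets.randbelow(P - 3) + 2

    def first_pass(self):
        """Blind own identifiers: H(x)^secret mod P, in list order."""
        return [pow(to_group(i), self.secret, P) for i in self.identifiers]

    def second_pass(self, blinded):
        """Blind the other party's already-blinded values a second time."""
        return [pow(value, self.secret, P) for value in blinded]


# Made-up identifiers held by two organizations.
hospital = Party(["patient-03", "patient-17", "patient-42"])
agency = Party(["patient-17", "patient-99"])

# The hospital sends blinded values; the agency blinds them again, keeping order.
hospital_twice = agency.second_pass(hospital.first_pass())
# The agency sends its own blinded values; the hospital blinds them again.
agency_twice = set(hospital.second_pass(agency.first_pass()))

# (H(x)^a)^b equals (H(x)^b)^a, so doubly blinded values match only for shared
# identifiers (up to a negligible chance of collision).
shared = [ident for ident, value in zip(hospital.identifiers, hospital_twice)
          if value in agency_twice]
print(shared)
```

Neither side sees the other's raw identifiers, only blinded values. The sketch does not address the other issues listed above: authentication of queries, access control, or the reliability of responses.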

7. Spatio-Temporal Mining of Sensor Data
– Sensors provide observations of the state of the world localized in space and time.
– Finding trends in data from individual sensors: time series data mining.
– Detecting general correlations in multiple time series of observations.
– This has been studied in statistics, database theory, knowledge discovery, data mining.
– Complications: proximity relationships based on geography; complex chronological effects.
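As a small illustration of detecting correlations across multiple time series, the Python sketch below searches for the time lag at which one sensor's readings best track another's, using the Pearson correlation. The function names, the `max_lag` parameter, and the readings are hypothetical. The sketch handles only the chronological side of the problem and ignores geographic proximity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def best_lag(a, b, max_lag=3):
    """Return (lag, r): how many steps b trails a where the correlation is highest.

    Assumes a and b have the same length and are sampled at the same times.
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        xs = a[:len(a) - lag]  # earlier part of a
        ys = b[lag:]           # b shifted back by `lag` steps
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


# Made-up readings: sensor_b shows the same peak as sensor_a three steps later.
sensor_a = [1, 2, 8, 3, 1, 1, 2, 1]
sensor_b = [1, 1, 1, 1, 2, 8, 3, 1]
lag, r = best_lag(sensor_a, sensor_b)
print(lag, round(r, 3))
```

A consistent lag between nearby sensors could hint at the direction of spread, but turning that hint into a finding would need the geographic and chronological modeling listed above as complications.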