KDD for Science Data Analysis: Issues and Examples

Contents
• Introduction
• Data Considerations
• Brief Case Studies
  • Sky Survey Cataloging
  • Finding Volcanoes on Venus
  • Biosequence Databases
  • Earth Geophysics
  • Atmospheric Science
• Issues and Challenges
• Conclusion

Data Considerations
• Image data
• Time-series and sequence data
• Numerical vs. categorical values
• Structured and sparse data
• Reliability of data

Brief Case Studies
• Sky Survey Cataloging
• Finding Volcanoes on Venus
• Earth Geophysics
• Atmospheric Science
• Biosequence Databases

Sky Survey Cataloging
• The survey consists of 3 terabytes of image data containing an estimated 2 billion sky objects.
• The basic problem is to generate a survey catalog that records the attributes of each object along with its class: star or galaxy.
• To achieve this, scientists developed the SKICAT system.

Reasons why SKICAT was successful
• The astronomers solved the feature extraction problem.
• Data mining methods contributed to solving difficult classification problems.
• Manual approaches were simply not feasible: astronomers needed an automated classifier to make the most of the data.
• Decision tree methods proved to be an effective tool for finding the important dimensions for this problem.
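How a decision tree finds the "important dimensions" can be sketched with the split criterion such trees commonly use: the information gain of a threshold on each feature. This is a minimal illustration and not SKICAT's actual learner, which the slides do not describe. The feature names `ellipticity` and `brightness` and all values are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_split(values, labels):
    """Return (gain, threshold) for the best midpoint between sorted distinct values."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t), t) for t in candidates)

# Hypothetical toy catalog: six objects, two made-up features.
labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]
features = {
    "ellipticity": [0.05, 0.10, 0.08, 0.40, 0.55, 0.35],  # separates the classes
    "brightness": [18.0, 21.0, 19.5, 18.5, 20.5, 19.0],   # mixes the classes
}

# Rank features by the gain of their best split, most informative first.
ranking = sorted(features, key=lambda name: best_split(features[name], labels)[0], reverse=True)
print(ranking)
```

In this toy data the feature that cleanly separates the two classes is ranked first, which is the sense in which a tree highlights the dimensions that matter.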

Finding Volcanoes on Venus
• Data were collected by the Magellan spacecraft.
• The first pass of Venus using the left-looking radar resulted in 30,000 1000 x 1000 pixel images.
• To help geologists analyze this data set, the JPL Adaptive Recognition Tool (JARtool) was developed.

Motivation for using data mining methods
• Scientists did not know much about image processing or about SAR (synthetic aperture radar) properties. Hence they could easily label images but not design recognizers.
• There was little variation in the illumination and orientation of the objects of interest. Hence the mapping from pixel space to feature space can be performed automatically.
• Geologists did not have any other easy means of finding the small volcanoes, so they were motivated to cooperate by providing training data and other help.

Earth Geophysics
• Given two images taken before and after an earthquake, repeatedly registering different local regions of the two images makes it possible to infer the direction and magnitude of ground motion due to the earthquake.
• An example of a geoscientific data mining system is Quakefinder, which automatically detects and measures tectonic activity in the Earth's crust by examining satellite data.
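The idea of registering local regions of a before image against an after image can be sketched as a simple block-matching search. This is an illustration under stated assumptions, not Quakefinder's algorithm, which the slides do not describe. The function names, the patch size, the search radius, and the synthetic images are all hypothetical, and the search only resolves whole-pixel shifts.

```python
import numpy as np

def estimate_shift(before, after, top, left, size=8, max_shift=3):
    """Find the integer (dy, dx) that best aligns one patch of `before` with `after`.

    Exhaustive search that minimises the sum of squared differences (SSD).
    """
    patch = before[top:top + size, left:left + size]
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r, c = top + dy, left + dx
            # Skip candidate positions that fall outside the image.
            if r < 0 or c < 0 or r + size > after.shape[0] or c + size > after.shape[1]:
                continue
            err = np.sum((after[r:r + size, c:c + size] - patch) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def displacement_field(before, after, size=8, max_shift=3):
    """Repeat the local registration over a grid of non-overlapping patches."""
    field = {}
    for top in range(max_shift, before.shape[0] - size - max_shift + 1, size):
        for left in range(max_shift, before.shape[1] - size - max_shift + 1, size):
            field[(top, left)] = estimate_shift(before, after, top, left, size, max_shift)
    return field

# Synthetic check: a random texture shifted 2 pixels down and 1 pixel left.
rng = np.random.default_rng(0)
before = rng.random((40, 40))
after = np.roll(before, shift=(2, -1), axis=(0, 1))

field = displacement_field(before, after)
print(set(field.values()))
```

Each grid entry gives the direction and magnitude of motion for one local region. A real system would also need sub-pixel estimates and some way to reject unreliable matches, which this sketch omits.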

Atmospheric Science
• The data mining tool used is called CONQUEST.
• CONQUEST employed parallel testbeds to enable rapid extraction of spatio-temporal features for content-based access.
• One of the goals of this tool is the development of “learning” algorithms that look for novel patterns, event clusters, etc.

Retrieved Sea Level Pressure Fields

Biosequence Databases
• The largest DNA database is GENBANK, with about 400 million letters of DNA from a variety of organisms.
• A pressing data mining task for biosequence databases is to find genes in the DNA sequences of various organisms.
• Some gene-finding programs, such as GRAIL, GeneID, GeneParser, and Genie, use neural nets and other AI or statistical methods.
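To make the gene-finding task concrete, here is a much simpler baseline than the programs named above: a scan for open reading frames (ORFs), meaning stretches that start at an `ATG` codon and run to the first in-frame stop codon. It is not how GRAIL, GeneID, GeneParser, or Genie work, and it ignores the reverse strand and introns. The function name, the `min_codons` parameter, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs on the forward strand.

    An ORF starts at ATG and ends at the first in-frame stop codon.
    `end` is exclusive and includes the stop codon; `min_codons` counts
    the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Made-up sequence containing one short ORF in the third reading frame.
sequence = "CCATGAAACCCGGGTAACC"
orfs = find_orfs(sequence)
for start, end in orfs:
    print(start, end, sequence[start:end])
```

The statistical and neural-net methods mentioned in the slide go well beyond this, for example by scoring how gene-like a candidate region is instead of relying on start and stop codons alone.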

Issues and Challenges
• Feature extraction
• Minority classes
• High degree of confidence
• Data mining task
• Relevant domain knowledge
• Scalable machines and algorithms

Conclusions
KDD applications in science may in general be easier than applications in business, finance, or other areas, because science end users typically know the data in intimate detail.