1
Data Mining and Virtual Observatory
Yanxia Zhang, National Astronomical Observatories, CAS DEC
2
Outline: Why, What, How
3
Astronomy is Facing a Major Data Avalanche
Multi-Terabyte Sky Surveys and Archives (Soon: Multi-Petabyte), Billions of Detected Sources, Hundreds of Measured Attributes per Source …
4
DM, VO, and the Understanding of Complex Astrophysical Phenomena
Necessity Is the Mother of Invention. Understanding of complex astrophysical phenomena requires complex and information-rich data sets, and the tools to explore them … This will lead to a change in the nature of the astronomical discovery process (DM) … which requires a new research environment for astronomy: the VO.
5
DM: Confluence of Multiple Disciplines
DM sits at the confluence of: database systems, data warehouses and OLAP; statistics; ML & AI; visualization; information science; other disciplines.
6
What is DM?
The search for interesting patterns in large databases that were collected for other applications, using machine learning algorithms, high-performance computers and other methods, for science and society!
7
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
Steps, from raw data to knowledge: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
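The KDD steps named on this slide can be sketched as a chain of small functions. This is a minimal illustration, not the presenter's system: the two toy catalogs, the field names `id` and `mag`, the magnitude cut, and the "pattern" (a mean magnitude) are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical toy "databases": two catalogs of (id, magnitude) records.
catalog_a = [{"id": 1, "mag": 18.2}, {"id": 2, "mag": None}, {"id": 3, "mag": 21.0}]
catalog_b = [{"id": 3, "mag": 21.0}, {"id": 4, "mag": 24.5}]


def clean(records):
    """Data cleaning: drop records with a missing measurement."""
    return [r for r in records if r["mag"] is not None]


def integrate(*sources):
    """Data integration: merge sources, keeping the first record seen per id."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["id"], r)
    return list(merged.values())


def select(warehouse, max_mag):
    """Selection: keep only the task-relevant records (here, a magnitude cut)."""
    return [r for r in warehouse if r["mag"] <= max_mag]


def mine(records):
    """Data mining: a deliberately trivial 'pattern', the mean magnitude."""
    return {"n": len(records), "mean_mag": mean(r["mag"] for r in records)}


def evaluate(pattern, min_n=2):
    """Pattern evaluation: accept the pattern only if enough records support it."""
    return pattern if pattern["n"] >= min_n else None


warehouse = integrate(clean(catalog_a), clean(catalog_b))
knowledge = evaluate(mine(select(warehouse, max_mag=22.0)))
print(knowledge)
```

Each function corresponds to one box of the diagram, so a real step (for example, a clustering algorithm in place of `mine`) can be swapped in without touching the others.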
8
Data Mining, Knowledge Discovery, Data Presentation
Increasing potential to support decisions, from bottom to top (typical role in parentheses):
Knowledge Discovery (End User)
Data Presentation: Visualization Techniques (Scientist, Analyst)
Data Mining: Information Discovery (Data Analyst)
Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts (DBA)
Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)
9
Architecture: Typical Data Mining System
From top to bottom: Graphical user interface; Pattern evaluation; Data mining engine (with Knowledge-base); Database or data warehouse server; Data cleaning & data integration, Filtering; Databases, Data Warehouse
10
The ratio of every DM step
Deciding the target; data preparation; data mining; evaluation
11
DM: On What Kind of Data?
Relational databases; data warehouses; transactional databases; advanced DB systems and information repositories: object-oriented and object-relational databases, spatial databases, time-series data and temporal data, text databases and multimedia databases, heterogeneous and legacy databases, WWW
12
Data Mining Functionality
Concept description; Association; Classification and Prediction; Clustering; Time-series analysis; Other pattern-directed or statistical analysis
13
Taking a Broader View: The Observable Parameter Space
Axes: Flux, Wavelength, Time, Polarization, Morphology / Surf. Br., Proper motion, RA, Dec, Non-EM …
All astronomical measurements span some volume in this parameter space. Along each axis the measurements are characterized by the position, extent, sampling and resolution.
What is the coverage? Where are the gaps? Where do we go next?
14
How and Where are Discoveries Made?
Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation … Theoretical, may be inspired by observations.
Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, Obscured Universe … Empirical, inspire theories, can be motivated by them.
Diagram: New Technical Capabilities; Observational Discoveries; Theory; IT/VO.
Phenomenological Discoveries: pushing along some parameter space axis (VO useful); making new connections, e.g., multi- (VO critical!).
Understanding of complex astrophysical phenomena requires complex, information-rich data (and simulations?)
15
Exploration of observable parameter spaces and searches for rare or new types of objects
16
But Sometimes You Find a Surprise…
17
Precision Cosmology and LSS Better matching of theory and observations
Clustering on a clustered background; clustering with a nontrivial topology. Figures: DPOSS Clusters (Gal et al.); LSS Numerical Simulation (VIRGO)
18
Exploration of the Time Domain: Optical Transients
A possible example of an “Orphan Afterglow” (GRB?) discovered in DPOSS: an 18th mag transient associated with a 24.5 mag galaxy. At an estimated z ~ 1, the observed brightness is ~ 100 times that of a SN at the peak. Or is it something else, new? (Images: DPOSS, Keck)
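The brightness comparison on this slide rests on the standard magnitude scale, where a difference of Δm magnitudes corresponds to a flux ratio of 10^(0.4 Δm). A small sketch of that arithmetic, using only the two magnitudes quoted on the slide (the function names are illustrative, not from the presentation):

```python
import math


def flux_ratio(mag_faint, mag_bright):
    """Flux ratio (bright / faint) implied by two magnitudes (Pogson's relation)."""
    return 10 ** (0.4 * (mag_faint - mag_bright))


def magnitude_difference(ratio):
    """Magnitude difference corresponding to a given flux ratio."""
    return 2.5 * math.log10(ratio)


# Transient (18 mag) versus its host galaxy (24.5 mag), as quoted on the slide.
print(flux_ratio(24.5, 18.0))       # about 398: the transient far outshines its host

# A factor of ~100 in brightness corresponds to 5 magnitudes.
print(magnitude_difference(100.0))  # 5.0
```

So the "~100 times a SN at peak" statement is equivalent to saying the transient is about 5 magnitudes brighter than a supernova would appear at that redshift.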
19
Exploration of the Time Domain: Faint, Fast Transients (Tyson et al.)
20
Exploring the Low Surface Brightness (Low Contrast) Universe
Comparison between H I, Hα, and 100 μm Diffuse Emission. Images: DPOSS red image; IRAS 100 Micron Image (Brunner et al.)
21
Background Enhancement Technique demonstrated
on two known M31 dwarf spheroidals (Brunner et al.)
22
Data Mining in the Image Domain: Can We Discover New
Types of Phenomena Using Automated Pattern Recognition? (Every object detection algorithm has its biases and limitations)
23
An OLAM Architecture
Mining query in, mining result out, through four layers:
Layer 4, User Interface: User GUI API
Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine; Data Cube API
Layer 2, MDDB: MDDB, Meta Data; Database API (Filtering & Integration)
Layer 1, Data Repository: Databases, Data Warehouse (data cleaning, data integration, filtering)
24
View of Warehouses and Hierarchies
Importing data; table browsing; dimension creation; dimension browsing; cube building; cube browsing
25
Selecting a Data Mining Task
Major data mining functions: Summary (Characterization), Association, Classification, Prediction, Clustering, Time-Series Analysis
26
Mining Characteristic Rules
Characterization: data generalization/summarization at high abstraction levels. An example query: find a characteristic rule for Cities from the database 'CITYDATA' in relevance to location, capita_income, and the distribution of count% and amount%.
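The example query can be illustrated with a small generalization-and-aggregation sketch. Everything below is an assumption for illustration: the rows, the `amount` column, the income threshold, and the reading of count% and amount% as each group's share of the total record count and total amount. It is not the query engine used in the presentation.

```python
from collections import defaultdict

# Hypothetical rows standing in for the CITYDATA table; all values are made up.
cities = [
    {"location": "East", "capita_income": 12000, "amount": 50.0},
    {"location": "East", "capita_income": 31000, "amount": 150.0},
    {"location": "West", "capita_income": 29000, "amount": 100.0},
    {"location": "West", "capita_income": 33000, "amount": 100.0},
]


def income_band(income, threshold=20000):
    """Generalize a numeric income to a higher-level concept (assumed threshold)."""
    return "high" if income >= threshold else "low"


def characterize(rows):
    """Group generalized rows and report each group's count% and amount%."""
    groups = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for row in rows:
        key = (row["location"], income_band(row["capita_income"]))
        groups[key]["count"] += 1
        groups[key]["amount"] += row["amount"]
    total_count = len(rows)
    total_amount = sum(row["amount"] for row in rows)
    return {
        key: {
            "count%": 100.0 * g["count"] / total_count,
            "amount%": 100.0 * g["amount"] / total_amount,
        }
        for key, g in groups.items()
    }


rules = characterize(cities)
for key, shares in sorted(rules.items()):
    print(key, shares)
```

Raising the abstraction level (for example, coarser income bands or larger regions) merges groups and yields fewer, more general rules.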
27
Browsing a Data Cube
Powerful visualization; OLAP capabilities; interactive manipulation
28
Visualization of Data Dispersion: Boxplot Analysis
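A boxplot is drawn from a five-number summary plus an outlier rule. The sketch below computes those quantities with the standard library; the 1.5 × IQR fence is the common convention and is assumed here, since the slide does not state which rule the tool uses, and the sample values are made up.

```python
from statistics import quantiles


def boxplot_summary(values, fence=1.5):
    """Quartiles, whiskers and outliers as used to draw a boxplot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - fence * iqr, q3 + fence * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical measurements with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

The outlier list is often the interesting part for discovery work: points beyond the whiskers are candidates for rare objects or for bad data.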
29
Mining Association Rules (Table Form)
30
Association Rule in Plane Form
31
Association Rule Graph
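Whatever form they are displayed in (table, plane, or graph), association rules are ranked by support and confidence. The following is a brute-force sketch for single-item rules A → B; it is not the algorithm used by the tool shown in these slides, and the "detected in band" transactions are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: the set of bands in which each source was detected.
transactions = [
    {"radio", "optical"},
    {"radio", "optical", "infrared"},
    {"optical", "infrared"},
    {"radio", "optical"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent); antecedent must occur at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for all single-item rules that pass."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                found.append((lhs, rhs, s, c))
    return found


# Table form: one rule per row.
for lhs, rhs, s, c in mine_rules(transactions, min_support=0.5, min_confidence=0.9):
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

The same (lhs, rhs, support, confidence) tuples can feed a plane or graph view, with items as axes or nodes and the two measures mapped to size or colour.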
32
Mining Classification Rules
33
Prediction: Numerical Data
34
Prediction: Categorical Data
35
DMiner: Architecture
Graphic User Interface
Modules: Characterizer, Cluster Analyzer, Comparator, Associator, Classifier, Future Modules
Database and Cube Server
Databases: Radio DB, Infrared DB, Optical DB, ……. DB
36
MultiMediaMiner A System Prototype for MultiMedia Data Mining
Simon Fraser University
Components: WWW; image features; keywords; metadata; WordNet; Internet domain hierarchy; keyword hierarchy; pre-built concept hierarchies for colour, texture, format, etc.; data cubes and numeric hierarchies; pre-processing; real-time interaction; pattern discoveries
37
MultiMediaMiner Simon Fraser University
38
WebLogMiner Architecture
The web log is filtered to generate a relational database. A data cube is generated from the database. OLAP is used to drill down and roll up in the cube. OLAM is used for mining interesting knowledge.
Pipeline: Web log → (1) Data Cleaning → Database → (2) Data Cube Creation → Data Cube → (3) OLAP → Sliced and diced cube → (4) Data Mining → Knowledge
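The four stages can be sketched end to end on a toy log. This is only an illustration of the idea, not WebLogMiner itself: the log entries, the choice of dimensions (date, hour, page), the "successful requests only" cleaning rule, and the final "most requested page" result are all assumptions.

```python
from collections import Counter

# Hypothetical, already-parsed web log entries: (date, hour, page, status).
raw_log = [
    ("2025-01-01", 9, "/index.html", 200),
    ("2025-01-01", 9, "/data.html", 200),
    ("2025-01-01", 10, "/index.html", 404),
    ("2025-01-02", 9, "/index.html", 200),
]

# 1. Data cleaning: keep successful requests and drop the status column.
table = [(date, hour, page) for date, hour, page, status in raw_log if status == 200]

# 2. Data cube creation: hit count per (date, hour, page) cell.
cube = Counter(table)


def roll_up(cube, keep):
    """Aggregate the cube onto the dimensions whose indices are listed in `keep`."""
    out = Counter()
    for cell, hits in cube.items():
        out[tuple(cell[i] for i in keep)] += hits
    return out


def slice_cube(cube, dim, value):
    """Keep only the cells whose dimension `dim` equals `value`."""
    return Counter({cell: hits for cell, hits in cube.items() if cell[dim] == value})


# 3. OLAP: roll up to the page level, and slice out a single day.
by_page = roll_up(cube, keep=(2,))
first_day = slice_cube(cube, 0, "2025-01-01")

# 4. Data mining (trivially): the most requested page.
top_page = by_page.most_common(1)[0][0][0]
print(by_page, first_day, top_page)
```

Drilling down is the reverse of `roll_up`: returning to the finer-grained `cube` (or a slice of it) to see which dates and hours contribute to a page's total.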
39
VO: Conceptual Architecture
Components: User; Discovery tools; Analysis tools; Gateway; Data Archives
40
Conclusion
◆ Development and application of DM in astronomy;
◆ Automated DM, visualized DM and audio DM;
◆ Integrate VO and DM.
The next golden age of discovery in astronomy will come earlier!
41
Q&A? Thank you !!!