24/25 October 2002 SDMIV workshop – Julian Gallop

Potential applications in CLRC/RAL collaborations

Julian Gallop, October 2002

Commercial / scientific
Data mining is well known in commercial applications
– e.g. should the own-brand cornflakes be located next to the beer?
It is less well known in scientific applications
Among scientists, it is common to hear
– "not sure that what I need is data mining, but instead …"
Perhaps data mining is regarded too narrowly
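The cornflakes-and-beer question is an association-rule (market-basket) problem. As a rough illustration, assuming a handful of invented shopping baskets, the sketch below computes the usual support and confidence measures for single-item rules A -> B. It is a brute-force toy, not the Apriori algorithm or any retailer's actual method.

```python
from itertools import combinations

# Invented shopping baskets, for illustration only.
baskets = [
    {"cornflakes", "beer", "milk"},
    {"cornflakes", "beer"},
    {"milk", "bread"},
    {"cornflakes", "milk"},
    {"beer", "crisps"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def rules(transactions, min_support=0.2, min_confidence=0.5):
    """Test every single-item rule lhs -> rhs against both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s >= min_support:  # guarantees support({lhs}) > 0 below
                conf = confidence({lhs}, {rhs}, transactions)
                if conf >= min_confidence:
                    found.append((lhs, rhs, s, conf))
    return found

for lhs, rhs, s, conf in rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={conf:.2f}")
```

With these made-up baskets, cornflakes -> beer passes both thresholds (support 0.40, confidence 0.67), while beer -> milk is rejected on confidence. A practical miner would prune candidate itemsets rather than test every pair.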

Definitions
An early (1991) definition of Knowledge Discovery in Databases (KDD) was:
– "the non-trivial extraction of implicit, previously unknown, and potentially useful information from data" (Frawley et al. 1991)
This was subsequently (1996) revised to:
– "the non-trivial process of identifying valid, potentially useful and ultimately understandable patterns in data" (Fayyad et al. 1996)
Data mining is one step in the KDD process, concerned with applying computational techniques to find patterns in data

CLRC scientific fields and collaborations
Sciences: space, earth observation, particle physics, microstructures, synchrotron radiation...
Holds (or provides access to) significant data collections
Partnerships between the e-Science Centre, BITD, computational science and science departments
E-science projects include:
– ones that are mainly CLRC (e.g. Data Portal)
– UK e-science collaborations (e.g. Astrogrid, NERC Data Grid, gViz)
– EU collaborations (e.g. DataGrid)
– and also the UK Grid Support Centre

Sample CLRC e-science project – Data Portal
Data Portal project, a pilot project within CLRC:
– to enable a scientist to discover, explore and retrieve disparate datasets through one interface, independent of the data location
– covering CLRC sciences (space science, synchrotron science and neutron science) as well as e-science and IT
– part of the work is the development of a scientific metadata model

Sample e-science projects involving CLRC
Astrogrid (UK)
– building a virtual observatory
– ideas on data mining: finding association rules; deviations from a rule; similarity; clustering and classification
DataGrid (EU): aims to enable next-generation scientific exploration, which requires intensive computation and analysis of shared large-scale databases (millions of gigabytes) across widely distributed scientific communities
– applications are: biomedical, earth observation, particle physics
NERC Data Grid (UK)
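Clustering is one of the Astrogrid data-mining ideas listed above. A minimal sketch of plain k-means (Lloyd's algorithm) follows, assuming each catalogue object has already been reduced to a small numeric feature vector. The points are invented, and initialising from the first k points is a simplification chosen only to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

# Invented two-feature measurements for six objects.
points = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9), (0.0, 0.0), (4.8, 5.0)]
labels, centroids = kmeans(points, 2)
print(labels)
```

Seeding from the first k points fails badly if those points happen to share a cluster; k-means++ style seeding is the usual remedy.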

NERC Data Grid
Funded by NERC and the UK e-science core programme
Involves:
– CLRC (RAL and DL, including the British Atmospheric Data Centre)
– Program for Climate Model Diagnosis and Intercomparison (PCMDI) (U.S. Lawrence Livermore National Laboratory)
Relevant to:
– energy; water management; food chain; health; weather risk

NERC Data Grid – relevance to knowledge discovery
Aims to address the problem that
– at present, searching metadata to discover and retrieve what you want is a manual process
– datasets in multiple locations involve multiple logins and retrieval in multiple formats
Indicators of success:
– it will be possible to find, reformat and visualize disparate datasets from disparate organisations within one organisation
– the ability to test data and comparison ideas without learning foreign formats and establishing personal relationships every time
If successful, it will clearly provide a basis for knowledge discovery
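The problem of multiple formats and manual metadata searches can be pictured as: translate each provider's metadata into one common schema, then search once. The sketch assumes two invented providers with different record layouts; `DatasetRecord`, `from_source_a`, `from_source_b`, the field names and the locations are all hypothetical and do not describe the actual NERC Data Grid metadata model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical common metadata schema shared by all providers."""
    title: str
    variable: str
    provider: str
    location: str

# Hypothetical adapters: each provider describes its holdings in its own layout.
def from_source_a(raw):
    return DatasetRecord(raw["name"], raw["param"], "source_a", raw["url"])

def from_source_b(raw):
    title, variable, path = raw.split("|")
    return DatasetRecord(title, variable, "source_b", path)

def search(records, variable):
    """One query over the unified records instead of one manual search per provider."""
    return [r for r in records if r.variable.lower() == variable.lower()]

catalogue = [
    from_source_a({"name": "Surface obs 1990s", "param": "temperature", "url": "a://obs/1990"}),
    from_source_b("Model run X|Temperature|b:/runs/x"),
    from_source_b("Ocean survey|salinity|b:/ocean/1"),
]
for record in search(catalogue, "temperature"):
    print(record.provider, record.title, record.location)
```

Once the adapters exist, a new question costs one query rather than a new login and a new format per organisation.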

Earth observation instruments
For example, the AATSR instrument on ENVISAT
– low orbit, 14 orbits/day
– returns to the same place every 3 days
Picture shows the plume from Mt Etna in 2001 (previous instrument, ATSR-2)
NASA Aqua: TBs/day

Earth observation patterns
For a particular location, what patterns emerge on
– a daily basis
– or a yearly basis?
Knowing the conventional day-by-day pattern, one can observe out-of-the-ordinary events, e.g. an oil slick
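One simple way to turn "knowing the conventional pattern" into a test is to build a per-day-of-year climatology (mean and standard deviation of past values at a location) and flag observations far outside it. This is a generic z-score sketch under assumed inputs (one scalar per day, invented brightness-temperature-like values, a 3-sigma threshold); it is not how AATSR products or oil-slick detection are actually processed.

```python
import statistics

def daily_climatology(history):
    """Map day-of-year -> (mean, standard deviation) of past values at one location."""
    return {day: (statistics.mean(v), statistics.stdev(v)) for day, v in history.items()}

def is_unusual(day, value, climatology, threshold=3.0):
    """True if `value` is more than `threshold` standard deviations from the usual value for `day`."""
    mean, sd = climatology[day]
    return abs(value - mean) > threshold * sd

# Invented past values for day 180 at one location (stdev needs at least two per day).
history = {180: [290.1, 289.8, 290.4, 290.0, 289.9]}
climatology = daily_climatology(history)
print(is_unusual(180, 283.5, climatology))  # a sharp drop
print(is_unusual(180, 290.2, climatology))  # within the normal range
```

Keying the climatology by day of year captures the yearly cycle; a daily cycle would need a finer key, such as time of overpass.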

climateprediction.net
Makes use of spare compute capacity on office and home PCs to run a climate prediction model
Different PCs run with different parameters and collectively perform a Monte Carlo simulation
Results will be studied to find out which subsets of the parameter space correspond to observations
Better understanding of uncertainties
Public understanding of climate change
Oxford U, CLRC RAL, Reading U, with the Met Office and OU
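The approach in which many machines each run the model with a different parameter set, after which the results are compared with observations, can be sketched with a toy function standing in for the climate model. `toy_model`, its two parameters, their ranges, the observed value and the tolerance are all invented for illustration.

```python
import random

def toy_model(params):
    """Hypothetical stand-in for one climate model run: a made-up function of two parameters."""
    return 14.0 + 2.0 * params["cloud"] - 1.5 * params["droplet"]

def run_ensemble(n, seed=0):
    """Each simulated 'PC' draws its own parameter set, giving a Monte Carlo ensemble."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n):
        params = {"cloud": rng.uniform(0.0, 1.0), "droplet": rng.uniform(0.0, 1.0)}
        runs.append((params, toy_model(params)))
    return runs

def consistent_with_observation(runs, observed, tolerance):
    """Keep the parameter sets whose output lies within `tolerance` of the observation."""
    return [params for params, output in runs if abs(output - observed) <= tolerance]

runs = run_ensemble(1000, seed=1)
kept = consistent_with_observation(runs, observed=14.2, tolerance=0.1)
print(f"{len(kept)} of {len(runs)} parameter sets are consistent with the observation")
```

In the real experiment the "observation" would be many fields rather than one number, and the comparison would need a proper error model; the sketch shows only the shape of the computation: sample, run, then keep the region of parameter space consistent with observation.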

Data in climateprediction.net
Base:
– latitude: 96
– longitude: 72
– levels: 19
– timesteps: calculated every 30 min / 1 hr and output for every day over a period of 50 years
Variables registered in advance of launch:
– horizontal velocity
– temperature
– surface pressure
– water vapour (atmosphere)
– salinity (ocean)
Possibly others, such as ocean carbon content and atmospheric ozone and sulphates
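The grid sizes above allow a rough estimate of output volume. The sketch assumes a 365-day year, one 4-byte value per grid point per daily output, and a full 3-D (levelled) variable; a 2-D field such as surface pressure would be 19 times smaller.

```python
# Grid and run length from the slide; the 365-day year and 4-byte values are assumptions.
LAT, LON, LEVELS = 96, 72, 19
YEARS, DAYS_PER_YEAR = 50, 365
BYTES_PER_VALUE = 4

points_per_field = LAT * LON * LEVELS          # one 3-D field at one output time
daily_outputs = YEARS * DAYS_PER_YEAR          # one output per model day
values_per_variable = points_per_field * daily_outputs
bytes_per_variable = values_per_variable * BYTES_PER_VALUE

print(f"{points_per_field:,} points per field")
print(f"{values_per_variable:,} values per variable per run")
print(f"{bytes_per_variable / 1e9:.1f} GB per variable per run")
```

Under these assumptions one 3-D variable from a single 50-year run is about 2.4 billion values, roughly 9.6 GB, before any compression.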

Parameters in climateprediction.net
Physics parameters that may be varied between one run and another:
– representation of cloud variability
– rate at which water droplets collide and coalesce
– number of nucleation particles for cloud droplet formation
– light scattering in the atmosphere
– cloud convection
– surface processes, such as the rate of transpiration by plants
Also, runs will be duplicated to detect tampering
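Duplicating runs to detect tampering amounts to issuing the same work to more than one PC and checking that the returned results agree. The sketch compares SHA-256 fingerprints of hypothetical result files. It assumes results are bit-for-bit reproducible across machines, which a real climate model may not be; in that case a tolerance-based comparison of key diagnostics would be needed instead.

```python
import hashlib
from collections import defaultdict

def digest(result_bytes):
    """Fingerprint one returned result file."""
    return hashlib.sha256(result_bytes).hexdigest()

def find_disagreements(submissions):
    """submissions: (run_id, host, result_bytes) tuples, each run_id issued to several hosts.

    Returns the run_ids whose duplicated results do not all match.
    """
    digests_by_run = defaultdict(set)
    for run_id, _host, result in submissions:
        digests_by_run[run_id].add(digest(result))
    return sorted(run_id for run_id, digests in digests_by_run.items() if len(digests) > 1)

# Invented submissions: run2 was returned with two different results.
submissions = [
    ("run1", "pc-a", b"1.0,2.0,3.0"),
    ("run1", "pc-b", b"1.0,2.0,3.0"),
    ("run2", "pc-c", b"4.0,5.0"),
    ("run2", "pc-d", b"4.0,9.9"),
]
print(find_disagreements(submissions))
```

A mismatch shows only that the copies disagree, not which host is at fault; a third copy would be needed to break the tie.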

Data distribution in climateprediction.net
The results dataset will be distributed across several (possibly 20) climate modelling institutions
A subset of the data is returned from each PC to a data server
The remainder is kept on the (home or office) PC and is available if the owner so chooses
A program attempting to data mine needs to be isolated from these details by an appropriate portal, metadata and/or catalogue
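Isolating a mining program from where the data physically sits can be sketched as a catalogue that maps each dataset identifier to its known copies and returns one that is currently reachable. `Catalogue`, `Replica`, the identifiers and the site names are hypothetical; a real portal would also handle access rights and metadata queries.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    site: str        # e.g. a modelling institution's server or a volunteer's PC
    available: bool  # a volunteer PC may be switched off, or its owner may not share

@dataclass
class Catalogue:
    """Hypothetical catalogue mapping dataset ids to the places holding copies."""
    entries: dict = field(default_factory=dict)

    def register(self, dataset_id, replica):
        self.entries.setdefault(dataset_id, []).append(replica)

    def resolve(self, dataset_id):
        """Return a site to read from, hiding where the data physically lives."""
        for replica in self.entries.get(dataset_id, []):
            if replica.available:
                return replica.site
        raise LookupError(f"no available copy of {dataset_id}")

catalogue = Catalogue()
catalogue.register("run42/summary", Replica("institution-3", True))
catalogue.register("run42/full", Replica("volunteer-pc-17", False))
print(catalogue.resolve("run42/summary"))
```

The mining code only ever calls `resolve`, so moving data between institutions or PCs changes the catalogue entries, not the analysis.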

Climateprediction.net questions
Some questions that need to be askable:
– What features of the response are robust as we change the physics?
– What kinds of changes have similar effects to each other?
– Which models that are consistent with current observations give changes in extreme events in the future?
It is unclear whether this is data mining in the strict sense, but it certainly involves multivariate statistical techniques
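The question of which changes have similar effects can be posed as a similarity measure between response patterns. A minimal sketch, assuming each perturbed run has been summarised as a vector of regional changes relative to a control run; the parameter names echo the parameters slide, but the numbers are invented, and cosine similarity is just one possible choice of measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two response patterns (1.0 means the same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar_pairs(responses):
    """Rank every pair of perturbations by the similarity of their responses."""
    names = sorted(responses)
    pairs = [(cosine_similarity(responses[a], responses[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    return sorted(pairs, reverse=True)

# Invented regional temperature changes (run minus control) for three perturbations.
responses = {
    "cloud_variability": [0.8, 1.2, 0.5, 0.9],
    "droplet_collision": [0.4, 0.6, 0.25, 0.45],
    "plant_transpiration": [-0.3, 0.1, 0.9, -0.2],
}
for score, a, b in most_similar_pairs(responses):
    print(f"{a} vs {b}: {score:.2f}")
```

Here the droplet-collision pattern is an exact multiple of the cloud-variability pattern, so their similarity is 1: same spatial shape, different amplitude. Comparing amplitudes as well would need a different measure, such as Euclidean distance.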

Summing up
The NERC Data Grid project, for example, exposes the current difficulties of doing data mining on large scientific datasets
– in a commercial situation, data is warehoused under a single operational control
– in science, access is needed to different datasets which are under different managements
– multiple logins, multiple metadata systems
Current e-science projects are providing a mechanism which future data mining could use
Applications include: earth observation; particle physics; astronomy; biology; ...