Published by Shawn Manning. Modified over 9 years ago.
1
24/25 October 2002 SDMIV workshop – Julian Gallop
Potential applications in CLRC/RAL collaborations
Julian Gallop
October 2002
2
Commercial / scientific
Data mining is well known in commercial applications
– should the own-brand cornflakes be located next to the beer?
Less well known in scientific applications
Among scientists, it’s common to find
– “not sure that what I need is data mining, but instead ….”
Perhaps data mining is regarded too narrowly
3
Definitions
An early (1991) definition of Knowledge Discovery in Databases (KDD) was given as:
– "the non-trivial extraction of implicit, previously unknown, and potentially useful information from data" (Frawley et al. 1991).
This was subsequently (1996) revised to:
– "the non-trivial process of identifying valid, potentially useful and ultimately understandable patterns in data" (Fayyad et al. 1996).
Data mining is one step in the KDD process, concerned with applying computational techniques to find patterns in data
4
CLRC scientific fields and collaborations
Sciences: space, earth observation, particle physics, microstructures, synchrotron radiation...
Holds (or provides access to) significant data collections
Partnerships between E-science centre, BITD, computational science and science departments
E-science projects include:
– Ones that are mainly CLRC (e.g. Data Portal)
– UK e-science collaborations (e.g. Astrogrid, NERC Data Grid, gViz)
– EU collaborations (e.g. DataGrid)
– And also the UK Grid Support Centre
5
Sample CLRC e-science project – Data Portal
Data Portal project – pilot project within CLRC:
– To enable a scientist to discover, explore and retrieve disparate datasets through one interface, independent of the data location.
– CLRC sciences (space science, synchrotron science and neutron science) as well as e-science and IT.
– Part of the work is the development of a scientific metadata model
6
Sample e-science projects involving CLRC
Astrogrid (UK)
– Building a virtual observatory
– Ideas on data mining: finding association rules; deviations from a rule; similarity; clustering and classification
DataGrid (EU): aims to enable next-generation scientific exploration, which requires intensive computation and analysis of shared large-scale databases, millions of gigabytes, across widely distributed scientific communities.
– Applications are: biomedical, earth observation, particle physics
NERC Data Grid (UK)
7
NERC Data Grid
Funded by NERC & UK e-science core programme
Involves:
– CLRC (RAL & DL – including British Atmospheric Data Centre)
– Program for Climate Model Diagnosis and Intercomparison (PCMDI) (U.S. Lawrence Livermore National Lab)
Relevant to:
– energy; water management; food chain; health; weather risk
8
NERC Data Grid – relevance to knowledge discovery
Aims to address the problem that
– at present, searching metadata to discover and retrieve what you want is a manual process
– datasets in multiple locations involve multiple logins and retrieval in multiple formats
Indicators of success:
– that it will be possible to find, reformat and visualize disparate datasets from disparate organisations within one organisation
– ability to test data and comparison ideas without learning foreign formats and establishing personal relationships every time
If successful, this will clearly provide a basis for knowledge discovery
9
Earth observation instruments
For example ENVISAT
Instrument AATSR
Low orbit, 14 orbits/day
Returns to same place every 3 days
Picture shows plume from Mt Etna in 2001 (previous instrument ATSR2)
NASA AQUA: TBs/day
10
Earth observation patterns
For a particular location, what patterns emerge on:
– A daily basis
– Or a yearly basis
Knowing the conventional pattern day by day, one can observe out-of-the-ordinary events, e.g. an oil slick
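The idea of learning the conventional pattern at a location and then spotting departures from it can be sketched as below. This is only an illustration, not the processing used for AATSR data: the function name `flag_anomalies`, the z-score rule, the threshold of 3 and the readings themselves are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, observations, threshold=3.0):
    """Return indices of observations far from the baseline pattern.

    baseline: past readings for one location under ordinary conditions.
    observations: new readings for the same location.
    threshold: number of baseline standard deviations counted as
               unusual (an assumed, illustrative value).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: any departure at all is out of the ordinary.
        return [i for i, x in enumerate(observations) if x != mu]
    return [i for i, x in enumerate(observations)
            if abs(x - mu) / sigma > threshold]

# Made-up readings for a single pixel; the third new value stands in
# for an unusual event such as an oil slick.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0]
new_readings = [10.1, 9.9, 14.5, 10.0]
unusual = flag_anomalies(baseline, new_readings)
print(unusual)  # [2]
```

A yearly pattern could be handled the same way by building one baseline per calendar day or season rather than a single baseline per location.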
11
climateprediction.net
Makes use of spare compute capacity on office and home PCs to run a climate prediction model
Different PCs run different parameters and collectively run a Monte Carlo simulation
Results will be studied to find out which subsets of the parameter space correspond to observation
Better understanding of uncertainties
Public understanding of climate change
Oxford U, CLRC RAL, Reading U, with Met Office and OU
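A toy version of the Monte Carlo idea is sketched below: each simulated "PC" draws its own parameter values, runs a model, and only the parameter sets whose result agrees with an observation are kept. The function `toy_model`, its two parameters, their ranges and the tolerance are all invented for illustration; the real project runs a full climate model, not this formula.

```python
import random

def toy_model(sensitivity, offset):
    """Stand-in for one model run. NOT a real climate model."""
    return offset + 2.0 * sensitivity

def consistent_runs(n_runs, observed, tolerance, seed=0):
    """Sample parameters at random; keep those matching the observation."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = []
    for _ in range(n_runs):
        # Each run gets its own randomly drawn parameter set.
        params = (rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0))
        result = toy_model(*params)
        if abs(result - observed) <= tolerance:
            kept.append(params)
    return kept

kept = consistent_runs(n_runs=1000, observed=1.0, tolerance=0.1)
print(f"{len(kept)} of 1000 parameter sets are consistent with observation")
```

The surviving parameter sets outline the region of parameter space that corresponds to observation; their spread gives a rough picture of the remaining uncertainty.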
12
Data in climateprediction.net
Base
– Latitude: 96
– Longitude: 72
– Levels: 19
– Timesteps calculated every 30 mins / 1 hr and output for every day over a period of 50 years
17000 registered in advance of launch
Variables
– Horizontal velocity
– Temperature
– Surface pressure
– Water vapour (atmosphere)
– Salinity (ocean)
Possible others, such as ocean carbon content and atmospheric ozone and sulphates
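A back-of-envelope estimate of the output size follows from the grid figures above. The grid dimensions and the 50-year daily output are taken from the slide; the 360-day model year and 4 bytes per value are assumptions, and the estimate covers a single full three-dimensional variable from a single run.

```python
# Grid dimensions and run length as given on the slide.
N_LAT, N_LON, N_LEVELS = 96, 72, 19
YEARS = 50

# Assumptions, not from the slide:
DAYS_PER_YEAR = 360   # assumed model calendar
BYTES_PER_VALUE = 4   # assumed single-precision storage

def field_bytes(n_lat=N_LAT, n_lon=N_LON, n_levels=N_LEVELS,
                years=YEARS, days_per_year=DAYS_PER_YEAR,
                bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store one 3D variable written once per day."""
    grid_points = n_lat * n_lon * n_levels   # 131,328 points
    outputs = years * days_per_year          # 18,000 daily outputs
    return grid_points * outputs * bytes_per_value

per_variable = field_bytes()
print(f"{per_variable / 1e9:.2f} GB per variable per run")  # 9.46 GB
```

Multiplying by the number of variables and by thousands of participating PCs shows why the full output cannot simply be gathered in one place.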
13
Parameters in climateprediction.net
Physics parameters that may be varied between one run and another:
– Representation of cloud variability
– Rate at which water droplets collide and cohere
– Number of nucleation particles for cloud droplet formation
– Light scattering in the atmosphere
– Cloud convection
– Surface processes such as rate of transpiration by plants
Also, runs will be duplicated to detect tampering
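One way duplicated runs could expose tampering is to group returned results by parameter set and flag any set whose copies disagree. The sketch below assumes that identical parameters should give matching results within a small tolerance; the function name, the single-number summary and the sample values are hypothetical and not taken from the project.

```python
from collections import defaultdict

def find_suspect_runs(results, tolerance=1e-6):
    """Return parameter sets whose duplicated runs disagree.

    results: iterable of (params, summary) pairs, where params is a
             hashable tuple of physics parameter values and summary is
             one number returned by a PC (a hypothetical diagnostic).
    """
    by_params = defaultdict(list)
    for params, summary in results:
        by_params[params].append(summary)

    suspects = []
    for params, summaries in by_params.items():
        # Only parameter sets that were run more than once can be checked.
        if len(summaries) > 1 and max(summaries) - min(summaries) > tolerance:
            suspects.append(params)
    return suspects

returned = [
    ((0.5, 1.0), 287.31),
    ((0.5, 1.0), 287.31),   # duplicate agrees
    ((0.7, 1.2), 288.02),
    ((0.7, 1.2), 291.50),   # duplicate disagrees
    ((0.9, 0.8), 286.75),   # run only once, cannot be checked
]
suspects = find_suspect_runs(returned)
print(suspects)  # [(0.7, 1.2)]
```

A disagreement only shows that at least one copy is wrong; a third run of the same parameters would be needed to decide which.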
14
Data distribution in climateprediction.net
Results dataset will be distributed across several (possibly 20) climate modelling institutions
A subset of data is returned from a PC to a data server. The remainder is therefore kept on the (home or office) PC and is available if the owner so chooses.
A program attempting to data mine needs to be isolated from these details by an appropriate portal, metadata and/or catalogue
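The isolation described above can be pictured as a catalogue that maps a run identifier to wherever its data lives, so that analysis code deals only in identifiers. The `Catalogue` class, the `fetch` callable and the location strings below are hypothetical; the slides do not describe the actual portal or catalogue design.

```python
class Catalogue:
    """Maps a run identifier to wherever its results are held."""

    def __init__(self):
        self._locations = {}

    def register(self, run_id, location):
        self._locations[run_id] = location

    def locate(self, run_id):
        # Returns None when the data is not available, for example
        # because a PC owner has chosen not to share it.
        return self._locations.get(run_id)


def mean_per_run(catalogue, run_ids, fetch):
    """Analysis code: sees run identifiers only, never hosts or paths."""
    means = {}
    for run_id in run_ids:
        location = catalogue.locate(run_id)
        if location is None:
            continue  # skip runs whose data cannot be reached
        values = fetch(location)
        means[run_id] = sum(values) / len(values)
    return means


# Made-up storage standing in for institutional servers and home PCs.
fake_storage = {
    "server-a:/runs/001": [1.0, 2.0, 3.0],
    "home-pc-17:/cp/002": [4.0, 6.0],
}
catalogue = Catalogue()
catalogue.register("run-001", "server-a:/runs/001")
catalogue.register("run-002", "home-pc-17:/cp/002")

means = mean_per_run(catalogue, ["run-001", "run-002", "run-003"],
                     fetch=fake_storage.__getitem__)
print(means)  # {'run-001': 2.0, 'run-002': 5.0}
```

Because the location lookup and the transfer are passed in, the same analysis function works whether a run sits on an institutional server or on a volunteer's PC.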
15
Climateprediction.net questions
Some questions that need to be askable
– What features of the response are robust as we change the physics?
– What kinds of changes have similar effects to each other?
– Which models that are consistent with current observations give changes in extreme events in the future?
Unclear whether this is data mining in the strict sense, but it certainly involves multivariate statistical techniques
16
Summing up
NERC Data Grid project, for example, exposes current difficulties of doing data mining on large scientific datasets
– In a commercial situation, data is warehoused under single operational control
– In science, access is needed to different datasets which are under different managements
– Multiple logins, multiple metadata systems
Current e-science projects are providing a mechanism which future data mining could use
Applications include: earth observation; particle physics; astronomy; biology; ...