Mini-workshop: e-Science & Data Mining. e-Science & Data Mining Special Interest Group Bob Mann Institute for Astronomy & NeSC University of Edinburgh.

Slides:



Advertisements
Similar presentations
Closing the Gap Between Global Environmental Sensing Needs and Cyber Infrastructure Tools Jim Gray Jeff Burch Mark Ellisman Miron Livny David Maidment.
Advertisements

Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
Long-term Digital Metadata Curation Arif Shaon University of Reading 16 April 2014.
National e-Science Centre Glasgow e-Science Hub Opening: Remarks NeSCs Role Prof. Malcolm Atkinson Director 17 th September 2003.
A centre of expertise in data curation and preservation DCC/NeSC eScience Workshop, June 2008 Working in partnership with the eScience community This work.
1 e-Arts and Humanities Scoping an e-Science Agenda Sheila Anderson Arts and Humanities Data Service King’s College London.
Peter Clarke UK National e-Science Centre University of Edinburgh e-Infrastructure in the UK.
Geographical Information Systems and Science Longley P A, Goodchild M F, Maguire D J, Rhind D W (2001) John Wiley and Sons Ltd 1. Systems, Science and.
EPrints Workshop, January eBank UK: Dissemination of research data using EPrints Simon Coles, School of Chemistry, University of Southampton.
Dr Matthew Stiff CEH Director Environmental Informatics Presentation to CRM SIG NeSC Edinburgh 12 July 2007 The Environmental Informatics Programme.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Eesti Maaülikool 1 Application of soil information and agro- economic models for spatial land use optimization: a case study in Estonia ALAR ASTOVER, PhD.
The aims of SC4DEVO and SC4DEVO-1 Bob Mann Institute for Astronomy and National e-Science Centre, University of Edinburgh.
1 BrainWave Biosolutions Limited Accelerating Life Science Research through Technology.
FREMA Lester Gilbert Dave Millard Yvonne Howard An ELF Reference Model Project In the JISC Distributed e-Learning Programme e-Learning Framework Reference.
BinX and Astronomy Bob Mann Institute for Astronomy and National e-Science Centre.
Requirements from astronomy in the Virtual Observatory era Bob Mann Institute for Astronomy & NeSC University of Edinburgh.
Department of Computer Science, University of California, Irvine Site Visit for UC Irvine KD-D Project, April 21 st 2004 The Java Universal Network/Graph.
Computing in Atmospheric Sciences Workshop: 2003 Challenges of Cyberinfrastructure Alan Blatecky Executive Director San Diego Supercomputer Center.
INFSO-SSA International Collaboration to Extend and Advance Grid Education ICEAGE Forum Meeting at EGEE Conference, Geneva Malcolm Atkinson & David.
Ohio State University Department of Computer Science and Engineering Automatic Data Virtualization - Supporting XML based abstractions on HDF5 Datasets.
Chapter 1 Introduction to Data Mining
1. Systems, Science, and Study. Outline What is geographic information? Definition of data, information, knowledge and wisdom Kinds of decisions that.
DAME: Distributed Engine Health Monitoring on the Grid
Astronomical data curation and the Wide-Field Astronomy Unit Bob Mann Wide-Field Astronomy Unit Institute for Astronomy School of Physics University of.
Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology SDMIV 24 October 2002 Edinburgh KE ToolsS Data.
EBank UK: linking scientific data, scholarly communication and learning Michael Day and Rachel Heery UKOLN, University of Bath
E-science in the Netherlands Maria Heijne TU Delft Library Director / Chair Consortium of University Libraries and National Library.
4 th Annual EPSRC e-science meeting The need for e-Science An industrial perspective Stephen Calvert – VP Cheminformatics GSKYike Guo – Imperial College.
GGF-16 Athens Production Grid Computing in the UK Neil Geddes CCLRC Director, e-Science.
قسم الجيوماتكس Geomatics Department King AbdulAziz University Faculty of Environmental Design GIS Components GIS Fundamentals GEOM 121 Reda Yaagoubi, Ph.D.
OGC/Grid activities in UK Chris Higgins (EDINA), Phil James (Uni of Newcastle), Andrew Woolf (CCLRC)
The DAME project Professor Jim Austin University of York.
E-Science: The view from the social sciences Jenny Fry and Ralph Schroeder Oxford Internet Institute Oxford e-Social Science Project.
Making Geological Map Data for the Earth Accessible OneGeology: assisting Geological Surveys worldwide to interoperate seamlessly on the Next Generation.
Workflow in Grid Systems Workshop Dave Berry, Research Manager UK National e-Science Centre GGF10, Mar 2004.
Data and the UK e-Science Programme Paul Watson Director North-East Regional e-Science Centre School of Computing Science University of.
Data provenance in biomedical discovery Donald Dunbar Queen’s Medical Research Institute University of Edinburgh Workshop on Principles of Provenance in.
Association of variations in I kappa B-epsilon with Graves' disease using classical and my Grid methodologies Peter Li School of Computing Science University.
Building the e-Minerals Minigrid Rik Tyer, Lisa Blanshard, Kerstin Kleese (Data Management Group) Rob Allan, Andrew Richards (Grid Technology Group)
ITSC/University of Alabama in Huntsville ADaM System Architecture Rahul Ramachandran, Sara Graves and Ken Keiser Mathematical Challenges in Scientific.
Sky Survey Database Design National e-Science Centre Edinburgh 8 April 2003.
A Practical Approach to Metadata Management Mark Jessop Prof. Jim Austin University of York.
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Edinburgh e-Science MSc Bob Mann Institute for Astronomy & NeSC University of Edinburgh.
Web Technologies for Bioinformatics Ken Baclawski.
Applications and Requirements for Scientific Workflow May NSF Geoffrey Fox Indiana University.
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
National e-Science Institute and National e-Science Centre The Way Ahead Prof. Malcolm Atkinson Director 30 th September 2003.
An Introduction to UK e-Science Anne E Trefethen Deputy Director UK e-Science Core Programme.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Toward a common data and command representation for quantum chemistry Malcolm Atkinson Director 5 th April 2004.
Applications and Requirements for Scientific Workflow May NSF Geoffrey Fox Indiana University.
Project number: ENVRI and the Grid Wouter Los 20/02/20161.
Books Visualizing Data by Ben Fry Data Structures and Problem Solving Using C++, 2 nd edition by Mark Allen Weiss MATLAB for Engineers, 3 rd edition by.
Annotation of “special structures” in astronomy Bob Mann Institute for Astronomy and National e-Science Centre University of Edinburgh.
E-Science and Industry Chair: Ian Osborne Grid Computing Now! and Intellect.
Welcome Grids and Applied Language Theory Dave Berry Research Manager 16 th October 2003.
A Collaborative e-Science Architecture towards a Virtual Research Environment Tran Vu Pham 1, Dr. Lydia MS Lau 1, Prof. Peter M Dew 2 & Prof. Michael J.
INTRODUCTION TO INFORMATION SYSTEMS LECTURE 9: DATABASE FEATURES, FUNCTIONS AND ARCHITECTURES PART (2) أ/ غدير عاشور 1.
InSilicoLab – Grid Environment for Supporting Numerical Experiments in Chemistry Joanna Kocot, Daniel Harężlak, Klemens Noga, Mariusz Sterzel, Tomasz Szepieniec.
J. Douglas Armstrong Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh. Bioinformatics at Edinburgh.
An Approach to Software Preservation
RDA US Science workshop Arlington VA, Aug 2014 Cees de Laat with many slides from Ed Seidel/Rob Pennington.
Scientific Computing Department
Welcome to National e-Science Centre Official Opening
Data Warehousing and Data Mining
Data Science with Python
Brian Matthews STFC EOSCpilot Brian Matthews STFC
Bird of Feather Session
Presentation transcript:

Mini-workshop: e-Science & Data Mining

e-Science & Data Mining Special Interest Group Bob Mann Institute for Astronomy & NeSC University of Edinburgh

Outline Motivation for the SIG Motivation for the SIG Goals of the SIG Goals of the SIG Activities to date Activities to date Next steps Next steps

Motivation for the SIG Many disciplines have a data deluge Many disciplines have a data deluge Data integration major part of e-science Data integration major part of e-science What to do with data once integrated? What to do with data once integrated? –Some standard analyses won’t scale –Some new science made possible Look to data mining as a way of separating wheat from chaff Look to data mining as a way of separating wheat from chaff

“Scientific Data Mining, Integration & Visualization” NeSC, October 2002 NeSC, October Participants from wide range of domains Participants from wide range of domains – –astronomy, atmospheric science, bioinformatics, chemistry, digital libraries, engineering, environmental science, experimental physics, marine sciences, oceanography, and statistics…plus CS researchers and software engineers But a common set of problems But a common set of problems

Problems from SDMIV Lots of DM packages, lots of data formats Lots of DM packages, lots of data formats How to mine distributed data sources? How to mine distributed data sources? How to mine large data volumes? How to mine large data volumes? –and high-dimensional datasets How to do data exploration? How to do data exploration? –Coupling data mining and visualization How to work iteratively & interactively? How to work iteratively & interactively? –Tracking provenance, building workflows…

Goals of the SIG Forum for e-science data miners Forum for e-science data miners –Application scientists, algorithm writers, infrastructure developers Identify requirements on infrastructure and algorithms from science drivers Identify requirements on infrastructure and algorithms from science drivers –Are there generic problems to solve? Can there be an OGSA-DAI for data mining? Can there be an OGSA-DAI for data mining? Stimulate/foster R&D where needed Stimulate/foster R&D where needed

SIG activities to date Set up steering group Set up steering group –Niall Adams (ICTSM), Jim Austin (York), Lisa Blanshard (CCLRC), Ken Brodlie (Leeds), Yike Guo (ICSTM), Bob Mann (Edinburgh), Bob Nichol (Portsmouth), Adrian Shepherd (Birkbeck), Amos Storkey (Edinburgh) Review of data mining in UK e-Science Review of data mining in UK e-Science –Who’s doing (wants to do) what, how & why –Started with initial questionnaire

Questionnaire Responses by 30 September Responses by 30 September Initial responses indicate wide ranges of: Initial responses indicate wide ranges of: –Data types: text, numerical, images –Storage: DBMS, files – XML, bespoke ASCII –Location: local, distributed, warehoused –Analysis: clustering, decision trees, etc etc –Disciplines: both academic & commercial –Infrastructure: Grid/web services, standalone

Next Steps Follow up questionnaire responses Follow up questionnaire responses –Develop detailed case studies Produce review of e-science DM activities Produce review of e-science DM activities Two-day Workshop in week Nov 29–Dec 3 Two-day Workshop in week Nov 29–Dec 3 –[Visit from Andrew Moore (CMU)] –Debate issues arising from review –Develop research agenda –Plan SIG activities –Details on NeSC WWW site very soon

Summary e-Science Data Mining SIG launched e-Science Data Mining SIG launched Initial requirements questionnaire Initial requirements questionnaire – –Responses by 30 September Two-day NeSC Workshop w/c Nov 29 Two-day NeSC Workshop w/c Nov 29 Questions: Questions: –Bob Mann

Mini-workshop programme Mark Jessop (York): Pattern matching against distributed databases Mark Jessop (York): Pattern matching against distributed databases Yike Guo (IC): Why Grid-based data mining matters? Yike Guo (IC): Why Grid-based data mining matters? Olusola Idowu (Newcastle): e-Science tools for analysing complex systems Olusola Idowu (Newcastle): e-Science tools for analysing complex systems Angela O’Brien (IC): Mapping of scientific workflow within the e-Protein project to distributed resources Angela O’Brien (IC): Mapping of scientific workflow within the e-Protein project to distributed resources Peter Li (Newcastle): Association of variations in Peter Li (Newcastle): Association of variations in I kappa B-epsilon with Graves’ disease using classical and myGrid methodologies