Mini-workshop: e-Science & Data Mining. e-Science & Data Mining Special Interest Group Bob Mann Institute for Astronomy & NeSC University of Edinburgh.


1 Mini-workshop: e-Science & Data Mining

2 e-Science & Data Mining Special Interest Group Bob Mann Institute for Astronomy & NeSC University of Edinburgh

3 Outline
Motivation for the SIG
Goals of the SIG
Activities to date
Next steps

4 Motivation for the SIG
Many disciplines have a data deluge
Data integration major part of e-science
What to do with data once integrated?
–Some standard analyses won’t scale
–Some new science made possible
Look to data mining as a way of separating wheat from chaff

5 “Scientific Data Mining, Integration & Visualization”
NeSC, October 2002
www.nesc.ac.uk/talks/sdmiv/report.pdf
Participants from wide range of domains
–astronomy, atmospheric science, bioinformatics, chemistry, digital libraries, engineering, environmental science, experimental physics, marine sciences, oceanography, and statistics…plus CS researchers and software engineers
But a common set of problems

6 Problems from SDMIV
Lots of DM packages, lots of data formats
How to mine distributed data sources?
How to mine large data volumes?
–and high-dimensional datasets
How to do data exploration?
–Coupling data mining and visualization
How to work iteratively & interactively?
–Tracking provenance, building workflows…

7 Goals of the SIG
Forum for e-science data miners
–Application scientists, algorithm writers, infrastructure developers
Identify requirements on infrastructure and algorithms from science drivers
–Are there generic problems to solve?
Can there be an OGSA-DAI for data mining?
Stimulate/foster R&D where needed

8 SIG activities to date
Set up steering group
–Niall Adams (ICSTM), Jim Austin (York), Lisa Blanshard (CCLRC), Ken Brodlie (Leeds), Yike Guo (ICSTM), Bob Mann (Edinburgh), Bob Nichol (Portsmouth), Adrian Shepherd (Birkbeck), Amos Storkey (Edinburgh)
Review of data mining in UK e-Science
–Who’s doing (wants to do) what, how & why
–Started with initial questionnaire

9 Questionnaire
www.nesc.ac.uk/resources/sig/esdm-sig
Responses by 30 September
Initial responses indicate wide ranges of:
–Data types: text, numerical, images
–Storage: DBMS, files – XML, bespoke ASCII
–Location: local, distributed, warehoused
–Analysis: clustering, decision trees, etc.
–Disciplines: both academic & commercial
–Infrastructure: Grid/web services, standalone

10 Next Steps
Follow up questionnaire responses
–Develop detailed case studies
Produce review of e-science DM activities
Two-day Workshop in week Nov 29–Dec 3
–[Visit from Andrew Moore (CMU)]
–Debate issues arising from review
–Develop research agenda
–Plan SIG activities
–Details on NeSC WWW site very soon

11 Summary
e-Science Data Mining SIG launched
Initial requirements questionnaire
–www.nesc.ac.uk/resources/sig/esdm-sig
–Responses by 30 September
Two-day NeSC Workshop w/c Nov 29
Questions:
–Bob Mann (rgm@roe.ac.uk)

12 Mini-workshop programme
Mark Jessop (York): Pattern matching against distributed databases
Yike Guo (IC): Why Grid-based data mining matters?
Olusola Idowu (Newcastle): e-Science tools for analysing complex systems
Angela O’Brien (IC): Mapping of scientific workflow within the e-Protein project to distributed resources
Peter Li (Newcastle): Association of variations in I kappa B-epsilon with Graves’ disease using classical and myGrid methodologies

