Searching for KDD in MDS standards… …the DAME experience Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele DAbrusco, George S. Djorgovski,

Slides:



Advertisements
Similar presentations
1 Towards an Open Service Framework for Cloud-based Knowledge Discovery Domenico Talia ICAR-CNR & UNIVERSITY OF CALABRIA, Italy Cloud.
Advertisements

INDIANAUNIVERSITYINDIANAUNIVERSITY GENI Global Environment for Network Innovation James Williams Director – International Networking Director – Operational.
INAF experience in Grid projects C. Vuerli, G. Taffoni, V. Manna, A. Barisani, F. Pasian INAF – Trieste.
MicroKernel Pattern Presented by Sahibzada Sami ud din Kashif Khurshid.
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
The Internet and the Web
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
© 2006 Open Grid Forum The Astro Community and DCIs in Europe and the role of Astro-CG C. Vuerli - INAF.
C. Grimme, A. Papaspyrou Scheduling in C3-Grid AstroGrid-D Workshop Project: C3-Grid Collaborative Climate Community Data and Processing Grid Scheduling.
CNRIS CNRIS 2.0 Challenges for a new generation of Research Information Systems.
Hydra Partners Meeting March 2012 Bill Branan DuraCloud Technical Lead.
Technical Review Group (TRG)Agenda 27/04/06 TRG Remit Membership Operation ICT Strategy ICT Roadmap.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Asper School of Business University of Manitoba Systems Analysis & Design Instructor: Bob Travica System architectures Updated: November 2014.
© , Michael Aivazis DANSE Software Issues Michael Aivazis California Institute of Technology DANSE Software Workshop September 3-8, 2003.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
SP2 Mikael Nystrom. Agenda Översikt Installation.
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING Carlos de Alfonso Andrés García Vicente Hernández.
DYNAMICS CRM AS AN xRM DEVELOPMENT PLATFORM Jim Novak Solution Architect Celedon Partners, LLC
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
By Mihir Joshi Nikhil Dixit Limaye Pallavi Bhide Payal Godse.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
DuraCloud Managing durable data in the cloud Michele Kimpton, Director DuraSpace.
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI CloudBroker Platform integration into WS-PGRADE/gUSE Zoltán Farkas MTA.
1 Research Groups : KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems SCI 2 SMetrology and Models Intelligent.
Functions and Demo of Astrogrid 1.1 China-VO Haijun Tian.
Flexibility and user-friendliness of grid portals: the PROGRESS approach Michal Kosiedowski
R. Smareglia 1, O. Laurino 2, M. Brescia 3 Jenam 2011 – Saint Petersburg 1 INAF - OATs 2 Smithsonian Astr. Obs. 3 INAF - OANa.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Code Applications Tamas Kiss Centre for Parallel.
Astronomical Data Mining: Renaissance or dark age? Giuseppe Longo & the DAME Group
Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”
DAME Astrophysical DAta Mining & Exploration on GRID M. Brescia – S. G. Djorgovski – G. Longo & DAME Working Group Istituto Nazionale di Astrofisica –
GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Applications.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
German Astrophysical Virtual Observatory Overview and Results So Far W. Voges, G. Lemson, H.-M. Adorf.
7. Grid Computing Systems and Resource Management
Development of e-Science Application Portal on GAP WeiLong Ueng Academia Sinica Grid Computing
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-1.
Tool Integration with Data and Computation Grid “Grid Wizard 2”
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware A Cloud Computing Methodology Study of.
THE GLUE DOMAIN DEPLOYMENT The middleware layer supporting the domain-based INFN Grid network monitoring activity is powered by GlueDomains [2]. The GlueDomains.
Grid Execution Management for Legacy Code Architecture Exposing legacy applications as Grid services: the GEMLCA approach Centre.
Breaking the frontiers of the Grid R. Graciani EGI TF 2012.
Computational Science and new perspectives for the analysis of massive data sets Giuseppe Longo University Federico II – Napoli Associate California Institute.
ENEA GRID & JPNM WEB PORTAL to create a collaborative development environment Dr. Simonetta Pagnutti JPNM – SP4 Meeting Edinburgh – June 3rd, 2013 Italian.
The StratusLab Distribution and Its Evolution 4ème Journée Cloud (Bordeaux, France) 30 November 2012.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Unit 3 Virtualization.
Accessing the VI-SEEM infrastructure
Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
Warm Handshake with Websites, Servers and Web Servers:
Joseph JaJa, Mike Smorul, and Sangchul Song
VO-Neural / Data Mining Exploration
Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
University of Technology
PROCESS - H2020 Project Work Package WP6 JRA3
Cloud Computing Dr. Sharad Saxena.
searching for KDD in MDS standards… …the DAME experience
Business Process Management Software
Storing and Accessing G-OnRamp’s Assembly Hubs outside of Galaxy
Defining the Grid Fabrizio Gagliardi EMEA Director Technical Computing
Knowledge Sharing Mechanism in Social Networking for Learning
Maria Teresa Capria December 15, 2009 Paris – VOPlaneto 2009
Presentation transcript:

searching for KDD in MDS standards… …the DAME experience Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele DAbrusco, George S. Djorgovski, Ciro Donalek, Mauro Garofalo, Marisa Guglielmo, Omar Laurino, Giuseppe Longo, Ashish Mahabal, Ettore Mancini, Francesco Manna, Amata Mercurio, Alfonso Nocella, Maurizio Paolillo, Luca Pellecchia, Sandro Riccardi, Giovanni Vebber, Civita Vellucci. Department of Physics – University Federico II – Napoli INAF – National Institute of Astrophysics – Capodimonte Astronomical Observatory – Napoli CALTECH – California Institute of Technology - Pasadena

Data Mining (KDD) as the Fourth Paradigm Of Science Definition DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

The BoKs Problem Limited number of problems due to limited number of reliable BoKs So far Limited number of BoK (and of limited scope) available Painstaking work for each application (es. spectroscopic redshifts for photometric redshifts training) Fine tuning on specific data sets needed (e.g., if you add a band you need to re-train the methods) Community believes AI/DM methods are black boxes You feed in something, and obtain patters, trends, i.e. knowledge…. Theres a need of standardization and interoperability between data together with DM application M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

What DAME is DAME Program is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at implementing (as web applications and services) a scientific gateway for massive data analysis, exploration and mining, on top of a virtualized distributed computing environment. Technical and management info Documents Science cases Newsletters DAMEWARE Web application Beta Version M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

DM 4-rule virtuous cycle Finding patterns is not enough Science business must: – Respond to patterns by taking action – Turning: Data into Information Information into Action Action into Value Hence, the Virtuous Cycle of DM: 1.Identify the problem 2.Mining data to transform it into actionable information 3.Acting on the information 4.Measuring the results Virtuous cycle implementation steps: – Transforming data into information via: Hypothesis testing Profiling Predictive modeling – Taking action Model deployment Scoring – Measurement Assessing a models stability & effectiveness before it is used M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

DM: 11-step Methodology 1.Translate any opportunity (science case) into DM opportunity (problem) 2.Select appropriate data 3.Get to know the data 4.Create a model set 5.Fix problems with the data 6.Transform data to bring information 7.Build models 8.Assess models 9.Deploy models 10.Assess results 11.Begin again (GOTO 1) The four rules reflect into an 11-step exploded strategy, at the base of DAME concept M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Effective DM process break-down M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

In this scenario DAME (Data Mining & Exploration) project, starting from astrophysics requirements domain, has investigated the Massive Data Sets (MDS) exploration by producing a taxonomy of data mining applications (hereinafter called functionalities) and collected a set of machine learning algorithms (hereinafter called models). This association functionality-model is made of what we defined "use case", easily configurable by the user through specific tutorials. At low level, any experiment launched on the DAME framework, externally configurable through dynamical interactive web pages, is treated in a standard way, making completely transparent to the user the specific computing infrastructure used and specific data format given as input. So the user doesnt need to know anything about the computing infrastructure and almost nothing about the internal mechanisms of the chosen machine learning model.. The Black box Infrastructure M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

DAME Infrastructure GRID UI GRID SE GRID CE DR ExecutionDR Storage DM Models Job Execution (300 multi-core processors) User & Data Archives (300 TB dedicated) Cloud facilities 16 TB 15 processors M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

DAME SW Architecture M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

The Available Services DAMEWARE Web Application Resource Main service providing via browser a list of algorithms and tools to configure and launch experiments as complete workflows (dataset creation, model setup and run, graphical/text output): Functionalities: Regression, Classification, Image Segmentation, Multi-layer Clustering; Models: MLP+BP, MLP+GA, SVM, MLP+QNA, K-Means (through KNIME), PPS, SOM, NEXT-II; VOGCLUSTERS Web Application for data and text mining on globular clusters; STraDiWA (Sky Transient Discovery Web Application) detect variable objects from real or simulated images (under R&D); WFXT (Wide Field X-Ray Telescope) Transient Calculator Web service to estimate the number of transient and variable sources that can be detected by WFXT within the 3 main planned extragalactic surveys, with a given significant threshold; SDSS (Sloan Digital Sky Survey) Local mirror website hosting a complete SDSS Data Archive and Exploration System; M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

DAMEWARE GUI KNIME WORKFLOW DMM API COMPONENT CLOUD EXE/STORAGE ENVIRONMENT DM PLUG-IN COMPONENT EXPERIMENT SETUP REQUEST Offline creation Offline creation Offline creation EXECUTION OUTPUT K-Means (through KNIME) M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Web 2.0 Features in DAME Web 2.0? It is a system that breaks with the old model of centralized Web sites and moves the power of the Web/Internet to the desktop. [J. Robb] the Web becomes a universal, standards-based integration platform. [S. Dietzen] M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

VO Interoperability scenarios DA1 DA2 SAMP DA WA WSC MASSIVE DATA SETS WA1 WA2 URI? MASSIVE DATA SETS Full interoperability between DA (Desktop Applications) Local user desktop fully involved (requires computing power) Full WA DA interoperability Partial DA WA interoperability (such as remote file storing) MDS must be moved between local and remote apps Local user desktop partially involved (requires minor computing and storage power) Except from URI exchange, no standard interoperabilty Different accounting policy MDS must be moved between remote apps (but larger bandwidth) No local computing power required M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Our vision: improving aspects WA1 WA2 plugins DAs has to become WAs Unique accounting policy (google/Microsoft like) To overcome MDS flow apps must be plug&play (e.g. any WAx feature should be pluggable in WAy on demand) No local computing power required. Also smartphones can run VO apps Requirements Standard accounting system; No more MDS moving on the web, but just moving Apps, structured as plugin repositories and execution environments; standard modeling of WA and components to obtain the maximum level of granularity; Evolution of SAMP architecture to extend web interoperability (in particular for the migration of the plugins); M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Our vision: plugin granularity flow WAx Px-2 Px-1 Px-3 Px-… Px-n WAy Py-1 Py-2 Py-… Py-n 1. WAy requires Px-3 from WAx 2. WAx sends Px-3 to WAy Px-3 3. Way execute Px-3 This scheme could be iterated and extended involving all standardized web apps M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

The Lernaean Hydra VO KDD App M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

WAx Px-2 Px-1 Px-3 Px-… Px-n WAy Py-1 Py-2 Py-… Py-n After a certain number of such iterations… Px-2 Px-1 Px-3 Px-… Px-n Py-1 Py-2 Py-… Py-n The synchronization of plugin releases between WAs is performed at request time The VO KDD App scenario will become: No different WAs, but simply one WA with several sites (eventually with different GUIs and computing environments) All WA sites can become a mirror site of all the others Minimization of data exchange flow (just few plugins in case of synchronization between mirrors) The Lernaean Hydra VO KDD App M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Conclusions DAME was not originally conceived (for the lack of suitable standards) to be interoperable with the VO, but offers a good benchmark to plan for the future developments of KDD on MDS in a VO environment. 1. DAME is just an example of what new ICT (Web 2.0) can do for A&A KDD problems. 2. A new vision of the KDD App approach, suitable for VO must be based on the minimization of data transfer and maximization of interoperability within the VO community. 3. If implemented, the new scheme can reach a wider science community by giving the opportunity to share data and apps worldwide, without any particular infrastructure requirements (i.e. by using a simple smartphone with a low-band connection). DAME group is currently involved in the definition of standards and rules and is working to modify and adapt the present infrastructure to become compliant with the VO. M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011