Cloud paradigm, standards and middleware for PGS * ESRIN * 23.01.2013

Presentation transcript:


Calvalus: Full mission EO cal/val processing and exploitation services

Deployment and processing scenario
[Diagram. Labels: User; Calvalus Portal; Hadoop Cluster (master, feeder, nodes 1 to n, each with local disk); request; installation package (v4.10); output data; external data source or destination; test server (test 1, vm1).]

Selecting processors in the Calvalus portal

Processor bundles
• A bundle is a versioned package of software and configuration, optionally with auxiliary data.
• Bundles are deployed.
• Processing jobs refer to a bundle and to a processor to be executed.
• Examples: BEAM Case2R 1.5.3, SEADAS 6.3
[Diagram labels: installation package (v4.10), processor parameters]

Processor bundles for the BEAM framework

    ls -1 /mnt/hdfs/calvalus/software/1.0/case2-regional
    beam-meris-case2-regional jar
    beam-collocation-1.3.jar
    beam-meris-brr-2.3.jar
    beam-meris-glint jar
    beam-meris-l2auxdata-1.2.jar
    beam-meris-radiometry jar
    beam-meris-sdr-2.3.jar
    flint-processor-1.2.jar
    bundle-descriptor.xml

• Java Service Provider Interface (SPI) declarations in META-INF/services
• SPI implementation "Case2IOPOperator.Spi" as factory of the BEAM GPF Operator
• BEAM GPF Operator implementation "Case2IOPOperator" with processor name "Meris.Case2Regional" annotation
• Processor descriptors with name "Meris.Case2Regional", formal parameters, output variables

Requests and processor name resolution
[Request document; the XML markup was lost in extraction. Surviving fields, in original order:]

    L2
    productionName: case2r-${year}-${month}
    processorBundleName: case2-regional
    processorBundleVersion:
    processorName: Meris.Case2Regional
    processorParameters: true false true RADIANCE_REFLECTANCES...
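The resolution from request fields to a deployed bundle can be sketched in a few lines of bash. This is an illustration under assumptions, not the Calvalus implementation: the function name bundle_dir is hypothetical, and the "<name>-<version>" directory layout is inferred from the seadas-6.3 listing shown in the slides.

```shell
#!/bin/bash
# Hypothetical helper: map a request's bundle name and version to a bundle directory.
# Assumption: bundles live in "<root>/<name>-<version>", as the seadas-6.3 example suggests.
SOFTWARE_ROOT="/mnt/hdfs/calvalus/software/1.0"

bundle_dir() {
    local name=$1 version=$2
    # both request fields are needed to identify a bundle unambiguously
    if [[ -z "$name" || -z "$version" ]]; then
        echo "bundle name and version are required" >&2
        return 1
    fi
    printf '%s/%s-%s\n' "$SOFTWARE_ROOT" "$name" "$version"
}

# Example: the SeaDAS bundle
bundle_dir seadas 6.3   # prints /mnt/hdfs/calvalus/software/1.0/seadas-6.3
```

Requiring both fields mirrors the idea that a request identifies the processor together with its version, so that results stay reproducible when a newer bundle is deployed alongside an older one.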

Bundle descriptor for BEAM Operators
[bundle-descriptor.xml; the XML markup was lost in extraction. Surviving values, in original order:]

    case2-regional
    BEAM
    Meris.Case2Regional
    MERIS Case2 Regional
    doAtmosphericCorrection
    boolean
    Whether or not to perform atmospheric correction.
    true
    ...
    BEAM-DIMAP,NetCDF,GeoTIFF
    ...
    !case2_flags.INVALID
    a_total_443
    AVG

Deployment process for BEAM operators
• Stand-alone test in VISAT (optional)
• copying to HDFS, automated replication
• bulk request submission (portal, batch client)
• deployment via distributed cache, automatic unpacking, all JARs on class path
• processor adapter:
    - calls operator
    - provides input and output as product object
• BEAM handles stream interface from/to HDFS via tile cache

Processor bundles for UNIX executables

    ls -1 /mnt/hdfs/calvalus/software/1.0/seadas-6.3
    bundle-descriptor.xml
    l2gen-process.vm
    l2gen-prepare.vm
    l2gen-parameters.vm
    nasa-seadas-6.3.tar.gz

Converting a request to a UNIX call
• Transformation of scripts with Velocity (.vm) or XSLT (.xsl), providing parameters, inputs and outputs in the context/document
• execution of an optional "prepare" script to avoid unnecessary input retrieval and processing (in case output exists)
• execution of a "process" script, the processor wrapper
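To make the transformation step concrete, here is a minimal sketch that imitates only its simplest case. It does not use Velocity or XSLT: it replaces the single reference $inputFile in a template with a concrete value using sed. The function name render_template is hypothetical, and real Velocity templates support far more (directives, unparsed blocks, nested context objects).

```shell
#!/bin/bash
# Sketch only: emulate one Velocity-style substitution with sed.
# usage: render_template <template-file> <value-for-inputFile>
render_template() {
    local template=$1 input_file=$2
    local escaped
    # escape characters that are special in the replacement part of sed's s command
    # (backslash, ampersand, and the "|" used as delimiter below)
    escaped=$(printf '%s' "$input_file" | sed -e 's/[\\&|]/\\&/g')
    # replace every literal "$inputFile" in the template
    sed -e "s|\\\$inputFile|${escaped}|g" "$template"
}
```

Given a template line such as inputPath="$inputFile", calling render_template with the path of an input product yields a script line with that path filled in; all other shell variables in the template are left untouched.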

Converting a request to a UNIX call: l2gen-process.vm

    #!/bin/bash
    inputPath="$inputFile"
    #[[
    input=$(basename $inputPath)
    output=${input:0:8}2${input:9:$((${#input} - 12))}.hdf
    SEADAS=./seadas-6.3/nasa-seadas-6.3
    L2GEN_BIN=${SEADAS}/run/bin/linux_64/l2gen
    L2GEN_ENV=${SEADAS}/config/seadas.env
    . $L2GEN_ENV
    # determine and get AUX data files...
    function handle_progress() {
        line=$1
        if [[ ${line} =~ Processing\ scan\ #\ +[0-9]+\ +\(([0-9]+)\ +of\ +([0-9]+)\)\ +after ]]; then
            a1=${BASH_REMATCH[1]}
            a2=${BASH_REMATCH[2]}
            progress=$(echo "scale=3; ${a1} / ${a2}" | bc)
            printf "CALVALUS_PROGRESS %.3f\n" $progress
        fi
    }
    ${L2GEN_BIN} ifile=${inputPath} ofile=${output} par=parameters | \
    while read x ; do handle_progress "$x" ; done
    echo CALVALUS_OUTPUT_PRODUCT ${output}
    ]]#
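The output name expression in the wrapper is compact, so a worked example may help. The sketch below isolates it in a function (the name derive_output_name and the sample file name are hypothetical): it keeps the first 8 characters, writes "2" in place of the 9th character, keeps the remainder except the last 3 characters, and appends ".hdf". It assumes input names longer than 12 characters.

```shell
#!/bin/bash
# Derive the output product name from an input product name,
# using the same substring expression as the wrapper script.
derive_output_name() {
    local input
    input=$(basename "$1")
    # ${input:0:8}            first 8 characters
    # "2"                     replaces the character at index 8
    # ${input:9:len-12}       everything after it, minus the last 3 characters
    printf '%s2%s.hdf\n' "${input:0:8}" "${input:9:$(( ${#input} - 12 ))}"
}

# Hypothetical input name, for illustration only
derive_output_name /data/MER_RR__1PTEST20050101.N1   # prints MER_RR__2PTEST20050101.hdf
```

For a name of this shape the effect is that the level digit changes from 1 to 2 and the three-character extension is replaced by .hdf.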

Converting a request to a UNIX call: l2gen-prepare

    #!/bin/bash
    inputUrl=$1
    outputDir=$2
    inputFile=$(basename $inputUrl)
    outputFile=${inputFile:0:8}2${inputFile:9:$((${#inputFile} - 12))}.hdf
    outputPath=$outputDir/$outputFile
    # skip processing if the output already exists in HDFS
    if hadoop fs -ls $outputPath > /dev/null 2>&1
    then
        echo "skipping $inputFile, $outputFile exists"
        echo "CALVALUS_OUTPUT_PRODUCT $outputPath"
        echo "CALVALUS_SKIP_PROCESSING yes"
    fi
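The wrapper scripts report back through keyword lines on standard output. How a caller might interpret those lines can be sketched as follows. The keywords CALVALUS_PROGRESS, CALVALUS_OUTPUT_PRODUCT and CALVALUS_SKIP_PROCESSING come from the slides; the function and variable names are hypothetical, and the real processor adapter is not a shell script.

```shell
#!/bin/bash
# Sketch of a reader for the wrapper's stdout protocol.
# State collected from the keyword lines (hypothetical variable names):
progress=""
output_product=""
skip_processing="no"

interpret_line() {
    local line=$1
    case "$line" in
        "CALVALUS_PROGRESS "*)        progress=${line#"CALVALUS_PROGRESS "} ;;
        "CALVALUS_OUTPUT_PRODUCT "*)  output_product=${line#"CALVALUS_OUTPUT_PRODUCT "} ;;
        "CALVALUS_SKIP_PROCESSING "*) skip_processing=${line#"CALVALUS_SKIP_PROCESSING "} ;;
        *) ;;  # any other line is ordinary log output and is ignored here
    esac
}

# Read lines from stdin and update the state variables above.
interpret_stream() {
    local line
    while IFS= read -r line; do
        interpret_line "$line"
    done
}
```

When feeding a script's output into interpret_stream, use a redirection or process substitution (for example `interpret_stream < <(./wrapper.sh)`) rather than a pipe, because a pipe would run the loop in a subshell and the collected values would be lost.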

Converting a request to a UNIX call: l2gen-parameters.xsl
[XSLT stylesheet; its markup was lost in extraction and only "=" survives.]

Deployment process for UNIX executables
• Stand-alone test (optional)
• packing as versioned .tar.gz file(s) + wrapper scripts
• copying to HDFS, automated replication
• bulk request submission (portal, batch client)
• deployment via distributed cache, automatic unpacking, available as symlink in working dir
• processor adapter:
    - transforms scripts
    - checks availability of output
    - retrieves input into working dir
    - calls wrapper scripts (prepare, process, finalize)
    - archives output
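The order of the adapter steps listed above can be sketched as a small driver. This is an illustration under assumptions, not the Calvalus adapter: all function names are hypothetical, retrieve_input and archive_output are stand-ins for the HDFS transfers, the input is passed to the process script as an argument for simplicity, and the finalize step is omitted.

```shell
#!/bin/bash
# Sketch of the control flow: prepare, (skip or) retrieve, process, archive.
retrieve_input() { echo "retrieving $1"; }   # stand-in for copying input into the working dir
archive_output() { echo "archiving $1"; }    # stand-in for copying output back to HDFS

# usage: run_request <prepare-script or ""> <process-script> <input-url> <output-dir>
run_request() {
    local prepare=$1 process=$2 input_url=$3 output_dir=$4
    local prepare_log product

    # 1. The optional prepare step decides whether processing is needed at all.
    if [[ -n "$prepare" ]]; then
        prepare_log=$(bash "$prepare" "$input_url" "$output_dir") || return 1
        if grep -q '^CALVALUS_SKIP_PROCESSING yes' <<<"$prepare_log"; then
            echo "skipped"
            return 0
        fi
    fi

    # 2. Fetch the input only when it is really needed.
    retrieve_input "$input_url"

    # 3. Run the wrapper and remember the last product it reports.
    product=$(bash "$process" "$input_url" \
        | sed -n 's/^CALVALUS_OUTPUT_PRODUCT //p' | tail -n 1)
    [[ -n "$product" ]] || return 1

    # 4. Archive the result.
    archive_output "$product"
}
```

Running the prepare step before input retrieval is what saves work when requests are resubmitted: if the output already exists, neither the transfer nor the processing takes place.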

Summary
• Convention for processor software packages
• Requests with processor (version) identification
• Integration of existing processors by wrappers
• Hadoop distributed cache for deployment
• Input provision, working dir, output archiving
• Progress monitoring, exception handling

Acknowledgement: The initial Calvalus idea was developed and its realisation was funded by the European Space Agency under the SME-LET programme.