Expanding and Scaling Lifemapper Computations Using CCTools

Slides:



Advertisements
Similar presentations
Virtualizing Lifemapper for PRAGMA: Step 2 - The Computational Tier By Aimee Stewart, Cindy Zheng, Phil Papadopoulos, C.J. Grady University of Kansas Biodiversity.
Advertisements

WP3 Biomapping results to date WP3: NRM, CDF, CEFAS, DINARA, WCS Additional input: WP1, AquaMaps workgroup.
Step 1: Valley Segment Classification Our first step will be to assign environmental parameters to stream valley segments using a series of GIS tools developed.
Using Specimen Data in Scientific Workflow Environments to Connect to Metadata Archive and Discovery Services in Environmental Biology CJ Grady, J.H. Beach,
Visual Basic: An Object Oriented Approach 2 – Designing Software Systems.
Lifemapper Provenance Virtualization
Aimee Stewart (KU) Nadya Williams (UCSD)
1 Component Description CMU Note-Taker Tools Human Computer Interaction Institute Carnegie Mellon University Prepared by: Bill Scherlis March 26, 1999.
Introduction : ‘Skoll: Distributed Continuous Quality Assurance’ Morimichi Nishigaki.
Accessing Biodiversity Resources in Computational Environments from Workflow Application J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray,
Web-Based Tool and Why Cross Platform Support Multi-User No special software to install… just a browser Offload real work to server No worrying about versions.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Internet Map Server Help This presentation briefly describes the Internet map server viewer and model interface and how to work them.
SEEK: Enabling Ecology and Biodiversity Science Through Cyberinfrastructure.
Bridging Species Niche Modeling and Multispecies Ecological Modeling and Analysis Jeffery Cavner, J.H. Beach, Aimee Stewart, CJ Grady
Adding Phylogeny to GIS-enabled Species Range and Distribution Analyses Jeffery Cavner, J.H. Beach, Aimee Stewart, CJ Grady
Specify Software Project – Quick Facts
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University DCCFinder: A Very- Large Scale Code Clone Analysis.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Synopsis of current BIEN and Enquist projects managed by Martha iPlant 2014.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
NR 422- Habitat Suitability Models Jim Graham Spring 2009.
A performance evaluation approach openModeller: A Framework for species distribution Modelling.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
OSG Area Coordinator’s Report: Workload Management April 20 th, 2011 Maxim Potekhin BNL
Frontiers in Massive Data Analysis Chapter 3.  Difficult to include data from multiple sources  Each organization develops a unique way of representing.
ESIP Federation 2004 : L.B.Pham S. Berrick, L. Pham, G. Leptoukh, Z. Liu, H. Rui, S. Shen, W. Teng, T. Zhu NASA Goddard Earth Sciences (GES) Data & Information.
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Aluquerque, NM.
1 Media Grid Initiative By A/Prof. Bu-Sung Lee, Francis Nanyang Technological University.
A Software Framework for Distributed Services Michael M. McKerns and Michael A.G. Aivazis California Institute of Technology, Pasadena, CA Introduction.
Application Software System Software.
Biodiversity Data Exchange Using PRAGMA Cloud Umashanthi Pavalanathan, Aimee Stewart, Reed Beaman, Shahir Shamsir C. J. Grady, Beth Plale Mount Kinabalu.
TeraGrid Gateway User Concept – Supporting Users V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai Oak Ridge National Laboratory.
Species Distribution Modeling Alexandre Copertino Jardim Antonio Miguel Vieira Monteiro Karla Donato Fook Lúbia Vinhas Silvana Amaral Scientific Workshop.
Lifemapper II: Finding the Good Life Aimee Stewart James H. Beach, C.J. Grady, David A. Vieglias Biodiversity Institute, KU.
Data Communications and Networks Chapter 9 – Distributed Systems ICT-BVF8.1- Data Communications and Network Trainer: Dr. Abbes Sebihi.
1 openModeller Presentation Plan: Overview of openModeller OMWS: an open standard for distributed ecological niche modelling openModeller in relation to.
SPI NIGHTLIES Alex Hodgkins. SPI nightlies  Build and test various software projects each night  Provide a nightlies summary page that displays all.
User scenario on Marine Biodiversity AquaMaps Pasquale Pagano National Research Council (CNR) – ISTI Italy.
Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space The Capabilities of the GridSpace2 Experiment.
Climate-SDM (1) Climate analysis use case –Described by: Marcia Branstetter Use case description –Data obtained from ESG –Using a sequence steps in analysis,
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
Aimee Stewart (KU) Nadya Williams (UCSD) 1.
IT 5433 LM1. Learning Objectives Understand key terms in database Explain file processing systems List parts of a database environment Explain types of.
Staging of the Ecological Niche Modeling Mammal Prototype Project Deana Pennington University of New Mexico December 14, 2004.
1 Copyright © 2008, Oracle. All rights reserved. Repository Basics.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
11 DEPLOYING AN UPDATE MANAGEMENT INFRASTRUCTURE Chapter 6.
INTRODUCTION TO HIGH PERFORMANCE COMPUTING AND TERMINOLOGY.
Lifemapper 2.0 Using and Creating Geospatial Data and Open Source Tools for the Biological Community Aimee Stewart, CJ Grady, Dave Vieglais, Jim Beach.
HPC In The Cloud Case Study: Proteomics Workflow
Introduction to UML.
Ecological Niche Modelling in the EGI Cloud Federation
Netscape Application Server
Open Source distributed document DB for an enterprise
BDII Performance Tests
Unified Modeling Language
The Client/Server Database Environment
XSEDE’s Campus Bridging Project
Ada – 1983 History’s largest design effort
Mixed Reality Server under Robot Operating System
The Globus Toolkit™: Information Services
Species Distribution Models
Assoc. Prof. Dr. Syed Abdul-Rahman Al-Haddad
Serpil TOK, Zeki BAYRAM. Eastern MediterraneanUniversity Famagusta
About Thetus Thetus develops knowledge discovery and modeling infrastructure software for customers who: Have high value data that does not neatly fit.
Database System Architectures
Mark Quirk Head of Technology Developer & Platform Group
Presentation transcript:

Expanding and Scaling Lifemapper Computations Using CCTools CJ Grady University of Kansas

What is Lifemapper Web service wrapped single and multi species tools for scaling biodiversity analysis

LmSDM: Species Distribution Modeling Species Occurrence Data This illustrates the basics of SDM modeling. We input Species occurrence data, map coordinates of where a species has been found, with Environmental data for those coordinates. In our case we are using temperature, precipitation, and elevation data, to create a model of the habitat best suited for the species, based on its known locations. We use OpenModeller with about 10 algorithms and MaxEnt softwares to create these models. The models are then projected back onto maps of the environmental data to find other areas where the habitat is suitable for the species. Potential Habitat Environmental Data

Kirtland’s Warbler Range Here is an example of the predicted distribution of the Kirtland’s Warbler – this is the predicted range given today’s climate

Kirtland’s Warbler Future And here is the predicted distribution after the climate change that might occur by the end of the century

LmRAD: Range and Diversity The second set of tools we are developing are Range and Diversity tools. We begin with geospatial data, representing species habitat ranges for multiple species These data are all summarized in a Presence Absence Matrix, or PAM: which then contains much of the information needed to understand species diversity. From the PAM, we have different ways of representing species diversity, such as scatterplots, indices, and other quantifications We connect these in an implementation using webservices and client software, We will support large amounts of data, from multiple sources, and analyze them at multiple scales requiring huge computational resources Presence Absence Matrix (PAM) Species Habitat Data Multi-species analyses Range and Diversity Quantifications

Terrestrial Mammals of the World Proportional Species Richness Terrestrial Mammals of the World Per-site Range Size Here is an example of some calculations that can be mapped from a PAM. These are from the “Terrestrial Mammals of the World” data, with over 3600 species, compiled over a period of 3 years by Gerardo Ceballos at National University of Mexico, UMAN, then analyzed by Jorge Soberon and his students at the U of KS Biodiversity Institute You can see the inverse relationship between species richness and range size is clearly visualized here. Areas of high species richness, diversity are composed of species with very small range sizes, such as the most diverse area in the world, here in the Andes along the northern portion of South America Areas where most of the species have large range size, such as the Arctic, have very low species diversity High Yellow Moderate Red Low Blue

Client-side applications The Lifemapper QGIS plugin provides a visual interface for a user to connect to a LM server to examine and request analyses of their own data, and retrieve analysis results for further exploration. Our QGIS plugin connects with the LM infrastructure and allows users to examine single and multi species data inputs to and outputs from Lifemapper it also allows users to request data and analyses, using their own, or Lifemapper data

Integrates Phylogenetic Trees This screen shows integration of phlogenetic trees in a multi-species analyses. The analyses are done by the LM infrastructure, then returned to the user.

The PAICs of the Philippines Here is another example of the QGIS interface looking at data calculated by Lifemapper for amphibians in the Phillipines. On the right, the different colored Pleistocene aggregate island complexes (PAICs) on the right of the screen show similar species characteristics that evolved in islands that were previously connected by land bridges In the plot on the lower left, you can see highlighted areas that are also highlighted in the in the northern island complex

Implementation Rocks Cluster Database and Webserver on Front End Compute code on nodes

Harvard Collaboration 12,500 species 15 future climate scenarios (+1 current) 6 experiments ~500 MB per projection ~72 TB compressed outputs

Workflow N per species per experiment Projection Request Climate data Occurrence Data Create Projection Create Model Projection Model Create Package Package

Attempt 1 Used our existing 10 node cluster Limited to 20 concurrent processes Used entire resource Would have taken more than 2 years

Attempt 2 Used KU community cluster Limited by disk space Accidentally created DDoS attack

CCTools and XSEDE Modified our compute platform to run on Stampede Split each experiment into chunks of 1000 species 4 hours per chunk on 1024 cores

Second Experiment Limited to 128 or so cores Disk performance has changed Reduced input data resolution

Current Status Integrating CCTools into core infrastructure Moving from cataloging jobs to cataloging workflows Simplified communication Eliminate our job server Reduce overhead Pack multiple tasks into single job

Workflow N per species per experiment Projection Request Climate data Occurrence Data Create Projection Create Model Projection Model Create Package Package

Workflow N per species per experiment Projection Request Climate data Occurrence Data Create Projection Create Model Projection Modify Projection Model Create Package Package

MF W MF W LM DB Catalog Server MF W MF W

Near Future Deploy integrated version of Lifemapper Deploy on Comet Suggestions from this meeting