Fourth Paradigm Science-based on Data-intensive Computing.



How to read the text?
Foreword:
– A very interesting foreword by Gordon Bell
– Discusses innovation and discoveries through the centuries…
– Data-intensive science consists of three basic activities: data capture, curation, and analysis (we will add one: visualization).
– Various sources of data are identified

Some sources of big data
Australian Square Kilometre Array of radio telescopes
CERN Large Hadron Collider
Gene sequencing machines (HTS: High Throughput Sequencing)
Human Genome Project

Activities around big data
Capture and aggregation
Curation involves
– Right data structures
– Schema
– Metadata
– Integration across instruments, labs, and experiments
Modeling
Analysis (including visualization)
Archiving
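
The curation items above (data structures, schema, metadata, integration) can be made concrete with a small sketch. Everything named here is hypothetical and for illustration only: the `Observation` record, the one-entry `SCHEMA`, and the unit labels `degC` and `degF` are assumptions, not anything taken from the Fourth Paradigm book.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One curated measurement plus the metadata needed to reuse it."""
    instrument: str   # which instrument or lab produced the value
    quantity: str     # what was measured, e.g. "air_temperature"
    value: float
    unit: str
    metadata: dict = field(default_factory=dict)


# Hypothetical schema: the unit each quantity must be stored in.
SCHEMA = {"air_temperature": "degC"}


def curate(raw: dict) -> Observation:
    """Validate a raw record against SCHEMA and convert it to the agreed unit."""
    quantity = raw["quantity"]
    if quantity not in SCHEMA:
        raise ValueError(f"unknown quantity: {quantity}")
    value, unit = float(raw["value"]), raw["unit"]
    if unit == "degF" and SCHEMA[quantity] == "degC":
        value, unit = (value - 32.0) * 5.0 / 9.0, "degC"
    if unit != SCHEMA[quantity]:
        raise ValueError(f"cannot convert {unit} to {SCHEMA[quantity]}")
    # Keep the original unit as metadata so the conversion stays traceable.
    return Observation(raw["instrument"], quantity, value, unit,
                       {"original_unit": raw["unit"]})


def integrate(records: list) -> list:
    """Integration step: curate records from several instruments into one list."""
    return [curate(r) for r in records]
```

Calling `integrate` on records from two instruments that report in different units yields one list in a single unit, with the original unit preserved as metadata.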

How can we contribute?
“…there are many opportunities and challenges for data-intensive science.”
“scientific data mash-ups” …
Read the foreword in the Fourth Paradigm book. It is quite inspiring.

Jim Gray on eScience
Research cycle: data capture, data curation, data analysis, and data visualization
Mega-scale, milli-scale
Publication of research results
We need tools …
Capture, curate, analyze, model, visualize, make decisions, and take actions (These will be our objectives for the two projects for the course.)

Science paradigms (chronologically)
Thousands of years ago:
– Science was empirical
– Describing natural phenomena
Last few hundred years:
– Theoretical branch
– Models and generalizations
Last few decades:
– Computational branch
– Simulating complex phenomena
Today:
– Data exploration/data science
– Unify theory, experiment, and simulation
– Data captured by simulator or instrument
– Processed by software
– Info/knowledge/intelligence
– Analysis and visualization

Fourth Paradigm: Area 1: Environmental Sciences

Three Phases
Phase 1: largely discipline-oriented: geology, atmospheric chemistry, ecosystems
Phase 2: study of interacting elements (earth sciences, human behavior, and systems); complex systems and models for explaining these systems emerged; knowledge developed for scientific understanding
Phase 3: knowledge developed for practical decisions and actions
This new knowledge endeavor is termed the “science of environmental applications”

Knowledge and Queries
“With the basic understanding now well established, the demand for climate applications knowledge is emerging. How do we quantify and monitor total forest biomass so that carbon markets can characterize supply? What are the implications of regional shifts in water resources for demographic trends, agricultural output, and energy production? To what extent will seawalls and other adaptations to rising sea level impact coasts?”
These questions and additional issues need applications to be built around this basic knowledge.
Integration of knowledge from many disciplines: physical, biogeochemical, engineering, and human processes (demographics, practices).
The snowmelt problem, which affects 1 billion people worldwide, is discussed in detail.
Models have to consider interactions among various systems: traditional approaches may not suffice.
My opinion: we may need a compendium of methods, and the knowledge derived from these methods has to be unified.

Considerations for designing models
Need-driven vs. curiosity-driven
Externally constrained
Consequential and recursive (knowledge generates more knowledge)
Useful even when incomplete
Scalable
Robust
Data-intensive

Development of New Knowledge Types and Tools
Multiple sources for acquiring knowledge: satellite imagery, energy-balance reconstruction, etc.
Practical answers, intellectually captivating models, knowledge for policy making, etc.
Equally important is using this knowledge in everyday life, especially with the availability of mobile devices and the Internet.
Example: Hurricane Irene models, knowledge, and consequences

Simple Example
Figure 1: geography image of the Sierra Nevada and Central Valley of California
NASA satellite images at various spectral bands
Use these two collections to arrive at the snow cover model
Something of a scientific “mashup”
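
To give a feel for such a mashup, here is a minimal sketch that combines two spectral bands with an elevation grid. It uses the Normalized Difference Snow Index (NDSI), a standard remote-sensing index, which is not necessarily the method behind Figure 1. The function names, the tiny grids, the 0.4 NDSI threshold, and the 1500 m elevation cut-off are all illustrative assumptions.

```python
def ndsi(green: float, swir: float) -> float:
    """Normalized Difference Snow Index for one pixel's reflectances."""
    total = green + swir
    return 0.0 if total == 0 else (green - swir) / total


def snow_mask(green_band, swir_band, elevation_m,
              min_elevation_m=1500.0, threshold=0.4):
    """Mark a pixel as snow when its NDSI exceeds the threshold AND the
    terrain is at or above a minimum elevation (both cut-offs are assumptions)."""
    mask = []
    for g_row, s_row, e_row in zip(green_band, swir_band, elevation_m):
        mask.append([
            ndsi(g, s) > threshold and e >= min_elevation_m
            for g, s, e in zip(g_row, s_row, e_row)
        ])
    return mask


def snow_fraction(mask) -> float:
    """Fraction of pixels flagged as snow."""
    cells = [c for row in mask for c in row]
    return sum(cells) / len(cells) if cells else 0.0


# Made-up 2x2 grids: satellite reflectances plus a geography (elevation) layer.
green = [[0.8, 0.3], [0.7, 0.8]]
swir = [[0.1, 0.25], [0.1, 0.1]]
elevation = [[2500.0, 2400.0], [100.0, 3000.0]]

mask = snow_mask(green, swir, elevation)
```

The bright low-elevation pixel is excluded even though its NDSI is high, which is the point of merging the two collections: neither source alone gives the snow cover estimate.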

Figure 1

Ecological Sciences Systems (p.24..)
This is especially relevant in the context of recent anomalies such as superstorm Sandy.
Navigating the ecological data flood
Step 1: the first step in ecological scientific analysis is data discovery and harmonization: sources, conversions, scraping, web services, RSS, namespace mediation, search portals, Wikipedia, …
Step 2: moving ecological synthesis into the cloud: CSV and MATLAB-ready data files from hundreds of sites; analysis in the cloud; SQL Server Analysis data cube in the cloud
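
The harmonization part of Step 1 can be sketched as follows: CSV files from different sites use different column names and units, and are mapped onto one shared vocabulary. The site names, column names, and `COLUMN_MAP` are hypothetical and stand in for whatever namespace mediation a real project would use.

```python
import csv
import io

# Hypothetical per-site column names mapped to shared names (namespace mediation).
COLUMN_MAP = {
    "site_a": {"date": "date", "temp_c": "air_temp_c"},
    "site_b": {"Date": "date", "TempF": "air_temp_f"},
}


def harmonize(site: str, csv_text: str) -> list:
    """Rename columns to the shared vocabulary and convert units to Celsius."""
    mapping = COLUMN_MAP[site]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the columns we know how to map; drop the rest.
        row = {mapping[k]: v for k, v in raw.items() if k in mapping}
        if "air_temp_f" in row:
            row["air_temp_c"] = (float(row.pop("air_temp_f")) - 32.0) * 5.0 / 9.0
        else:
            row["air_temp_c"] = float(row["air_temp_c"])
        row["site"] = site
        rows.append(row)
    return rows


# Two sites reporting the same day in different formats.
combined = (
    harmonize("site_a", "date,temp_c\n2012-10-29,12.5\n")
    + harmonize("site_b", "Date,TempF\n2012-10-29,50\n")
)
```

After this step every row has the same keys (`date`, `air_temp_c`, `site`), so the combined list can be loaded into a single table or data cube for analysis.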

Sample Ecology Mashup See figure F1 and the link that goes with it at the bottom of the page.

Ecological Sciences Systems (contd.)
Step 2 (contd.): the analysis will download 3 terabytes of imagery, use 4000 CPU hours, and generate < 100 MB of results.
Challenges: complex visualization, diversity of the data sets, data semantics, data publisher, metadata, collaboration tools.
If you think these are NOT CSE problems, take a look at this site:

Summary
Climate research is one of the critical application areas for data-intensive computing.
We looked at some sample applications.
We also observed that a “mashup” of data from various sources is a common approach in these applications.