Workshop Organizing Committee: Rosalind R. James, Carolyn Lawrence, Sharon Papiernik, Curt Van Tassell.

Workshop Purpose Bring ARS scientific capability to the cutting edge

Workshop Purpose
  Develop a vision and strategy that defines:
  (1) ARS scientific Big Data needs
  (2) An infrastructure for dealing with these needs now and into the future

What is Big Data?
  Massive amounts of data that accumulate over time and are difficult to analyze and handle using common data management tools.

Big Data comes in V-Dimensions:
  Volume. With large size comes difficulty in finding what is relevant, finding space to store it, and deciding how to index it.
  Variety. Highly structured data, variably structured data, and unstructured data.
  Velocity. How fast is the data created, and how fast must it be processed?
  Veracity. Uncertain or imprecise data.

What makes Big Data so important?
  Researchers no longer simply ask, "What experimental design will best address this question?" but rather, "What can I glean from extant data?" or, better yet, "What insights can I glean if I could fuse data from multiple domains?"
  From: The Fourth Paradigm: Data-Intensive Scientific Discovery

E. O. Wilson, Consilience: The Unity of Knowledge

Scientific computing is becoming increasingly data intensive.
  We are becoming increasingly able to:
    Answer previously intractable questions
    Solve problems more efficiently
    Characterize the natural world in greater detail

An era of large datasets
  Large Hadron Collider: 15 Pbytes/year (15 x 10^6 Gbytes, 15 x 10^3 Tbytes)
  Pan-STARRS (panoramic survey telescope): 2 Gbytes per image, taken every 30 sec from 4 cameras; several Tbytes/night/telescope
  Natl. Human Genome Research Institute: 1000 genomes = 200 Tbytes
  Beijing Genomics Institute: 5 Tbytes/day
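As a rough check on the Pan-STARRS figure, the per-image numbers can be turned into a nightly volume. This is only a sketch under stated assumptions: each of the 4 cameras is taken to produce one 2-Gbyte image every 30 seconds, the length of an observing night (8 hours here) is a hypothetical value not given on the slide, and decimal units (1 Tbyte = 10^3 Gbytes) are used to match the Large Hadron Collider conversion above.

```python
# Figures taken from the slide.
GBYTES_PER_IMAGE = 2
SECONDS_BETWEEN_IMAGES = 30
CAMERAS = 4


def pbytes_to_gbytes(pbytes: float) -> float:
    """Decimal conversion used on the slide: 1 Pbyte = 10^6 Gbytes."""
    return pbytes * 10**6


def nightly_volume_tbytes(hours_observing: float) -> float:
    """Estimated Tbytes per night for one telescope.

    Assumes every camera produces one image per interval for the
    whole observing period (an assumption, not a slide figure).
    """
    images_per_camera = hours_observing * 3600 / SECONDS_BETWEEN_IMAGES
    total_gbytes = images_per_camera * CAMERAS * GBYTES_PER_IMAGE
    return total_gbytes / 1000  # decimal: 1 Tbyte = 10^3 Gbytes


if __name__ == "__main__":
    print(f"LHC: {pbytes_to_gbytes(15):,.0f} Gbytes/year")
    # 8 hours is a hypothetical night length chosen for illustration.
    print(f"Pan-STARRS: {nightly_volume_tbytes(8):.2f} Tbytes/night")
```

With these assumptions the estimate lands at several Tbytes per night, in line with the slide; a different night length or imaging cadence would change the number proportionally.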

GenBank Sequence Growth (to 2008)

What it takes to move Big Data
  1 Gbyte of data
    T1 line: 1.5 hrs
    Thin Ethernet: 14 min
    Fast Ethernet: 1 min
  1 Tbyte of data
    T1 line: 65 days, 22.5 hrs
    Thin Ethernet: 10 days, 4.3 hrs
    Fast Ethernet: 1 day, 0.5 hrs
    Gig-E: 2 hrs, 26 min
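The transfer times above follow from dividing the data size in bits by the link rate. The sketch below assumes nominal line rates (T1 at 1.544 Mbit/s, Thin Ethernet at 10 Mbit/s, Fast Ethernet at 100 Mbit/s, Gig-E at 1 Gbit/s), binary sizes (1 Gbyte = 2^30 bytes, 1 Tbyte = 2^40 bytes), and no protocol overhead. The slide does not state its assumptions; these are the ones that closely reproduce its figures.

```python
# Nominal link rates in bits per second (assumed, not stated on the slide).
LINK_RATES_BPS = {
    "T1 line": 1.544e6,
    "Thin Ethernet": 10e6,
    "Fast Ethernet": 100e6,
    "Gig-E": 1e9,
}

# Binary sizes in bytes (assumed; they match the slide's numbers best).
GBYTE = 2**30
TBYTE = 2**40


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: bits to send divided by bits per second."""
    return size_bytes * 8 / rate_bps


def humanize(seconds: float) -> str:
    """Format a duration roughly the way the slide does."""
    if seconds < 3600:
        return f"{seconds / 60:.0f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hrs"
    days, remainder = divmod(seconds, 86400)
    return f"{int(days)} days, {remainder / 3600:.1f} hrs"


if __name__ == "__main__":
    for label, size in (("1 Gbyte", GBYTE), ("1 Tbyte", TBYTE)):
        print(label)
        for link, rate in LINK_RATES_BPS.items():
            print(f"  {link}: {humanize(transfer_seconds(size, rate))}")
```

Real transfers are slower than these ideal figures because of protocol overhead and shared links, which strengthens the slide's point about the cost of moving large datasets.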

Moving into the cloud
  Scientists need to be able to move and share large datasets.
  Cloud/Cluster/Grid computing
    Not just for holding data, but for computations
    Reduces the need to repeatedly move the same datasets

Libraries: Provide access and dissemination of information…

Existing Systems for Handling Big Data
  XSEDE (replaces TeraGrid)
    A virtual system that scientists can use to interactively share supercomputer resources, data, & expertise
    Composite of several university advanced computing centers
  iPlant (Texas Advanced Computing Center)
    Plant genomic data
    Cyberinfrastructure for the transfer, storage, analysis, visualization, metadata control, discovery, etc.
    Cloud computing

Existing Big Data Systems (cont.)
  Three Rivers Optical Exchange (part of XSEDE)
  Amazon Cloud Computing
    Purchase computing power and storage, as needed
  John Wesley Powell Center for Analysis & Synthesis
    USGS
    Earth sciences issues
    Enhancing scientific discovery & problem solving through integrated research
  European grid systems
  Watson (?)

ARS Could Provide Leadership for Agricultural Data
  OSTP Big Data Research and Development Initiative, John Holdren (3/29/2012)
    The government is underinvesting in data management
    The process of going from data to knowledge to understanding is being inhibited
    Human capital needs:
      People with deep analytical skills
      Data-savvy managers/executives
      More IT-savvy technicians, for both structured and unstructured data

What does ARS have to add?
  Decision support software operating from a cloud system
  Public databases could be better organized and more easily accessible, collectively
  Large data
    Currently wasting money on redundant hardware and software
    Currently have difficulty moving the data
    Cloud systems facilitate fusing datasets
  ARS is capable of long-term stability for storage and analyses

Thus this Workshop Will
  Gather together ARS scientists who are already working with large data, who have experience with and knowledge of our current database collections, or who are trying to work with Big Data
  Include speakers familiar with Big Scientific Data issues who have developed solutions
  Develop a vision for what an ARS solution should look like

A white paper describing a vision for ARS Big Data, including examples of current needs and an infrastructure for meeting current and future needs.
  This infrastructure will include:
    IT resources
    Intellectual resources
    Personnel resources

ARS Administrators (AC Council)
ARS Office of National Programs
OCIO and IT Specialists in the Field
ARS Scientific Staff (scientists, technicians, computational biologists, statisticians)

The climb is steep, but there are cairns along the way.

Thank you!