Big Data Architectures

Presentation transcript:

Big Data Architectures
Panel on Exploiting Big Data in Collaboration Initiatives
CTS 2012, Westminster (Denver) CO, May 23 2012
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org http://www.salsahpc.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington

Architecture of Data Repositories?
Traditionally governments set up repositories for data associated with particular missions
  For example EOSDIS, GenBank, NSIDC and IPAC for Earth observation, gene sequences, polar science and infrared astronomy respectively
  LHC/OSG computing grids for particle physics
Focus has been on getting access to data, with curation, provenance etc.
Assumes analysis is dealt with separately, as repositories have only modest attached computing

Big Data Analysis
Big data suggests that the model of a scientist browsing a repository and downloading (petabytes of) data is flawed
  Data bandwidth is too low and local compute resources are too small
The “Fourth Paradigm” (data oriented science) is based on large scale data analysis
Need to support repositories for large instruments (telescopes, accelerators, satellites) and for pervasive distributed instruments/sensors (gene sequencers)
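The bandwidth point above can be checked with simple arithmetic. The sketch below is illustrative only: the link speeds are hypothetical values, not figures from the talk, and the function name `transfer_days` is invented for this example. It assumes a decimal petabyte (10^15 bytes) and a link that sustains its full nominal rate, which real transfers rarely do.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link at a sustained bit rate."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # 86,400 seconds per day


PETABYTE = 10**15  # decimal petabyte in bytes (assumption)

# Hypothetical sustained link speeds, chosen only for illustration.
for label, rate in [("1 Gb/s", 1e9), ("10 Gb/s", 1e10), ("100 Gb/s", 1e11)]:
    print(f"{label}: {transfer_days(PETABYTE, rate):.1f} days per petabyte")
```

Even at a sustained 10 Gb/s, one petabyte takes more than nine days to move, which is why the slide argues against every scientist downloading the data.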

Clouds as Support for Data Repositories?
The data deluge needs cost effective computing
  Clouds are by definition cheapest
Shared (Collaborative!) resources essential (to be cost effective and large)
  Can’t have every scientist downloading petabytes to a personal cluster
Need to reconcile distributed (initial source of) data with shared computing
  Can move data to (discipline specific) clouds
  How do you deal with multi-disciplinary studies?

Traditional File System?
[Figure labels: Data; Compute Cluster (nodes marked C); Archive; Storage Nodes]
Typically a shared file system (Lustre, NFS …) used to support high performance computing
Big advantages in flexible computing on shared data but doesn’t “bring computing to data”
Cloud object stores similar to this?

Hadoop/Google Data Parallel File System?
[Figure labels: Data on compute nodes (C); File1 broken up into Block1, Block2, …, BlockN; each block replicated]
No archival storage and computing brought to data
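The break-up-and-replicate idea on this slide can be sketched in a few lines. This is a toy model under stated assumptions, not Hadoop's or Google's actual implementation: the function names, node names and round-robin placement are all invented for illustration (real systems use more elaborate, for example rack-aware, placement policies). It shows a file split into fixed-size blocks, each block copied to several distinct nodes, and each task then scheduled on a node that already holds its block, so computing is brought to the data.

```python
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def schedule_local(placement: dict[int, list[str]]) -> dict[int, str]:
    """Run each block's task on the least-loaded node that already stores it."""
    load: dict[str, int] = defaultdict(int)
    assignment = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load[n])
        assignment[block] = node
        load[node] += 1
    return assignment


# Hypothetical 10-byte "file", 4-byte blocks, 4 nodes, 3 replicas per block.
data = bytes(range(10))
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = split_into_blocks(data, block_size=4)
placement = place_replicas(len(blocks), nodes, replication=3)
assignment = schedule_local(placement)
print(placement)
print(assignment)
```

Because every task runs where a copy of its block already lives, no block crosses the network at compute time, in contrast to the shared file system on the previous slide.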