Big Data Architectures


1 Big Data Architectures
Panel on Exploiting Big Data in Collaboration Initiatives
CTS 2012, Westminster (Denver), CO, May
Geoffrey Fox
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington

2 Architecture of Data Repositories?
Traditionally, governments set up repositories for data associated with particular missions
For example, EOSDIS, GenBank, NSIDC, and IPAC for Earth observation, genes, polar science, and infrared astronomy, respectively
LHC/OSG computing grids for particle physics
Focus has been on getting access to data, with curation, provenance, etc.
Assumes analysis is dealt with separately, as repositories have only modest attached computing

3 Big Data Analysis
Big data suggests that the model of a scientist browsing a repository and downloading (petabytes of) data is flawed
Data bandwidth is too low and local compute resources are too small
The “Fourth Paradigm” (data-oriented science) is based on large-scale data analysis
Need to support repositories for large instruments (telescopes, accelerators, satellites) and pervasive distributed instruments/sensors (gene sequencers)
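To make the bandwidth point on this slide concrete, here is a rough back-of-the-envelope sketch. The link speeds (1 and 10 Gbit/s) and the decimal definition of a petabyte are assumptions chosen for illustration and do not come from the slides; protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Rough estimate of how long it takes to download a data set over a
# sustained network link. All figures are illustrative assumptions.

PETABYTE = 10**15          # bytes (decimal petabyte, an assumption)
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Return the ideal transfer time in days, ignoring all overhead."""
    seconds = size_bytes * 8 / bandwidth_bits_per_s
    return seconds / SECONDS_PER_DAY


# Hypothetical sustained link speeds for a single researcher's connection.
for label, bits_per_s in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    print(f"{label}: {transfer_days(PETABYTE, bits_per_s):.1f} days per petabyte")
```

Under these assumptions a single petabyte takes roughly three months at 1 Gbit/s and more than a week at 10 Gbit/s, which is why the slide argues for analysis close to the repository.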

4 Clouds as Support for Data Repositories?
The data deluge needs cost-effective computing
Clouds are by definition cheapest
Shared (collaborative!) resources are essential (to be cost effective and large)
Can’t have every scientist downloading petabytes to a personal cluster
Need to reconcile distributed (initial source of) data with shared computing
Can move data to (discipline-specific) clouds
How do you deal with multi-disciplinary studies?

5 Traditional File System?
[Diagram labels: Data, Compute Cluster, C, Archive, Storage Nodes]
Typically a shared file system (Lustre, NFS, …) is used to support high-performance computing
Big advantages in flexible computing on shared data, but it doesn’t “bring computing to data”
Cloud object stores similar to this?

6 Hadoop/Google Data Parallel File System?
[Diagram labels: C, Data; File1 is broken up into Block1, Block2, …, BlockN; each block is replicated]
No archival storage, and computing is brought to data
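The break-up, replicate, and compute-on-the-data idea on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the actual placement or scheduling policy of Hadoop or the Google File System: the round-robin placement, the tiny block size, the node names, and the helper functions `split_into_blocks`, `place_replicas`, and `schedule_local` are all hypothetical.

```python
# Toy model of a data-parallel file system: a file is broken into blocks,
# each block is replicated on several nodes, and each block's task is
# scheduled on a node that already holds a replica ("computing to data").


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break a file's contents into fixed-size blocks (last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Hypothetical round-robin placement of each block on distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def schedule_local(placement: dict[int, list[str]]) -> dict[int, str]:
    """Run each block's task on the least-loaded node holding a replica."""
    load: dict[str, int] = {}
    assignment: dict[int, str] = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


# Illustrative values only: a 1000-byte "file", 256-byte blocks, four nodes.
data = b"x" * 1000
nodes = ["node0", "node1", "node2", "node3"]
blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), nodes, replication=3)
assignment = schedule_local(placement)

for block, node in assignment.items():
    print(f"Block{block + 1}: replicas on {placement[block]}, task runs on {node}")
```

Because every task runs where a copy of its block already lives, no block crosses the network before processing, in contrast to the shared file system on the previous slide, where compute nodes read data from separate storage nodes.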

