Large data storage (examples, volumes, challenges) Cluster, Grid, Clouds – Julien Dhallenne.

Presentation transcript:

Large data storage (examples, volumes, challenges) Cluster, Grid, Clouds – Julien Dhallenne

Introduction. More and more data generated: social content, scientific projects, IoT. Current data storage > 10^24 bytes (yottabyte). How to store this data?
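To give a feel for the 10^24-byte figure on the introduction slide, here is a minimal Python sketch of the unit arithmetic. The helper names (`humanize`, `drives_needed`) and the 10 TB drive capacity are assumptions made for illustration; they do not come from the slides.

```python
# SI decimal prefixes for bytes, from kilo (10^3) up to yotta (10^24).
SI_PREFIXES = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def humanize(num_bytes: int) -> str:
    """Express a byte count using the largest SI prefix that keeps the value >= 1."""
    index = 0
    while num_bytes >= 1000 ** (index + 1) and index < len(SI_PREFIXES) - 1:
        index += 1
    return f"{num_bytes / 1000 ** index:g} {SI_PREFIXES[index]}"


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Drives required to hold total_bytes, ignoring redundancy (ceiling division)."""
    return -(-total_bytes // drive_bytes)


YOTTABYTE = 10**24
ASSUMED_DRIVE = 10 * 10**12  # hypothetical 10 TB drive, not a figure from the slides

print(humanize(YOTTABYTE))
print(drives_needed(YOTTABYTE, ASSUMED_DRIVE))
```

Under that assumed capacity, a single yottabyte already corresponds to 10^11 drives before any replication, which is the scale the "How to store this data?" question refers to.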

Outline. Examples: social media, scientific measurements, IoT sensor data. Volumes. Challenges.

Examples 1/3 – Social media

Examples 2/3 – Scientific measurements. The Large Hadron Collider (LHC) at CERN (~27 km in circumference) produces so much data that scientists must discard most of it, hoping they haven’t thrown away anything useful.

Examples 3/3 – IoT sensor data. Source: Cisco

Outline. Examples: social media, scientific measurements, IoT sensor data. Volumes. Challenges.

Volumes – IoT

Volumes – YouTube data volume. Consequences: huge data storage, numerous I/O operations.

Challenges – current technologies. Source: Seagate’s website, cloud/products/,

Challenges – adapted file systems. Source: Seagate’s website, cloud/products/,
Quantcast File System: high-performance, fault-tolerant, distributed file system
HDFS: distributed file system providing high-throughput access
Ceph: unified, distributed storage system
Lustre: file system for computer clusters
GlusterFS: scale-out NAS file system
PVFS: designed to scale to petabytes of storage
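The file systems listed above are described as distributed and fault-tolerant. A common way to get both properties is to cut a file into fixed-size blocks and store several copies of each block on different nodes. The sketch below illustrates that idea only: the function names, node names, and round-robin placement are hypothetical and do not reproduce the placement policy of HDFS, Ceph, or any other system on the slide. The 128 MB block size and three replicas are chosen to mirror the usual HDFS defaults.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the size of each block when a file is cut into fixed-size blocks."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes in round-robin order."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


MB = 10**6
blocks = split_into_blocks(300 * MB, 128 * MB)    # illustrative 300 MB file
nodes = ["node1", "node2", "node3", "node4"]      # hypothetical storage nodes
placement = place_replicas(len(blocks), nodes, replication=3)

for block, holders in placement.items():
    print(block, blocks[block] // MB, "MB", holders)
```

With three copies of every block on distinct nodes, losing any single node still leaves two copies of each block, at the cost of tripling the raw storage required.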

Thank you. Questions?