Presentation transcript:

HPC system for Meteorological research at HUS: Meeting the challenges
Nguyen Trung Kien, Hanoi University of Science
High Resolution Vietnam Projections Workshop, Melbourne, December 11th, 2012

Contents
1. Computational/Storage Needs
2. Scale to meet the needs
3. Grid Computing for meteorological research
4. Q&A

Computational and Storage needs up to 2013
National projects:
- Development of a seasonal prediction system for prediction of extreme climate events for natural disaster prevention in Vietnam
- Building an operational ensemble assimilation system for numerical weather prediction, and an ensemble system for regional climate models, to forecast and project extreme weather and climate events

Computational and Storage needs up to 2013
Joint DANIDA project:
- Climate Change-Induced Water Disaster and Participatory Information System for Vulnerability Reduction in North Central Vietnam
Joint project with CSIRO (Australia):
- High-resolution downscaling for Vietnam: Facilitating effective climate adaptation

Computational and Storage needs up to 2013
Weather forecast: MM5, HRM, WRF
- 3-day forecast, 4 times daily
- 2 hours/run (on 1 compute node: 2x Quad-Core 2.5 GHz, 8 GB RAM)
Tropical cyclone detection: RegCM
- 12-month detection, once monthly
- 140 hours/run
- Stores 70 GB of data
Seasonal forecast: MM5, WRF, RegCM
- 7-month forecast, once weekly
- hours/run
- Stores 6-16 GB of data
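As a rough sanity check on these figures, a minimal back-of-the-envelope sketch in Python; the per-run times and output sizes are the numbers quoted above, while the 365 days/year and 12 runs/year factors are obvious assumptions:

```python
# Back-of-the-envelope load estimate from the per-run figures above.
# Run times and output sizes come from the slide; calendar factors are assumed.

HOURS_PER_RUN_3DAY = 2          # 3-day forecast, one compute node
RUNS_PER_DAY_3DAY = 4

daily_forecast_hours = HOURS_PER_RUN_3DAY * RUNS_PER_DAY_3DAY   # node-hours/day

HOURS_PER_RUN_TC = 140          # 12-month tropical-cyclone detection
TC_OUTPUT_GB = 70               # output per monthly run

yearly_tc_hours = HOURS_PER_RUN_TC * 12
yearly_tc_output_tb = TC_OUTPUT_GB * 12 / 1024

print(f"3-day forecasts: {daily_forecast_hours} node-hours/day "
      f"({daily_forecast_hours * 365} node-hours/year)")
print(f"Cyclone detection: {yearly_tc_hours} node-hours/year, "
      f"~{yearly_tc_output_tb:.2f} TB/year of output")
```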

Computational and Storage needs up to 2013
Climate simulation, 1979-2010:
- Boundary and initial conditions: ERA40, NCEP, INTERIM
- Models: RegCM, MM5CL, REMO, clWRF
- 2-5 hours per simulated month, output ~5 GB of data
Climate projection:
- A1B, A2 scenarios
- Models: MM5CL, CCAM, RegCM, clWRF, REMO
- 2-5 hours per simulated month, output ~5 GB of data
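The hindcast output volume follows directly from these numbers; a minimal sketch, assuming 32 full simulated years at ~5 GB per model per month:

```python
# Rough output-volume estimate for the 1979-2010 hindcast runs above.
# ~5 GB per simulated month per model; treating the period as 32 full years.

YEARS = 2010 - 1979 + 1          # 32 simulated years
GB_PER_MONTH = 5
MODELS = ["RegCM", "MM5CL", "REMO", "clWRF"]

per_model_tb = YEARS * 12 * GB_PER_MONTH / 1024
total_tb = per_model_tb * len(MODELS)

print(f"~{per_model_tb:.1f} TB per model, "
      f"~{total_tb:.1f} TB for all {len(MODELS)} models")
```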

Computational and Storage needs up to 2013
Large number of users:
- 10- staff
- 2-3 PhD students
- 5-6 Master students
- >15 Bachelor students
- Users from other organizations
Need to store data from previous projects
Total storage needs: >100 TB

Computational and Storage needs up to 2013
System as of 2011:
- 11 compute nodes, Rpeak = 880 GFlops
- Desktop HDDs (low cost, low MTBF) + 1 Gbps Ethernet + NFS => storage has low read/write speed and is unreliable
- Low-bandwidth interconnect network (1 Gbps) => max performance << peak performance
Computational and storage needs are "huge" => the system needs to be upgraded.

Scale to meet the needs - Network
Use InfiniBand instead of Ethernet:
- Many versions: SDR, DDR, QDR, FDR, ...
- Bandwidth from 10 to 56 Gbps
- The servers support only PCI Express x4 => choose InfiniBand SDR 4x (10 Gbps)

Scale to meet the needs - Storage Hot spare Raid5

Scale to meet the needs - Storage Hot spare Raid5 Hot spare Raid5

Scale to meet the needs - Storage Hot spare Raid5 Hot spare Raid5 Infiniband (10Gbps)

Scale to meet the needs - Storage Hot spare Raid5 Hot spare Raid5 Infiniband (10Gbps) Use only Enterprise SAS/SATAHDD LustreFS

Scale to meet the needs
- 14 nodes, 106 cores, 141 GB RAM, Rocks Cluster 5.5
- Rpeak ~ 1 TFlops
- InfiniBand SDR 10 Gbps and 1 Gbps Ethernet interconnect networks
- 76 TB LustreFS using Enterprise HDDs
- LustreFS throughput: ~700 MB/s from a single client, aggregate up to 1.8 GB/s
[Diagram: METOCEAN Cluster connected by InfiniBand 10 Gbps and 1 Gbps Ethernet networks]
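Single-client throughput of this order can be sanity-checked with a crude write test. The sketch below is illustrative only: the mount point and sizes are assumptions, and a proper benchmark would use a dedicated tool such as IOR.

```python
# Crude single-client write-throughput check against a Lustre mount.
# The mount point is hypothetical; adjust before running.

import os
import time

MOUNT_POINT = "/lustre/scratch"          # hypothetical Lustre mount point
TEST_FILE = os.path.join(MOUNT_POINT, "throughput_test.bin")
CHUNK = b"\0" * (64 * 1024 * 1024)       # 64 MB writes
N_CHUNKS = 16                            # 1 GB total

start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(N_CHUNKS):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())                 # make sure data actually hits storage
elapsed = time.time() - start

print(f"wrote {N_CHUNKS * 64} MB in {elapsed:.2f} s "
      f"-> {N_CHUNKS * 64 / elapsed:.0f} MB/s")
os.remove(TEST_FILE)
```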

Scale to meet the needs - Hadoop
- 76 TB of LustreFS is not enough
- Compute nodes have 6-36 drive slots
- Lots of used desktop SATA disks, 0.3-2 TB each
- Commodity hardware (except the LustreFS HDDs)
Can we create reliable storage from the available hardware with a constrained budget?
=> Hadoop Distributed File System (HDFS)
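How much usable space such a pool yields is simple arithmetic; a minimal sketch, where the disk inventory is hypothetical but the replication factor of 2 matches the setup described below:

```python
# Rough usable-capacity estimate for an HDFS pool built from spare desktop disks.
# The disk mix below is hypothetical; replication factor 2 is the one used here.

disks_tb = [0.3] * 20 + [1.0] * 20 + [2.0] * 10   # hypothetical mix of used disks
replication_factor = 2

raw_tb = sum(disks_tb)
usable_tb = raw_tb / replication_factor

print(f"raw capacity:    {raw_tb:.1f} TB")
print(f"usable capacity: {usable_tb:.1f} TB at replication factor {replication_factor}")
```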

[Diagram: Client, NameNode, DataNode 1, DataNode 2, DataNode 3]
Two-way replication: a file is cut into 64 MB blocks, and each block is written onto two different DataNodes.
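A toy illustration of this block-and-replica scheme; round-robin placement is used here purely for illustration and is not the real HDFS placement policy:

```python
# Toy model of two-way replication: split a file into 64 MB blocks and place
# each block on two distinct DataNodes (round-robin, for illustration only).

BLOCK_MB = 64
DATANODES = ["datanode1", "datanode2", "datanode3"]

def place_blocks(file_size_mb, replication=2):
    """Return a list of (block_index, [datanodes]) using round-robin placement."""
    n_blocks = -(-file_size_mb // BLOCK_MB)          # ceiling division
    placement = []
    for i in range(n_blocks):
        replicas = [DATANODES[(i + r) % len(DATANODES)] for r in range(replication)]
        placement.append((i, replicas))
    return placement

for block, nodes in place_blocks(200):               # a 200 MB file -> 4 blocks
    print(f"block {block}: {nodes}")
```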

Scale to meet the needs - Hadoop
[Diagram: Client, NameNode, DataNode 1, DataNode 2, DataNode 3]
The client reads data blocks directly from the DataNodes => high read speed.

Scale to meet the needs - Hadoop
[Diagram: NameNode, DataNode 1, DataNode 2, DataNode 3]
Fault tolerance: an under-replicated block is automatically copied to another DataNode.
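The same toy model can illustrate the recovery step: when a DataNode disappears, every block that falls below the target replication is copied to a surviving node. Real HDFS drives this from block reports; the sketch below only mimics the idea:

```python
# Toy model of re-replication after a DataNode failure (illustration only).

def re_replicate(placement, dead_node, all_nodes, replication=2):
    """Drop dead_node from every replica list and top the list back up."""
    repaired = []
    for block, nodes in placement:
        nodes = [n for n in nodes if n != dead_node]
        candidates = [n for n in all_nodes if n != dead_node and n not in nodes]
        while len(nodes) < replication and candidates:
            nodes.append(candidates.pop(0))
        repaired.append((block, nodes))
    return repaired

nodes = ["datanode1", "datanode2", "datanode3"]
placement = [(0, ["datanode1", "datanode2"]), (1, ["datanode2", "datanode3"])]
for block, replicas in re_replicate(placement, "datanode2", nodes):
    print(f"block {block}: {replicas}")
```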

Scale to meet the needs - Hadoop
- 60 TB "Cloud Storage" HDFS using desktop HDDs
- Stores large files (multiples of 64 MB)
- Replication factor = 2 (usable space: 30 TB)
- Mounted into the Linux FS with FUSE
- HDFS and LustreFS metadata uploaded automatically to Dropbox cloud storage for disaster recovery
[Diagram: METOCEAN Cluster with InfiniBand 10 Gbps and 1 Gbps Ethernet networks; fsimage, edits and MDT image copied to Dropbox Cloud Storage]
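A minimal sketch of the metadata disaster-recovery step, assuming the fsimage/edits and MDT image live at the hypothetical paths below and that a Dropbox client synchronises the target folder; the real cluster may use different locations and a cron-driven script:

```python
# Copy NameNode fsimage/edits and the Lustre MDT image into a Dropbox-synced
# folder. All paths are hypothetical placeholders.

import shutil
import time
from pathlib import Path

SOURCES = [
    Path("/data/hadoop/name/current/fsimage"),   # hypothetical NameNode image
    Path("/data/hadoop/name/current/edits"),     # hypothetical edit log
    Path("/backup/lustre/mdt_image"),            # hypothetical MDT dump
]
DROPBOX_DIR = Path.home() / "Dropbox" / "metocean-metadata"

def backup_once():
    """Copy each metadata file into a timestamped folder under the Dropbox dir."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = DROPBOX_DIR / stamp
    target.mkdir(parents=True, exist_ok=True)
    for src in SOURCES:
        if src.exists():
            shutil.copy2(src, target / src.name)
        else:
            print(f"warning: {src} not found, skipped")

if __name__ == "__main__":
    backup_once()
```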

Grid Computing for meteorological research
The demand for computation and storage in meteorological research has no limit.
Need more computational power:
- Look further into the future, e.g. a 5-day forecast
- Better forecasts: an ensemble forecast needs tens of models (MPI jobs) running in parallel
- Operational real-time forecasting
More data to save => need more storage:
- Hundreds of TB to PB of data
Constrained budget
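One way to picture the ensemble requirement: each member is an independent MPI job, so tens of them can be launched side by side when the resources exist. A minimal sketch, where the executable, per-member run directories and process counts are all assumptions:

```python
# Launch ensemble members as independent MPI jobs, one per run directory.
# Executable name, directory layout and process counts are hypothetical.

import subprocess

MEMBERS = [f"member_{i:02d}" for i in range(1, 11)]   # a 10-member ensemble
PROCS_PER_MEMBER = 16

procs = []
for member in MEMBERS:
    cmd = ["mpirun", "-np", str(PROCS_PER_MEMBER), "./wrf.exe"]
    # Each member runs in its own directory, which is assumed to contain the
    # executable and that member's namelist/input files.
    procs.append((member, subprocess.Popen(cmd, cwd=f"runs/{member}")))

for member, proc in procs:
    proc.wait()
    print(f"{member} finished with exit code {proc.returncode}")
```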

Grid Computing for meteorological research
Meteorological research organizations: HUS, SRHMC, IMHEN, HUNRE, NCHMF
Clusters of different sizes:
- Hundreds of GFlops to a few TFlops
- A few to tens of TB of storage
Resources should be connected and shared to tackle bigger problems than any one of us can handle alone.

Grid Computing for meteorological research
[Diagram: HUS, SRHMC, IMHEN, HUNRE and NCHMF connected via VINAREN at 155 Mbps]
- Grid/Cloud storage to share data
- Computational Grid for ensemble forecasts

Grid Computing for meteorological research
[Diagram: Workload Management System distributing MPI/MapReduce jobs and ensemble forecasts across NCHMF, IMHEN, SRHMC, HUNRE and HUS]
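A toy version of the dispatch idea, assuming jobs are handed out in proportion to each site's relative capacity; the capacities and the mechanism are assumptions, and a production setup would rely on real grid middleware (a batch system or grid services):

```python
# Toy workload dispatcher: spread ensemble members across the participating
# sites in proportion to their (assumed) relative capacities.

SITES = {"HUS": 4, "SRHMC": 2, "IMHEN": 2, "HUNRE": 1, "NCHMF": 3}   # assumed weights
JOBS = [f"ensemble_member_{i:02d}" for i in range(1, 13)]

def dispatch(jobs, sites):
    """Assign jobs to sites, weighted by capacity, round-robin style."""
    slots = [site for site, cap in sites.items() for _ in range(cap)]
    return {job: slots[i % len(slots)] for i, job in enumerate(jobs)}

for job, site in dispatch(JOBS, SITES).items():
    print(f"{job} -> {site}")
```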

Grid Computing for meteorological research
[Diagram: HUS, SRHMC, IMHEN, HUNRE and NCHMF connected via VINAREN to the European Earth Science Grid through TEIN]
TEIN = Trans-Eurasia Information Network, bandwidth = 622 Mbps

Questions? Thank you for your attention!