
1 Kolkata Tier-II @ ALICE and Status
Site name: Tier-2 site for the WLCG (Worldwide LHC Computing Grid)
GOCDB name: IN-DAE-VECC-02
VO: ALICE
Group: EHEPAG
Unit: VECC
City: Kolkata
Country: India
Team: Subhasis Chattopadhyay, Vikas Singhal, Prasun Singh Roy. T. K. Samanta and S. K. Pal helped in establishing the centre in the initial years.
Grid computing architecture: the WLCG Grid is based on the MONARC tier model.

2 Grid Computing Facility at VECC: journey from 2002 till now
(2 cores to 4000 cores; storage from 512 MB to 400 TB; bandwidth from 128 Kbps to 1 Gbps)

Year | Computing | Storage | Bandwidth
2002 | 2 PCs | 512 MB | 128 Kbps shared link
2003 | 2 tower servers | 40 GB as DAS | 512 Kbps
2004 | 9 HP 1U servers | 400 GB in HP MSA 500 | 2 Mbps dedicated link
2006 | 17 Wipro 1U single-core | 2 TB Wipro NAS | 4 Mbps from Bharti
2008 | 40 HP blades, dual-core | 108 TB HP EVA SAN | 30 Mbps from Reliance
2009 | 8 HP blades, quad-core | 25 TB iSCSI | 100 Mbps from VSNL (ERNET)
2011 | 32 Dell, dual-processor | 200 TB IBM DS 5100 | 300 Mbps from NKN
2012 | GPU server (448 cores) | 2 TB HDD | 1 Gbps NKN
—    | Intel Xeon Phi, 244 cores | 150 TB disk servers | 10 Gbps requested
2017 | 48 Dell servers (installing) | — | 10 Gbps (within 2 weeks)

Why a Tier-2? India contributed to 2 main detectors of ALICE, the PMD and the Muon Arm, which produce large volumes of raw, calibrated and simulated data; it was therefore decided to build a Tier-2. VECC, Kolkata is the only Tier-2 for ALICE in India. Started in 2002 with 2 computers and 128 Kbps bandwidth to CERN.

3 Evolution of Grid Computing Facility and Cooling Solution
[Photos: the facility in 2006, 2008, 2010, 2012 and now; logical diagram of the cooling solution.]
Hot and cold air are now separated via cold-aisle containment. The temperature gradient between the cold and hot aisles is 5 °C.
Power usage effectiveness (PUE) = total facility power / IT equipment power = 1200 units / 816 units per day ≈ 1.47.
The cooling solution reduced cooling power consumption by half. Management and monitoring of the servers and storage is done from outside the cold-aisle containment.
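The PUE figure quoted above can be checked with a short sketch; the only inputs are the two daily energy readings from the slide, everything else is arithmetic:

```python
# Power Usage Effectiveness (PUE) check for the figures quoted on this slide.
# PUE = total facility power / IT equipment power; a value near 1.0 means
# almost all power goes to IT equipment rather than cooling and overhead.

def pue(total_facility_units: float, it_equipment_units: float) -> float:
    """Compute PUE from matching energy readings (same units, same period)."""
    return total_facility_units / it_equipment_units

daily_total = 1200.0  # units/day for the whole facility (from the slide)
daily_it = 816.0      # units/day for IT equipment alone (from the slide)

print(round(pue(daily_total, daily_it), 2))  # -> 1.47
```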

4 Nearly 4,000,000 jobs successfully completed at the Kolkata CREAM CE
ALICE jobs: nearly 4,000,000 successfully completed at the Kolkata CREAM CE. Consistently, more than 70 jobs have completed successfully every hour over the last 6 years. No AMC for any server; maintenance is in-house only. 24x7 operation, 95% availability. Vikas Singhal, VECC, India

5 Kolkata Tier-2 Resources
Computing: 448 cores in total (equivalent to ~5K HEPSpec06). Dell quad-core blades: 28 × 2 × 4 × 2 (HT) = 448. Because new hardware has arrived, a few older nodes were removed; within a month the new cores will be in production.
Storage: 174 TB (78 TB SAN-based). 1 XRootD redirector and 1 disk server; the 78 TB SAN is IBM. The HP EVA (72 TB) is out of warranty; its data will be moved if it can be recovered.
EOS: 96 TB usable, on 3 Dell PowerEdge R730 servers of 48 TB each.
Network: 1 Gbps WAN (to be increased to 10 Gbps); 10 Gbps backbone.

6 More than 2500 HT cores of computing resources procured
Procured 12 Dell PowerEdge FX2 enclosures, each containing 4 Dell PowerEdge FC630 servers. Each server's configuration:
- 2 × Intel Xeon E v4, 2.4 GHz, 14 cores each
- 8 × 16 GB RDIMM, 2400 MT/s
- 960 GB of SSD storage
- 2 × 10 Gigabit network cards
Total cores: 48 × 2 × 14 × 2 (HT) = 2688. Installation is almost complete: Scientific Linux CERN 6.8 is installed, the 10G network is connected, and benchmarking is in progress. Approximate equipment cost: $300,000.
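The capacity arithmetic above can be sketched as follows (all counts are taken from the slide; the per-server RAM total is derived from the 8 × 16 GB DIMM configuration):

```python
# Capacity arithmetic for the Dell FX2 procurement described on this slide.
enclosures = 12             # Dell PowerEdge FX2 enclosures
servers_per_enclosure = 4   # FC630 servers per enclosure
sockets_per_server = 2      # 2 CPUs per server
cores_per_socket = 14
ht_factor = 2               # hyper-threading doubles the logical core count

servers = enclosures * servers_per_enclosure
logical_cores = servers * sockets_per_server * cores_per_socket * ht_factor
ram_per_server_gb = 8 * 16  # 8 x 16 GB RDIMM modules

print(servers, logical_cores, ram_per_server_gb)  # -> 48 2688 128
```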

7 51 TFlops of Computing Resources
Theoretical peak performance: Rpeak = CPU clock (GHz) × total number of cores × operations/cycle.
Rpeak = 2.4 × 28 × 16 = 1075.2 gigaflops ≈ 1.075 teraflops (single server).
Rpeak for all 48 nodes: 1.0752 × 48 ≈ 51.6 Tflops.
Preliminary HPL benchmarking result (test completed only yesterday): of order e+04 Gflops, i.e. tens of Tflops.

    HPL]# tail -20 xhpl_intel64_dynamic_outputs_48nodes_log2.txt
    T/V        N    NB   P   Q   Time   Gflops
    WC00C2R2   …    192  12  8   …      …e+04
    HPL_pdgesv() start time: Mon Oct  9 22:28 …
    HPL_pdgesv() end time:   Tue Oct 10 01:18 …
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) = …  PASSED
    ============================================================
    Finished 4 tests with the following results:
      4 tests completed and passed residual checks,
      0 tests completed and failed residual checks,
      0 tests skipped because of illegal input values.
    End of Tests.

We want to know the HEPSpec06 number for the Intel Xeon E v4 2.4 GHz processors with 35 MB of cache.
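The Rpeak formula above can be checked with a short sketch. The 16 FLOPs/cycle figure is the slide's assumption for this CPU generation (double-precision AVX2 with FMA); the rest is arithmetic:

```python
# Theoretical peak (Rpeak) for the new cluster, per the slide's formula:
# Rpeak = clock (GHz) x cores x floating-point operations per cycle.
clock_ghz = 2.4
cores_per_server = 28  # 2 sockets x 14 cores (physical cores; HT adds no FLOPs)
flops_per_cycle = 16   # slide's assumption: double-precision AVX2 FMA
nodes = 48

rpeak_server_gflops = clock_ghz * cores_per_server * flops_per_cycle
rpeak_cluster_tflops = rpeak_server_gflops * nodes / 1000

print(round(rpeak_server_gflops, 1), round(rpeak_cluster_tflops, 1))  # -> 1075.2 51.6
```

Measured HPL performance is always somewhat below Rpeak, so a sustained result in the tens of Tflops is consistent with this theoretical 51.6 Tflops.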

8 10G switches installed at both the VECC and SINP sites
Physical network connectivity up to Kolkata Tier-2: the upgrade to a 10G redundant link up to Kolkata Tier-2 is about to finish.
[Diagram: CERN — LHCONE — shared TEIN-3 network (4 Gbps/10 Gbps) — TIFR Mumbai (10 Gbps) — NKN core network with 10 Gbps links — POPs in Mumbai and Kolkata — SINP Kolkata and VECC Kolkata (currently 1 Gbps each) — Kolkata Tier-2.]

9 1G Network Utilization by Kolkata Tier-2 during April 2017

10 Bandwidth Test for Kolkata Tier-2

11 Backbone Network inside Kolkata Tier-2
Brocade switches: 10 Gb connection to each server; 40 Gb connectivity between the core switches; both fiber and copper connectivity; connections via DAC cables. Dual paths, full redundancy.

12 Restructured Power Distribution
Replaced the earlier 3 × 40 kVA UPS as a safety measure. Procured a new 80 kVA UPS, plus one redundant line via a 160 kVA UPS from the Computer Division. Planning was done to avoid any single point of failure: two MCB boxes were installed, each fed UPS power from a different source, and every network rack draws power from both MCBs. Server cabling is arranged so that every server receives power from both MCB A and MCB B. Thanks to the Electrical Section, VECC, for coordinating and performing the entire work.

13 Grid-Peer Tier-3 Cluster Status
More load on the cluster, as CBM users are also utilizing it.
- 8 HP BL675G7 blade servers, each with 4 × AMD Opteron 6380 (16-core) processors.
- 3 Dell M610 blade servers, each with 2 × Intel quad-core Xeon E5530 2.4 GHz CPUs (8 MB cache) and 16 GB RAM.
6 of the 8 HP blade servers are dedicated non-interactive nodes; the rest are used for CBM work. The 3 Dell blade servers are used as interactive nodes.
Extensively used by VECC users and PMD collaborators; more than … jobs completed successfully in the last 3 months.
75 TB of storage, almost filled up. 75+ active users across India; 45+ active users within VECC. A tape-based backup of the Tier-3 storage is performed twice a month.

14 Connectivity between Asian Tiers
Not bad, and improving day by day. [Figure: inter-tier connectivity measurements, 10/10/2017.]

15 Achievements Milestone (Achievement) Reasoning
- Grid computing facility and awareness — the only centre in India for ALICE; raised awareness of high-performance computing (HPC); expanded knowledge of grid computing and related technology.
- Achieved the initial pledge in 2012 — within the XI Plan budget (no extra funds).
- Green and efficient cooling solution implemented (first in Eastern India) — no cooling loss, less power required, increased server efficiency; a similar solution was implemented at NIBMG, DBT.
- Successfully running for the last 15 years — with more than 90% availability.
- Tier-3 cluster for collaborators — 100% utilized by Indian collaborators; more than 25 PhD theses completed using the VECC grid computing facility.
- Supporting STAR, ALICE, CBM, medical imaging and INO — providing computing resources to all these projects.
- Trained more than 30 graduate students for the HPC community — they can participate in the National Supercomputing Mission.
- Indian Grid Certification Authority (IGCA), Bangalore — established solely due to the ALICE requirement (thanks to Subrata da and his team); built knowledge of digital and grid certificates (RA for IGCA).
- National Knowledge Network (NKN) — India connected to the LHCONE network via NKN.
- Asian Tier Center Forum — networking and data exchange between Asian countries.

16 Future Road Map and Vision
Planning to collaborate with industry; exploring cloud computing via an industry partnership (discussions ongoing with Microsoft, initiated by Brij).
Parallel computing is the only way to fully utilize the present computing infrastructure. Accelerated (heterogeneous) computing is a newly evolving field; we are participating at FAIR in this direction.
Focus on high-throughput computing (HTC) using accelerators such as NVIDIA GPUs, AMD APUs and Intel coprocessors. Spreading knowledge of parallel and heterogeneous computing.
For the huge data volumes, adopting a low-cost storage solution based on CERN EOS; procuring low-cost storage boxes.

17 Thank You

