OSG Area Coordinator’s Report: Workload Management
February 9th, 2011
Maxim Potekhin, BNL, 631-344-3621

2 Workload Management: Panda

Panda Monitoring:
- Closer integration of the existing Panda Monitoring System with the Global Dashboard
- The upgrade was lowered in priority due to existing functionality in the Dashboard (ATLAS decision)

Scalability of Panda:
- Typical throughput almost doubled in the past 12 months, from about 250k jobs run globally per day to almost 500k per day, with a peak of 713k in the final days of data reprocessing in Nov’10
- This puts more pressure on the database (Oracle), which is used to keep the complete state of the system and for monitoring and data mining for performance analysis
- Data is heavily indexed, and indexes can block while data is copied across tables
- The DB engine sometimes makes suboptimal choices when confronted with multiple indexes
- In the fall of 2010, there were a few problem days after a series of network outages: the resulting imbalance of data distribution across tables and a large backlog of data to be copied decreased performance
- Multiple DB optimizations have been implemented since, notably table partitioning (see the sketch after this slide)
- Demonstrated increase in performance
- Some queries are still problematic and require workarounds
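To make the partitioning item above concrete, here is a minimal sketch of range-partitioning an archive table by modification time, so that bulk copies and date-bounded queries touch a single partition. The table name (jobs_archived), column names, connection string, and monthly partition boundaries are illustrative assumptions, not the actual PanDA schema; the sketch uses the cx_Oracle Python client.

```python
# Sketch only: illustrative schema, not the real PanDA tables.
import cx_Oracle

conn = cx_Oracle.connect("panda_ro/secret@//oradb.example.org:1521/PANDA")  # placeholder DSN
cur = conn.cursor()

# One partition per month: maintenance (e.g. moving or dropping old data)
# touches only the relevant partition instead of one monolithic table.
cur.execute("""
    CREATE TABLE jobs_archived (
        pandaid          NUMBER,
        jobstatus        VARCHAR2(15),
        computingsite    VARCHAR2(128),
        modificationtime DATE NOT NULL
    )
    PARTITION BY RANGE (modificationtime) (
        PARTITION p2010_11 VALUES LESS THAN (TO_DATE('2010-12-01', 'YYYY-MM-DD')),
        PARTITION p2010_12 VALUES LESS THAN (TO_DATE('2011-01-01', 'YYYY-MM-DD')),
        PARTITION p2011_01 VALUES LESS THAN (TO_DATE('2011-02-01', 'YYYY-MM-DD'))
    )""")

# A LOCAL index gives each partition its own index segment, so bulk loads
# into the current partition do not contend with reads of older ones.
cur.execute("CREATE INDEX jobs_archived_site_idx ON jobs_archived (computingsite) LOCAL")

# A date-bounded query lets the optimizer prune to a single partition.
cur.execute("""
    SELECT jobstatus, COUNT(*) FROM jobs_archived
    WHERE modificationtime >= TO_DATE('2010-12-01', 'YYYY-MM-DD')
      AND modificationtime <  TO_DATE('2011-01-01', 'YYYY-MM-DD')
    GROUP BY jobstatus""")
print(cur.fetchall())
```

The local index is what addresses the index-blocking issue noted above: each partition’s index can be built and maintained independently of the others.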

3 Workload Management: Panda

Scalability of Panda, cont’d:
- Along with DB optimization, alternatives are being considered for storage of finalized job data (the archive), where Oracle is redundant; noSQL solutions in particular are being looked at, such as Cassandra, HBase, etc.
- noSQL advantages (using Cassandra as an example):
  - Compared to a traditional RDBMS, more cost-effective horizontal scaling with commodity hardware and media
  - Load-balanced, redundant, truly distributed system
  - Extremely fast data ingestion with proper configuration (important)
  - Demonstrated performance of noSQL solutions in industry (Amazon, Facebook, Twitter, Google, etc.)
- In December 2010, an evaluation of Cassandra with a real Panda job data feed was started (sketched after this slide):
  - Test cluster (3 nodes) located at CERN
  - Data repository at Amazon S3
  - First round of testing is encouraging; data design is ongoing
  - To be evaluated at the ATLAS Software Week at CERN in April
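The archive use case being evaluated (write each finished job once, read it back occasionally by job ID) can be sketched as follows. This is only an illustration of the general data shape, not the evaluation’s actual design: it uses the present-day DataStax Python driver and CQL for readability, whereas the December 2010 tests likely predate CQL, and the keyspace, table, and column names are assumptions.

```python
# Sketch only: modern client and illustrative schema, not the evaluation's actual data design.
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1.example.org"])  # placeholder contact point
session = cluster.connect()

# Replication factor 3 mirrors the size of the 3-node test cluster.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS panda_archive
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}""")
session.set_keyspace("panda_archive")

# One row per finished job, keyed by its PanDA job ID.
session.execute("""
    CREATE TABLE IF NOT EXISTS jobs_archived (
        pandaid          bigint PRIMARY KEY,
        jobstatus        text,
        computingsite    text,
        modificationtime timestamp
    )""")

# Writes are spread across nodes by the partition key, which is what makes
# the fast-ingestion claim above hold for an append-only archive feed.
session.execute(
    "INSERT INTO jobs_archived (pandaid, jobstatus, computingsite, modificationtime) "
    "VALUES (%s, %s, %s, %s)",
    (1184503123, "finished", "BNL_ATLAS_1", datetime(2011, 2, 1, 12, 0)),
)

row = session.execute(
    "SELECT jobstatus, computingsite FROM jobs_archived WHERE pandaid = %s",
    (1184503123,),
).one()
print(row.jobstatus, row.computingsite)
```

Keying rows by job ID keeps lookups of a single archived job cheap; any richer query patterns (by site, by date) would be part of the data design work noted as ongoing.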

4 Workload Management: Engagement

CHARMM:
- Thanks to the 17+ active sites used, the recent run was expedient, according to the team
- Resource requirement estimates turned out to be quite precise (encouraging)
- The last wave of jobs is finishing right now, and the data goes to the experimental group; only 408 jobs were submitted in the past month

LBNE/Daya Bay:
- Jobs ran at PDSF and BNL (J. Caballero); a number of issues were discovered and resolved, such as:
  - Peculiarities of the WN configuration at PDSF (version of curl)
  - Suboptimal job configuration that caused some jobs to run out of memory, now fixed
- Additional software optimization was done by the researchers (MC)
- An announcement went out on the Daya Bay mailing list that the initial production run will start in a few days
- An additional cluster at IIT (Illinois) is under construction
- Panda user documentation is being reviewed, as per the researchers’ request