IT Monitoring
CERN IT-CF (Computing Facilities)
HEPiX Fall, October 2013
CERN IT Department, CH-1211 Geneva 23, Switzerland
Introduction

Motivation:
– Several independent monitoring activities in CERN IT
– Combining data from different groups is necessary
– Understanding performance has become more important
– Move to a virtualized, dynamic infrastructure

Challenges:
– Implement a shared architecture and common tool-chain
– Deliver it as a common collaborative effort
Strategy

Adopt open source tools:
– For each architecture block, look outside for solutions
– Large adoption and strong community support
– Fast to adopt, test, and deliver
– Easily replaceable by other (better) future solutions

Integrate with the new CERN infrastructure:
– AI project, OpenStack, Puppet, Roger, etc.
– Focus on simple adoption (e.g. Puppet modules)
Architecture

– Integrate data from multiple producers
– Scalable transport to collect operations data
– Long term archival for offline processing
– Analytics: real time queries, limited data retention
– Visualization: dynamic and user-friendly dashboards

[Architecture diagram: Producer → Transport → Analytics / Long Term Repository / Notifications / Visualization]
Technologies
Flume

Distributed service for collecting large amounts of data:
– Robust and fault tolerant
– Horizontally scalable, multi-tier deployment
– Many ready-to-use input and output plugins: Avro, Thrift, JMS, Syslog, HTTP, ES, HDFS, custom, …
– Java based, Apache license
Data Flow

Flume event:
– Byte payload + set of string headers

Flume agent:
– JVM process hosting "source -> sink" flow(s), as sketched below
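To make the flow concrete, a minimal sketch of a single-agent configuration in Flume's standard properties format. Hostnames, ports, and the agent name are placeholders, not the production setup; note that in Flume a channel buffers events between source and sink:

    # Minimal flow: syslog source -> memory channel -> Avro sink
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    # Receive syslog events over TCP (placeholder port)
    a1.sources.r1.type = syslogtcp
    a1.sources.r1.host = 0.0.0.0
    a1.sources.r1.port = 5140
    a1.sources.r1.channels = c1

    # In-memory buffer between source and sink
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    # Forward events to an aggregator node over Avro (placeholder hostname)
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = aggregator.example.org
    a1.sinks.k1.port = 4545
    a1.sinks.k1.channel = c1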
Feedback

– Routing is static: on-demand subscriptions are not possible; changes require reconfiguration and restart
– No authentication/authorization features (secure transport is available)
– Java process on the client side: a smaller memory footprint would be nicer
– Needs tuning to correctly size the Flume layers
– Available sources and sinks saved a lot of time
– Horizontally scalable
– In-flight data filtering possible
Hadoop HDFS

Distributed framework for processing large data sets
Distributed filesystem designed for commodity hardware:
– Suitable for applications with large data sets
– Cluster provided by another IT group (DSS)
– Data stored by the cluster (might be revised)
– Daily jobs aggregate the data by month

Feedback:
– Large analytics potential to explore
– Reliable external long term repository
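Day-to-day inspection of the archived data works with the standard Hadoop shell; a minimal sketch, assuming a hypothetical monthly directory layout (the paths are illustrative, not the actual CERN ones):

    # List one month of collected monitoring data (hypothetical path)
    $ hadoop fs -ls /project/itmon/flume/2013/07

    # Check how much space that month occupies
    $ hadoop fs -du /project/itmon/flume/2013/07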
ElasticSearch

Distributed RESTful search and analytics engine:
– Real time acquisition, data is indexed in real time
– Automatically balanced shards and replicas
– Schema free, document oriented (JSON): no prior data declaration required, automatic data type discovery
– Based on Lucene (full-featured information retrieval library): full text search
– RESTful JSON API, e.g. the cluster health endpoint (hostname is a placeholder):

    $ curl -XGET 'http://<es-host>:9200/_cluster/health?pretty'
    {
      "cluster_name" : "itmon-es",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 11,
      "number_of_data_nodes" : 8,
      "active_primary_shards" : 2990,
      "active_shards" : 8970,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }
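To illustrate the schema-free JSON API, a sketch of indexing and searching (host, index name, and document fields are invented for the example): a document can be stored with no prior mapping and immediately found via full text search:

    # Index a JSON document; the mapping is created on the fly
    $ curl -XPOST 'http://<es-host>:9200/logs-2013.10/event' -d '{
        "host": "node42.example.cern.ch",
        "timestamp": "2013-10-28T12:00:00",
        "message": "puppet agent run finished"
      }'

    # Full text search over the indexed documents
    $ curl -XGET 'http://<es-host>:9200/logs-2013.10/_search?q=message:puppet&pretty'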
ElasticSearch

Used by many large companies:
– SoundCloud: "to provide immediate and relevant results for their online audio distribution platform reaching 180 million people"
– GitHub: "20 TB of data using ElasticSearch, including 1.3 billion files and 130 billion lines of code"
– Foursquare, Stack Overflow, Salesforce, …

Distributed under the Apache license
Feedback

– Requires a lot of RAM (Java), IO intensive: take into account when planning the deployment
– Shard re-initialisation takes some time (~1h): not a frequent operation, only after a full cluster reboot
– Authentication is not built in: done with the Jetty plugin (access control, SSL)
– Monitoring: many plugins available (ElasticHQ, BigDesk, Head, Paramedic, …; see the sketch below)
– Easy to deploy and manage
– Robust, fast, and rich API
– More features coming with the aggregation framework
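For reference, a sketch of how such site plugins were typically installed with the bundled plugin tool of that ElasticSearch generation; the GitHub coordinates below are the commonly used ones but should be verified against each plugin's documentation:

    # Install monitoring front-ends as ElasticSearch site plugins
    $ bin/plugin -install mobz/elasticsearch-head
    $ bin/plugin -install lukas-vlcek/bigdesk
    $ bin/plugin -install royrusso/elasticsearch-HQ
    $ bin/plugin -install karmi/elasticsearch-paramedic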
Kibana

Visualize time-stamped data from ElasticSearch:
– Designed to analyse logs, fits time-stamped data perfectly
– No code: point & click to build your own dashboard
– Built with AngularJS (from Google)
– Open source, community driven: supported by ElasticSearch, code/feature contributions provided
– Easy to install & configure: "git clone" OR "tar -xvzf" OR ElasticSearch plugin; a 1-line config file change points it to the ElasticSearch cluster (see the sketch below)
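A minimal sketch of the "git clone" route for the Kibana generation in use at the time; the repository URL reflects where Kibana lived then, and the hostname is a placeholder:

    # Fetch Kibana
    $ git clone https://github.com/elasticsearch/kibana.git
    $ cd kibana

    # config.js: the single line to edit, pointing at the cluster
    elasticsearch: "http://<es-host>:9200",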
Feedback

– Easy to install and configure
– Very cool user interface
– Fits many use cases (e.g. text, metrics)
– Still a limited feature set, but an active, growing community
Deployment

Producers:
– From all Puppet-based data centre nodes
– Infrastructure & application monitoring

Flume:
– 10 aggregator nodes: 5 nodes to HDFS + 5 nodes to ES

HDFS:
– ~500 TB cluster, 1.8 TB collected since mid-July 2013

ElasticSearch:
– 1 master node, 1 search node, 8 data nodes
– 90 days TTL, 10 shards/index, 2 replicas per shard (sketched below)
– Running the ElasticSearch Kibana plugin
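A sketch of what those index settings correspond to in the API of that ElasticSearch generation; index name and host are placeholders, and _ttl was a supported mapping option at the time (removed in later versions):

    # Create an index with 10 shards and 2 replicas per shard
    $ curl -XPUT 'http://<es-host>:9200/logs-2013.10' -d '{
        "settings": {
          "number_of_shards": 10,
          "number_of_replicas": 2
        }
      }'

    # Enable a 90-day TTL on a document type
    $ curl -XPUT 'http://<es-host>:9200/logs-2013.10/event/_mapping' -d '{
        "event": { "_ttl": { "enabled": true, "default": "90d" } }
      }'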
Deployment
Community

– Every IT service needs monitoring
– Similar technologies fit different use cases
– Community makes you stronger: it does not impose solutions, and it leverages the effort of adopting, deploying and running

Examples of what other teams in IT have been doing with these tools:
– IT-OIS Cloud Service
– IT-PES Batch Service
Cloud Infrastructure (IT-OIS)

CERN Cloud Infrastructure:
– Based on OpenStack
– Consists of several services (Keystone, Nova, Glance, Cinder, …)

Production deployment:
– 664 nodes (… cores), ~1800 VMs

Requirements:
– Centralized copy of logs for investigation
– Display OpenStack usage statistics and functional tests
– Maintain a long term history of the infrastructure status
Example
Batch Monitoring (IT-PES)

Running LSF:
– 4000 servers, with over … CPU cores, … jobs/day

Requirements:
– Tool enabling 2nd-line support for end users
– Display more specific/relevant information with Kibana
– Kibana dashboards opened to the Helpdesk and users
– No need for engineer-level personnel to reply to requests
Example
Summary

– Several interesting technologies tested and deployed
– Full workflow deployed for concrete monitoring needs
– Verified by different monitoring producers
– Monitoring tools improved under a common effort
Questions?