Ben Jones, 12/9/2013, NEC'2013


2 Ben Jones ben.dylan.jones@cern.ch 12/9/2013 NEC'2013

3 Agile Infrastructure
Why change the operating model?
- Twice the compute, same staff levels
- New DC at Wigner, Budapest
- “We’re not special”
- Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana
- “Coffee time” provisioning of cloud servers

5 New Data Centre
- Data centre in Geneva at the limit of electrical capacity at 3.5 MW
- New centre chosen in Budapest, Hungary
- Additional 2.7 MW of usable power
- Local on-site support for hardware maintenance and installations

6 What is Cloud?
- Technology model: virtualization of compute, network, storage
- Operational model: run your services in a certain way
- Consumption model: “don’t make me talk to IT”; delivered instantly* over the wire, variable price

7 What is IaaS?

8 Private Cloud Software
- We use OpenStack, an open source cloud project: http://openstack.org
- ATLAS and CMS High Level Trigger clouds
- HEP clouds at BNL, IN2P3, NeCTAR, FutureGrid, …
- Clouds at HP, IBM, Rackspace, eBay, PayPal, Yahoo!, Comcast, Bloomberg, Fidelity, NSA, CloudWatt, Numergy, Intel, Cisco, …

9 OpenStack
- Open Source: Apache 2.0 licensed; no “enterprise” version
- Open Design: open design summit; anyone is able to define core architecture
- Open Development: GitHub, Launchpad
- Open Community: OpenStack foundation in 2012; now 190+ companies, 3000+ developers, 11000+ members

10 [Architecture diagram. Labels: Microsoft Active Directory, CERN DB on Demand, CERN Network Database, Account mgmt system, Horizon, Keystone, Network, Compute, Glance, Scheduler, Cinder, Nova, Block Storage Provider]

11 Nova
- Cloud computing fabric controller
- Network manager modified for CERN: integration with network database; specific to our use case, not pushed upstream
- Nova Compute aware of CERN DNS & AD
- Multiple availability zones: special zone for Hyper-V; scheduler has filter based on image distribution metadata
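A scheduler filter driven by image metadata, as mentioned on the Nova slide, could look roughly like the sketch below. It is plain Python and does not use the real Nova filter interface; the `Host` class, the `os_distro` property key, and the zone names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A hypervisor candidate (hypothetical model, not a Nova class)."""
    name: str
    availability_zone: str


# Assumed mapping from image distribution to the zone allowed to run it.
# The slide only says Hyper-V has a special zone; these names are invented.
ZONE_FOR_DISTRO = {"windows": "hyperv-zone"}
DEFAULT_ZONE = "kvm-zone"


def host_passes(host: Host, image_properties: dict) -> bool:
    """Return True if the host may run an image with these properties."""
    distro = image_properties.get("os_distro", "").lower()
    wanted_zone = ZONE_FOR_DISTRO.get(distro, DEFAULT_ZONE)
    return host.availability_zone == wanted_zone


def filter_hosts(hosts, image_properties):
    """Keep only the hosts that pass the image-metadata check."""
    return [h for h in hosts if host_passes(h, image_properties)]
```

With one KVM host and one Hyper-V host, an image tagged `windows` would be steered to the Hyper-V zone and everything else to the default zone.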

12 Glance
- Services for discovering, registering and retrieving VM images
- Aim for automated image creation / update: common process for Linux & Windows images; common tools (Aeolus Oz); CERN tools to hook up Oz & Glance API
- Images for all CERN supported OS; user defined images supported
- Initial contextualization via cloud-init; Cloudbase contributed cloud-init for Windows

13 Keystone
- Identity service: authentication, authorization and service catalog
- Full integration with Active Directory via LDAP
- CERN’s AD: 44K users & 29K groups
- Minimal changes to AD
- CERN submitting changes upstream
- Account mgmt. system integration for project creation / deletion
- SSL for everything

15 Operational practices evolving
- Security incidents: old: reinstall; new: replace with new VM
- Misconfiguration requiring reboot
- Resize a service (lxplus.cern.ch): add VMs to serve demand; resize VMs (or rather, replace with bigger)
- In future, resize services automatically

16 Service Models
Pets are given names like pussinboots.cern.ch
- They are unique, lovingly hand raised and cared for
- When they get ill, you nurse them back to health
Cattle are given numbers like vm0042.cern.ch
- They are almost identical to other cattle
- When they get ill, you get another one
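The "cattle" model can be illustrated with a small sketch: an unhealthy VM is not repaired but swapped for a freshly numbered one. The function names, the health check, and the `example.org` domain are hypothetical; no OpenStack API is called here.

```python
from itertools import count


def numbered_names(start, prefix="vm", domain="example.org"):
    """Yield names such as vm0043.example.org (naming scheme is illustrative)."""
    for n in count(start):
        yield f"{prefix}{n:04d}.{domain}"


def heal_cattle(pool, is_healthy, name_source):
    """Replace every unhealthy VM in the pool with a newly named one.

    pool: list of VM names
    is_healthy: callable taking a name and returning bool
    name_source: iterator yielding unused VM names
    """
    return [vm if is_healthy(vm) else next(name_source) for vm in pool]
```

A pet, by contrast, would keep its name and be repaired in place, which is exactly the step this sketch omits.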

17 Some other use cases…
- Hippos are cattle with block storage. Useful where there is redundancy, e.g. MongoDB, Cassandra.
- Canaries are cattle at high risk to give early warning of failures. Fail fast and fix.

18 Heat
- Heat orchestrates composite cloud apps (stacks)
- HA (restarts resources) & “auto-scaling”
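The idea behind "auto-scaling" can be sketched as a simple threshold policy. This is not Heat's template syntax or its API; the thresholds, size bounds, and function name are assumptions chosen for illustration.

```python
def scaling_decision(avg_cpu_percent, current, *, low=20.0, high=80.0,
                     min_size=1, max_size=10):
    """Return the new group size under a one-step threshold policy.

    Add one instance when average CPU is above `high`, remove one when it
    is below `low`, and always stay within [min_size, max_size].
    """
    if avg_cpu_percent > high:
        target = current + 1
    elif avg_cpu_percent < low:
        target = current - 1
    else:
        target = current
    return max(min_size, min(max_size, target))
```

Changing the group by one instance per evaluation keeps the policy easy to reason about; a real orchestrator would normally also apply a cooldown between adjustments.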

19 Configuration Management
- Adopted Puppet: widely used, large community, scales
- Needed to make reproducible services in the CERN CC
- Simplify the configuration of OpenStack itself: community modules from RH, Puppet Labs, users

21 Accounting
- CERN computing is funded from CERN central budgets: no billing, but quotas
- Experiments don’t have credit cards
- What to do when quota is exceeded?
- Unused capacity? Low SLA usage to plug the gaps?
- Fair share across the cloud? Worked for supercomputers but heavy for clouds at scale
- Bursting to public clouds?
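One possible answer to the quota questions above, sketched under assumptions: requests within quota are accepted, and requests over quota are admitted at a lower service level only if spare capacity exists. The slide leaves these questions open, so the policy, the `Resources` type, and the return labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource amounts for a project (hypothetical model)."""
    cores: int
    ram_gb: int


def fits_quota(quota, used, request):
    """True if used + request stays within quota on every resource."""
    return (used.cores + request.cores <= quota.cores
            and used.ram_gb + request.ram_gb <= quota.ram_gb)


def admit(quota, used, request, spare_cores=0):
    """Classify a request as accept, accept-low-sla, or reject."""
    if fits_quota(quota, used, request):
        return "accept"
    # Over quota: only plug gaps in otherwise unused capacity.
    if request.cores <= spare_cores:
        return "accept-low-sla"
    return "reject"
```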

22 Ceilometer
- Accounting for OpenStack by project
- Collects statistics from each compute node (common OpenStack message bus)
- Sharded MongoDB store: 2 GB / day
- Hyper-V in Havana
- Cinder statistics upcoming
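To give a feel for the storage figure, the sketch below projects the size of the metering store, assuming the 2 GB / day on the slide is a steady growth rate (the slide does not say so explicitly). The function names and the capacity value in the usage note are illustrative.

```python
def store_growth_gb(days, gb_per_day=2.0):
    """Projected data volume after `days` at a constant daily rate."""
    return days * gb_per_day


def days_until_full(capacity_gb, used_gb=0.0, gb_per_day=2.0):
    """Whole days before a store of `capacity_gb` fills at a constant rate."""
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    remaining = capacity_gb - used_gb
    if remaining <= 0:
        return 0
    return int(remaining // gb_per_day)
```

At the assumed rate, `store_growth_gb(365)` gives 730 GB for one year, and a hypothetical 1,000 GB store would last `days_until_full(1000)` = 500 days.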

23 CERN Status
CERN IT OpenStack Cloud
- Folsom based service: ~500 hypervisors on KVM and Hyper-V
- New “Grizzly” production service opened late July: 280 hypervisors, 600 VMs, 50 projects and growing rapidly
- High availability components using load balancing, e.g. 3 nova controllers per cell
- All Puppet managed to configure OpenStack
LHC experiment farms
- CMS currently running 1,300 hypervisors with 50,000 cores
- ATLAS starting to ramp up to a similar size
Other science grid sites moving to private cloud on OpenStack
- Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …

24 Outlook
- Track stable Grizzly releases in Red Hat RDO: up to date but not too close to the leading edge
- Scaling: expect 15,000 hypervisors, 150,000 VMs by 2015
- Manageability: Metering, Orchestration with Heat, Bare Metal
- Functionality: Load Balancing, High Availability Storage and Pets
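The densities implied by the figures on the status and outlook slides can be checked with a few lines of arithmetic. The helper name is hypothetical; the input numbers are taken from the slides.

```python
def per_host(total: int, hosts: int) -> float:
    """Average number of items (VMs or cores) per hypervisor."""
    if hosts <= 0:
        raise ValueError("hosts must be positive")
    return total / hosts


if __name__ == "__main__":
    # 2015 target: 150,000 VMs on 15,000 hypervisors
    print(round(per_host(150_000, 15_000), 1))  # 10.0
    # CMS farm: 50,000 cores on 1,300 hypervisors
    print(round(per_host(50_000, 1_300), 1))    # 38.5
    # Grizzly service: 600 VMs on 280 hypervisors
    print(round(per_host(600, 280), 1))         # 2.1
```

The 2015 target therefore implies roughly ten VMs per hypervisor, against about two on the newly opened Grizzly service.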

25 What have we learnt?
- Automate everything from the beginning
- Puppet and Stackforge are a great help
- Distributions and appliances make getting started much easier
- Constant rate of change requires a different approach
- Focus on core technologies and keep up to date
- Track new projects but don’t adopt too early unless strategic
- Many of our users are cloud aware
- Culture changes for legacy application coding and IT services
- Communities are major motivators, but administrators need to engage and adapt rather than re-invent

26 Conclusions
- CERN IT is re-engineering to deliver additional capacity to 11,000 physicists within fixed resources
- Cloud models can simplify current large scale computing infrastructure
- OpenStack and its ecosystem allow us to meet this challenge and help others through open source

27 Questions?

28 Preproduction Service

29 [Tool chain diagram. Labels: Bamboo; Koji, Mock; AIMS/PXE, Foreman; Yum repo, Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop / LogStash / Kibana; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP]

30 Training for Newcomers
- Buy the book rather than guru mentoring

31 Job Opportunities

