1 Scientific Computing in the Consumer Digital Infrastructure David P. Anderson Space Sciences Lab University of California, Berkeley The Austin Forum November 7, 2013

2 Science needs computing power ● High-performance computing ● High-throughput computing – Thousands or millions of independent jobs – What matters is the rate of job completion, not the turnaround time of individual jobs
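The difference between throughput and turnaround can be shown with a little arithmetic. This is a minimal sketch with made-up numbers (the host counts and job durations are hypothetical, not from the talk), assuming each host runs one job at a time in steady state:

```python
def throughput_jobs_per_hour(hosts, hours_per_job):
    """Steady-state completion rate when every host runs one job at a time."""
    return hosts / hours_per_job

# Many slow hosts: each job takes 10 hours to come back (long turnaround).
slow_many = throughput_jobs_per_hour(hosts=1000, hours_per_job=10.0)

# A few fast hosts: each job comes back in 1 hour (short turnaround).
fast_few = throughput_jobs_per_hour(hosts=10, hours_per_job=1.0)

print(slow_many, fast_few)
```

Under these assumptions the pool of slow hosts finishes ten times as many jobs per hour, even though any single job takes ten times longer to return.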

3 High-throughput computing applications ● Physical simulation – particle collision – atomic/molecular (bio, nano) – Earth climate system ● Compute-intensive data analysis – particle physics (LHC) – Astrophysics (radio, gravitational) – genomics ● Bio-inspired optimization – genetic algorithms, flocking, ant colony etc.

4 Approaches to HTC ● Cluster computing – lots of commodity or rack-mounted PCs in a room ● Grid computing – share clusters between organizations ● Cloud computing – rent cluster nodes, e.g. Amazon EC2 ● Volunteer computing – use computers owned by consumers

5 The Consumer Digital Infrastructure ● Computing devices – Desktop and laptop computers – Mobile devices: tablets, smart phones – Game consoles – Set-top boxes, DVRs – Appliances ● Commodity Internet – Cable, DSL, fiber to the home, cell networks

6 Measures of computing speed ● Floating-point operation (FLOP) ● GigaFLOPS (10^9/sec): 1 Central Processing Unit (CPU) ● TeraFLOPS (10^12/sec): 1 Graphics Processing Unit (GPU) ● PetaFLOPS (10^15/sec): 1 supercomputer ● ExaFLOPS (10^18/sec): current Holy Grail

7 CDI performance potential ● 1 billion desktop/laptop PCs – CPUs: 10 ExaFLOPS – GPUs: 1,000 ExaFLOPS ● 2.5 billion smartphones – CPUs: 10 ExaFLOPS
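These totals are device counts multiplied by a per-device speed. A rough check follows, assuming about 10 GigaFLOPS per PC CPU, 1 TeraFLOPS per GPU, and 4 GigaFLOPS per smartphone CPU. The per-device figures are assumptions chosen to reproduce the slide's totals; the talk does not state them.

```python
GIGA, TERA, EXA = 1e9, 1e12, 1e18

def aggregate_exaflops(devices, flops_per_device):
    """Total speed of a device population, in ExaFLOPS."""
    return devices * flops_per_device / EXA

# Per-device speeds below are assumed, not taken from the talk.
pc_cpu = aggregate_exaflops(1e9, 10 * GIGA)      # 1 billion PCs, CPU only
pc_gpu = aggregate_exaflops(1e9, 1 * TERA)       # 1 billion PCs, one GPU each
phone_cpu = aggregate_exaflops(2.5e9, 4 * GIGA)  # 2.5 billion smartphones

print(pc_cpu, pc_gpu, phone_cpu)
```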

8 Volunteer computing ● Consumers donate computing capacity to – support science – be in a community – compete ● History – 1997: GIMPS, – 1999: SETI@home, Folding@home – 2003: BOINC

9 Limiting factors ● Volunteership – Study of college students [Toth 2006] ● 5% would “definitely participate” ● 10% would “possibly participate” ● PC availability – 65% average availability [Kondo 2008] – 35% of PCs are available 24/7
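One way to see how these factors shrink the theoretical potential is to multiply them together. The sketch below is illustrative only: it assumes the 5% "definitely participate" rate from a study of college students carries over to all 1 billion PC owners, and it assumes 10 GigaFLOPS per PC. Neither assumption comes from the talk.

```python
def effective_flops(devices, participation, availability, flops_per_device):
    """Capacity left after scaling by participation and availability."""
    return devices * participation * availability * flops_per_device

# Assumed inputs: 1 billion PCs, 5% participation, 65% availability,
# 10 GigaFLOPS per PC (CPU only).
estimate = effective_flops(1e9, 0.05, 0.65, 10e9)

print(f"{estimate / 1e18:.3f} ExaFLOPS")
```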

10 Other limiting factors ● Network bandwidth (client, server) – Commodity Internet ● Memory, disk usage – new PCs average 6 GB RAM

11 BOINC: middleware for volunteer computing ● Supported by NSF since 2002 ● Open source (LGPL) ● Based at University of California, Berkeley

12 Volunteer computing with BOINC [Diagram: volunteers linked by attachments to projects (CPDN, LHC@home, WCG)]

13 How to volunteer

14 Choose projects

15 Configure

16 Community

17 Creating a BOINC project ● Install BOINC server software on a Linux box ● Compile apps for Windows/Mac/Linux ● Attract volunteers – develop web site – generate publicity – communicate with volunteers

18 Volunteer computing today ● 500,000 active computers ● 50 projects ● 15 PetaFLOPS average
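Dividing these figures gives a feel for the average contribution. This is a back-of-the-envelope sketch using only the numbers on this slide; real hosts and projects vary widely, so the averages are indicative at best.

```python
active_computers = 500_000
projects = 50
total_flops = 15e15  # 15 PetaFLOPS average

# Average speed contributed per computer, in GigaFLOPS.
gflops_per_computer = total_flops / active_computers / 1e9

# Average number of computers per project, if evenly spread (they are not).
computers_per_project = active_computers / projects

print(gflops_per_computer, computers_per_project)
```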

19 Some BOINC-based projects ● IBM World Community Grid ● Einstein@home ● LHC@home ● Rosetta@home

20 Cost The cost of 10 TeraFLOPS for 1 year: ● CPU cluster: $1.5M ● Amazon EC2: $4M – 5,000 small instances ● Volunteer: ~ $0.1M
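The comparison is easier to read when normalized. The sketch below uses only the slide's figures to derive cost per TeraFLOPS-year, the ratio to the volunteer option, and the EC2 price per instance-hour that the $4M total implies (assuming all 5,000 instances run for the full year):

```python
TFLOPS = 10
annual_cost = {"CPU cluster": 1.5e6, "Amazon EC2": 4.0e6, "Volunteer": 0.1e6}

# Dollars per TeraFLOPS for one year.
cost_per_tflops_year = {k: v / TFLOPS for k, v in annual_cost.items()}

# How many times more expensive each option is than volunteer computing.
ratio_to_volunteer = {k: v / annual_cost["Volunteer"] for k, v in annual_cost.items()}

# Implied EC2 price, assuming 5,000 instances running 24 hours a day all year.
ec2_per_instance_hour = annual_cost["Amazon EC2"] / 5000 / (365 * 24)

print(cost_per_tflops_year)
print(ratio_to_volunteer)
print(round(ec2_per_instance_hour, 3))
```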

21 How BOINC works [Diagram: the BOINC client on a home PC talks over HTTP to the project's BOINC server: get jobs; download data and executables; compute; upload outputs]
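The cycle on this slide (get jobs, download, compute, upload) can be sketched as a loop. This is a toy in-memory simulation, not the real BOINC protocol or API: `ToyServer`, `Job`, and `run_client` are hypothetical names, and the HTTP transfers are replaced by plain method calls.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int
    data: list  # stands in for the downloaded input files

@dataclass
class ToyServer:
    """Hypothetical stand-in for a project's server."""
    pending: list
    results: dict = field(default_factory=dict)

    def get_job(self):
        # "get jobs" plus "download data": hand out the next job, if any.
        return self.pending.pop(0) if self.pending else None

    def upload(self, job_id, output):
        # "upload outputs": record the result for later validation.
        self.results[job_id] = output

def run_client(server, compute):
    """Fetch, compute, and upload until the server has no more work."""
    done = 0
    while (job := server.get_job()) is not None:
        output = compute(job.data)
        server.upload(job.job_id, output)
        done += 1
    return done

server = ToyServer(pending=[Job(1, [1, 2, 3]), Job(2, [4, 5])])
completed = run_client(server, compute=sum)
print(completed, server.results)
```

Note that the client always initiates contact; the server never connects to the home PC, which is what lets the scheme work over the commodity Internet.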

22 Issues handled by BOINC ● Heterogeneous computers ● Untrusted, anonymous computers – Result validation ● replication, adaptive replication ● Credit: amount of work done ● Consumer-friendly client
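Result validation by replication can be sketched as a majority check: the same job is sent to several computers and a result is accepted once enough copies agree. The code below is a simplified illustration, not BOINC's actual validator. `validate`, `needs_replication`, the quorum of 2, and the trust threshold are all assumptions, and results are compared for exact equality (floating-point outputs would generally need a tolerance-based comparison).

```python
from collections import Counter

def validate(results, quorum=2):
    """Return the agreed result if at least `quorum` copies match, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

def needs_replication(consecutive_valid, threshold=10):
    """Adaptive replication, simplified: replicate only for hosts that have
    not yet built up a track record of valid results (threshold is assumed)."""
    return consecutive_valid < threshold

print(validate([42, 42, 17]))  # two copies agree
print(validate([42, 17]))      # no quorum yet; another copy is needed
print(needs_replication(3), needs_replication(25))
```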

23 Using GPUs ● BOINC detects and schedules GPUs – NVIDIA, AMD, Intel – multiple/mixed GPUs – various language systems (CUDA, OpenCL, CAL) ● Issues – non-preemptive GPU scheduling – no paging of GPU memory

24 Multicore apps ● Next-generation PCs may have 100 cores ● BOINC supports multi-core apps – OpenMP, MPI – OpenCL CPU apps

25 Using VM technology ● CDI platforms: – 85% Windows – 7% Linux – 7% Mac OS X ● Developing and maintaining versions for different platforms is hard ● Even making a portable Linux executable is hard

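26 Virtual machines [Diagram: an application runs on a guest operating system, which runs on the host operating system]

27 Virtual machines [Diagram, example: an application runs on Debian Linux 2.6 (guest), which runs on Windows 7 (host)]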

28 BOINC VM support ● Create a VM image for your favorite environment ● Create executables for that environment [Diagram: BOINC client, Vbox wrapper, VirtualBox executive, VM instance; a shared directory holds the executable and the input and output files]

29 VM advantages ● Develop in your favorite environment – No need for multiple versions ● A VM is a strong “sandbox” – Can run untrusted applications ● Free “checkpointing”

30 BOINC on Android ● New GUI ● Battery-related issues ● Released July 2013 – Google, Amazon App Stores – ~50K active devices

31 Why hasn’t volunteer computing gained traction? ● “Ecosystem of projects” model – Lots of competing projects ● Problems with this model – Creating/operating a project is too hard and risky – Volunteers need simplicity – No coherent PR; too many brands

32 Umbrella projects ● One project serves many scientists ● Examples – CAS@home (Chinese Academy of Sciences) – World Community Grid (IBM) – U. of Westminster (desktop grid) – Ibercivis (Spanish consortium)

33 Integrating BOINC ● HTCondor (U. of Wisconsin) – Goal: BOINC-based back end for Open Science Grid or any Condor pool [Diagram: job submission, HTCondor node, grid manager, BOINC GAHP, BOINC server]

34 Integrating BOINC ● HUBzero (Purdue) – Goal: BOINC-based back end for science portals such as nanoHUB [Diagram: Hub, BOINC server, projects, PCs]

35 Proposal: Science@home ● Single “brand” for volunteer computing ● Volunteers register for science areas rather than projects ● How to allocate computing power? – Involve the HPC, scientific funding communities

36 Implementing Science@home ● BOINC “account manager” architecture [Diagram: BOINC clients, Science@home, projects]
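The idea of registering for science areas rather than projects can be sketched as a lookup performed on the volunteer's behalf. Everything below is hypothetical: the area names, the `AREA_TO_PROJECTS` table, and `projects_for` are illustrations, not part of BOINC or of the Science@home proposal. How areas map to projects, and how computing power is divided among them, is the open allocation question raised on the previous slide.

```python
# Hypothetical mapping from science areas to BOINC projects.
AREA_TO_PROJECTS = {
    "particle physics": ["LHC@home"],
    "astrophysics": ["Einstein@home"],
    "biology": ["Rosetta@home"],
    "climate": ["CPDN"],
}

def projects_for(areas):
    """Projects a client would be attached to, given the volunteer's areas.
    Unknown areas are ignored; order follows the volunteer's choices."""
    attached = []
    for area in areas:
        for project in AREA_TO_PROJECTS.get(area, []):
            if project not in attached:
                attached.append(project)
    return attached

print(projects_for(["climate", "biology"]))
```

The volunteer never sees project names here; the mapping can change over time without any action on the volunteer's part.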

37 Summary ● Volunteer computing is – Usable for most HTC applications – A path to ExaFLOPS computing – A way to popularize science ● BOINC provides the software infrastructure ● Barriers are largely organizational

38 Contacts ● ●

