© Spinnaker Labs, Inc. Google Cluster Computing Faculty Training Workshop Open Source Tools for Teaching.


1 © Spinnaker Labs, Inc. Google Cluster Computing Faculty Training Workshop Open Source Tools for Teaching

2 © Spinnaker Labs, Inc. Overview
–Slides
–Lab Materials
–Readings
–Video Lectures
–Tools
–Datasets
http://code.google.com/edu

3 © Spinnaker Labs, Inc. Slides
Multiple short course outlines available:
–“MapReduce in a week”
–“Introduction to Problem Solving on Large Scale Clusters”
–“MapReduce Mini Lecture Series”

4 © Spinnaker Labs, Inc. Labs
Lab designs from UW course available
–“Introduction to MapReduce”
–“A Simple Inverted Index”
–“PageRank on the Wikipedia Corpus”
–“Clustering the Netflix Movie Data”
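For readers unfamiliar with the inverted-index lab named above, the idea can be sketched in a few lines of plain Python. This is a minimal single-process illustration of the map, shuffle, and reduce steps, not Hadoop code and not the UW lab's solution; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Collapse all document ids seen for a word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def inverted_index(docs):
    """Run map, then group by key (standing in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, d in map_phase(doc_id, text):
            grouped[word].append(d)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


# Hypothetical toy corpus; a real lab would read far larger inputs.
docs = {
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "quick dog",
}
index = inverted_index(docs)
print(index["quick"])
```

On a real cluster the grouping step is performed by the framework between the map and reduce phases; here a dictionary of lists plays that role.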

5 © Spinnaker Labs, Inc. Readings
Google has several papers available
–“Introduction to Distributed Systems”
–“MapReduce: Simplified Data Processing on Large Clusters”
–“The Google File System”
–“Bigtable: A Distributed Storage System for Structured Data”
http://research.google.com/pubs/papers.html

6 © Spinnaker Labs, Inc. Lecture Videos MapReduce Mini-series

7 © Spinnaker Labs, Inc. Tools: Hadoop VM
Problem: Experimenting with Hadoop requires one (or more) machines running Linux
–Step 1: Install Linux
–Step 2: Install Hadoop
–Step 3: … No more Windows computer

8 © Spinnaker Labs, Inc. Tools: Hadoop VM
Solution: A preconfigured virtual machine image with Linux and Hadoop 0.13
–Runs with the free VMware Player under any host OS
–Allows experimentation with a single-machine Hadoop image

9 © Spinnaker Labs, Inc. Tools: Eclipse Plugin
IBM has created a plugin for Eclipse to interact with Hadoop clusters
–Open source
–Free download
–Works with Hadoop VM

10 © Spinnaker Labs, Inc. (Tools demo)

11 © Spinnaker Labs, Inc. Datasets: Wikipedia
Wikipedia supports free “bulk download” of data
–Current site snapshot (big)
–Entire revision history (massive)
Eliminates need for Nutch crawls
Good for indexing, search labs
http://download.wikimedia.org

12 © Spinnaker Labs, Inc. Datasets: Netflix
Netflix is an American company that provides online DVD rental by mail.
(Image © Netflix, Inc -- www.netflix.com)

13 © Spinnaker Labs, Inc. Datasets: Netflix
Netflix’s web site provides recommendations
Theory: Other people watched movie X, then Y. You watched X, you might like Y.
Open question: Can you provide more useful recommendations than their current system?
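The "watched X, then Y" theory on this slide can be illustrated with a simple co-occurrence count. The sketch below is an assumption about how such a baseline might look, not a description of Netflix's actual system; the function names, viewer names, and viewing histories are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """For each movie X, count how often each other movie Y shares a viewer with X."""
    co = defaultdict(Counter)
    for movies in histories.values():
        for x, y in permutations(set(movies), 2):
            co[x][y] += 1
    return co


def recommend(co, watched, top_n=3):
    """Score unseen movies by how often they co-occur with the watched ones."""
    scores = Counter()
    for x in watched:
        for y, count in co.get(x, {}).items():
            if y not in watched:
                scores[y] += count
    # Sort by descending score, then by title so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical viewing histories: viewer -> movies watched.
histories = {
    "alice": ["X", "Y"],
    "bob": ["X", "Y", "Z"],
    "carol": ["X", "W"],
}
co = build_cooccurrence(histories)
print(recommend(co, {"X"}))
```

Two of the three viewers who watched X also watched Y, so Y ranks first for someone who has only seen X. The open question on the slide is whether something better than this kind of counting can be found.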

14 © Spinnaker Labs, Inc. Datasets: Netflix
The Netflix Prize: $1,000,000 if you can find a better algorithm, based on their criteria
They provide you with a large dataset of existing rental associations to work with
www.netflixprize.com

15 © Spinnaker Labs, Inc. Datasets: Palimpsest
New project announced Jan ’08
Free access to scientific data from around the world
–e.g., Hubble Telescope Data – 120 TB
Coming soon?
–research.google.com

16 © Spinnaker Labs, Inc. Conclusions
Lots of starter materials available on the web
–Good for reference
–Get teaching assistants up to speed
Readings, sample worksheets and other resources are open content & ready to use

