Published by Bruno Higgins, modified over 9 years ago
1
© Spinnaker Labs, Inc.
Google Cluster Computing Faculty Training Workshop
Open Source Tools for Teaching
2
Overview
–Slides
–Lab Materials
–Readings
–Video Lectures
–Tools
–Datasets
http://code.google.com/edu
3
Slides
Multiple short course outlines are available:
–“MapReduce in a week”
–“Introduction to Problem Solving on Large Scale Clusters”
–“MapReduce Mini Lecture Series”
4
Labs
Lab designs from the UW course are available:
–“Introduction to MapReduce”
–“A Simple Inverted Index”
–“PageRank on the Wikipedia Corpus”
–“Clustering the Netflix Movie Data”
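To give a feel for the “PageRank on the Wikipedia Corpus” lab, here is a minimal single-machine sketch of PageRank phrased as a map phase, a shuffle and a reduce phase. It is not the UW lab’s code. The input format (a dict from page to its outgoing links), the function names and the damping factor d = 0.85 are illustrative assumptions, and it assumes every page has at least one outgoing link and every link target is itself a page in the graph.

```python
from collections import defaultdict

def map_page(page, rank, links):
    # Map: a page splits its current rank evenly among its outgoing links.
    for target in links:
        yield target, rank / len(links)

def reduce_page(shares, n, d):
    # Reduce: sum the incoming shares and apply the damping term.
    return (1 - d) / n + d * sum(shares)

def pagerank(graph, iterations=50, d=0.85):
    """Iterative PageRank over a hypothetical {page: [outgoing links]} dict.

    Assumes every page has >= 1 outgoing link and every link target is a key
    of `graph`; dangling pages are not handled in this sketch.
    """
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        grouped = defaultdict(list)  # shuffle: group shares by target page
        for page, links in graph.items():
            for target, share in map_page(page, ranks[page], links):
                grouped[target].append(share)
        # A page with no inbound links gets an empty list, i.e. only (1 - d) / n.
        ranks = {page: reduce_page(grouped[page], n, d) for page in graph}
    return ranks

if __name__ == "__main__":
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
```

On Hadoop, each iteration would run as one MapReduce job, with the rank table written to and re-read from the distributed file system between jobs.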
5
Readings
Google has several papers available:
–“Introduction to Distributed Systems”
–“MapReduce: Simplified Data Processing on Large Clusters”
–“The Google File System”
–“Bigtable: A Distributed Storage System for Structured Data”
http://research.google.com/pubs/papers.html
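The MapReduce paper introduces its programming model with word counting. A minimal in-memory imitation of that model is sketched below: a user-supplied map function emits (key, value) pairs, a shuffle step groups values by key, and a user-supplied reduce function folds each group. The helper names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical and not part of Hadoop or Google’s library.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    # Map: emit (word, 1) for every word in the document.
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce: add up all the 1s emitted for this word.
    return word, sum(counts)

def run_mapreduce(inputs, mapper, reducer):
    """Hypothetical single-process driver: map, shuffle (group by key), reduce."""
    grouped = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            grouped[k].append(v)
    # Keys are processed in sorted order, as in Hadoop's sorted reduce input.
    return dict(reducer(k, vs) for k, vs in sorted(grouped.items()))

if __name__ == "__main__":
    docs = [("d1", "the cat"), ("d2", "The dog")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

Only `map_fn` and `reduce_fn` are problem-specific; in a real cluster the framework supplies the driver, partitioning and fault tolerance.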
6
Lecture Videos
MapReduce Mini-series
7
Tools: Hadoop VM
Problem: Experimenting with Hadoop requires one (or more) machines running Linux
–Step 1: Install Linux
–Step 2: Install Hadoop
–Step 3: …
Result: no more Windows computer!
8
Tools: Hadoop VM
Solution: A virtual machine image preconfigured with Linux and Hadoop 0.13
–Runs in the free VMware Player under any host OS
–Allows experimentation with a single-machine Hadoop setup
9
Tools: Eclipse Plugin
IBM has created a plugin for Eclipse to interact with Hadoop clusters
–Open source
–Free download
–Works with the Hadoop VM
10
(Tools demo)
11
Datasets: Wikipedia
Wikipedia supports free “bulk download” of its data:
–Current site snapshot (big)
–Entire revision history (massive)
Eliminates the need for Nutch crawls
Good for indexing and search labs
http://download.wikimedia.org
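As an illustration of the kind of indexing and search lab this dataset supports, the sketch below builds a simple inverted index (term to sorted list of article titles) in map/reduce style and answers AND queries against it. It assumes the articles have already been extracted from the dump into (title, text) pairs; parsing the actual Wikipedia XML format is not shown, and the tokenizer and function names are hypothetical simplifications.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(articles):
    """articles: dict {title: plain text}, assumed already extracted from the dump."""
    grouped = defaultdict(set)
    for title, text in articles.items():
        # Map: emit (term, title) once per distinct term in the article.
        for term in set(tokenize(text)):
            grouped[term].add(title)
    # Reduce: turn each group into a sorted posting list.
    return {term: sorted(titles) for term, titles in grouped.items()}

def search(index, query):
    # AND query: titles that contain every query term.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

if __name__ == "__main__":
    idx = build_inverted_index({
        "Hadoop": "Hadoop is an open source MapReduce framework",
        "Google File System": "GFS stores data for MapReduce jobs",
    })
    print(search(idx, "mapreduce"), search(idx, "open source"))
```

A real lab would also record term positions or frequencies in each posting so results can be ranked rather than merely matched.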
12
Datasets: Netflix
Netflix is an American company that provides DVD rental by mail, ordered online.
[Image © Netflix, Inc. (www.netflix.com)]
13
Datasets: Netflix
Netflix’s web site provides recommendations
Theory: Other people watched movie X, then Y. You watched X, so you might like Y.
Open question: Can you provide more useful recommendations than their current system?
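The “people who watched X also watched Y” theory can be sketched as item co-occurrence counting. This is not Netflix’s actual system. The input format (a dict from user to the set of movies they watched) and the function names are assumptions, and the sketch ignores viewing order, treating “watched X, then Y” simply as “watched both”.

```python
from collections import Counter, defaultdict
from itertools import permutations

def cooccurrence_counts(histories):
    """histories: hypothetical {user: set of movies watched}.

    Returns counts[x][y] = number of users who watched both x and y.
    """
    counts = defaultdict(Counter)
    for movies in histories.values():
        for x, y in permutations(movies, 2):
            counts[x][y] += 1
    return counts

def recommend(counts, watched, k=3):
    # Score every movie by how often it co-occurs with something you watched.
    scores = Counter()
    for x in watched:
        scores.update(counts.get(x, Counter()))
    # Never recommend a movie the user has already seen.
    for x in watched:
        scores.pop(x, None)
    return [movie for movie, _ in scores.most_common(k)]

if __name__ == "__main__":
    counts = cooccurrence_counts({
        "u1": {"X", "Y"},
        "u2": {"X", "Y", "Z"},
        "u3": {"X", "W"},
    })
    print(recommend(counts, {"X"}, k=1))
```

Counting pairs per user is itself a natural MapReduce job: the map emits ((x, y), 1) for each pair in a user’s history and the reduce sums them.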
14
Datasets: Netflix
The Netflix Prize: $1,000,000 if you can find a better algorithm, based on their criteria
They provide you with a large dataset of existing rental associations to work with
www.netflixprize.com
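The Netflix Prize criterion was root-mean-square error (RMSE) of predicted ratings against held-out true ratings, with the grand prize requiring a 10% improvement over Netflix’s own Cinematch system. A minimal sketch of that arithmetic is below; the function names are illustrative, and the official scoring ran on Netflix’s hidden test set, not on lists like these.

```python
import math

def rmse(predicted, actual):
    # Root-mean-square error between two equal-length lists of ratings.
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equal-length, non-empty rating lists")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def improvement(baseline_rmse, candidate_rmse):
    # Fractional improvement over a baseline; the grand prize needed >= 0.10.
    return (baseline_rmse - candidate_rmse) / baseline_rmse

if __name__ == "__main__":
    print(rmse([1, 3], [2, 2]))
```

Because errors are squared before averaging, RMSE penalizes a few badly wrong predictions more heavily than many slightly wrong ones.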
15
Datasets: Palimpsest
New project announced Jan ’08
Free access to scientific data from around the world
–e.g., Hubble Telescope data (120 TB)
Coming soon?
–research.google.com
16
Conclusions
Lots of starter materials are available on the web
–Good for reference
–Useful for getting teaching assistants up to speed
Readings, sample worksheets and other resources are open content and ready to use