Poly Hadoop
CSC 550
April 26, 2007
Scott Griffin, Daniel Jackson, Alexander Sideropoulos, Anton Snisarenko
Grid Computing
  - High Performance Computing
  - Cluster
  - Network of Resources: CPUs, Applications, Data and Storage
  - Common Interface
Map Reduce
  - Google
  - Map Function
  - Reduce Function
  - Examples
      - Word Frequencies
      - Hyperlink Source/Target Tree
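The word-frequency example can be sketched in a few lines. This is a minimal in-memory illustration of the map and reduce steps, not Hadoop or Google code; the names `map_words`, `reduce_counts`, and `run_map_reduce` are hypothetical, and the driver stands in for the grouping that a real framework would perform across a cluster.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_map_reduce(inputs, map_fn, reduce_fn):
    """Toy single-process driver (an assumption for illustration only).

    It groups every value emitted by map_fn under its key, then calls
    reduce_fn once per key.
    """
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


documents = ["the quick brown fox", "the lazy dog", "The end"]
frequencies = run_map_reduce(documents, map_words, reduce_counts)
```

Because each map call sees only one document and each reduce call sees only one key, both steps could in principle run in parallel on separate nodes.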
Hadoop
  - Open Source Java Framework
  - Map/Reduce Paradigm
  - Cluster
  - Commodity Hardware
  - HDFS
Our Project
  - Set up Hadoop
  - BladeCenter: 10 Physical Nodes
  - VMware: Grid of Virtual Nodes
  - 1, 2, 4, 8 Virtual Nodes per Physical Node
Experimental Goals
  - Feasibility
  - Performance
  - Ease of Deployment
  - Limits
Large Dataset
  - Netflix Prize: Movie/User/Rating Database
  - Calculate Average Rating per User
  - Map: For every rating, emit a (user, rating) pair
  - Reduce: Average every rating for a given user, emit a (user, average rating) pair
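The average-rating job can be sketched as follows. This is a minimal single-process illustration, not the project's Hadoop code. It assumes that the map step emits (user, rating) pairs and the reduce step emits (user, average) pairs, and that each record is a hypothetical `(movie_id, user_id, rating)` tuple; the real Netflix Prize file layout may differ.

```python
from collections import defaultdict


def map_rating(record):
    """Map step: for every rating record, emit a (user, rating) pair.

    The (movie_id, user_id, rating) layout is an assumption for illustration.
    """
    movie_id, user_id, rating = record
    yield user_id, rating


def reduce_average(user_id, ratings):
    """Reduce step: average all ratings emitted for one user."""
    return user_id, sum(ratings) / len(ratings)


def average_rating_per_user(records):
    """Toy driver standing in for the framework's group-by-key phase."""
    by_user = defaultdict(list)
    for record in records:
        for user_id, rating in map_rating(record):
            by_user[user_id].append(rating)
    return dict(reduce_average(user, ratings) for user, ratings in by_user.items())


# Made-up sample records, not taken from the Netflix Prize data.
sample = [(1, "alice", 4), (2, "alice", 2), (1, "bob", 5)]
averages = average_rating_per_user(sample)
```

Keying the map output by user is what makes all of one user's ratings arrive at the same reduce call, regardless of which movie they belong to.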
Related Work
  - UCSB: Hadoop on Xen
  - University of Washington: CSE 490 – Class projects in Hadoop
Timeline
  - Week 5-6
      - Install/Configure Environment
      - Develop Code
  - Week 7-8
      - Run Experiments
  - Week 9-10
      - Analyze Data
      - Write Paper
      - Present Results
Questions?