Published by Morgan Hart. Modified over 9 years ago.

Introduction to Apache Hadoop
CSCI 572: Information Retrieval and Search Engines
Summer 2010

May-20-10 CS572-Summer2010 CAM-2
Outline
What is Hadoop?
Where did it come from?
What are the current versions of Hadoop?
What can it do?

Apache Hadoop
The brainchild of Doug Cutting
Built out by brilliant engineers and contributors from Yahoo, Facebook, Cloudera, and other companies
Started in 2006 when code was spun out of Nutch; became a top-level Apache project in 2008
Has grown into a really large project at Apache with a significant ecosystem

How to get started
Hadoop (0.20.0/0.20.2)
  – Put your Java hat on
  – Go here: http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html
  – If you want to do this on Windows, get Cygwin, VMware, or something else that you can run Linux on
  – Run the Map Reduce examples in local mode
  – Check on the data generated in your HDFS
Scaling it out
  – Amazon Elastic MapReduce
  – Setting it up on your own cluster: DataNodes and TaskTracker/JobTracker

Basic Operations
Listing files
  – ./bin/hadoop fs -ls
Writing files
  – ./bin/hadoop fs -put
Running Map Reduce Jobs
  – mkdir input
  – cp conf/*.xml input
  – ./bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
  – cat output/*
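
The grep example finds every string in the input files that matches the regular expression and counts how often each one occurs. A minimal local sketch of that idea in plain Python, not Hadoop code: the names map_matches and grep_count and the sample lines are hypothetical, and the sort order (highest count first) is an assumption about how the example presents its results.

```python
import re
from collections import defaultdict


def map_matches(line, pattern):
    """Map step: emit (match, 1) for every regex match in one input line."""
    for match in pattern.findall(line):
        yield match, 1


def grep_count(lines, regex):
    """Run the map step over all lines, then sum the counts per match."""
    pattern = re.compile(regex)
    counts = defaultdict(int)
    for line in lines:
        for key, value in map_matches(line, pattern):
            counts[key] += value  # reduce step: add up the 1s for each key
    # Assumed ordering: highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical lines standing in for the copied conf/*.xml files.
sample_lines = [
    "<name>dfs.replication</name>",
    "<name>dfs.name.dir</name>",
    "<description>dfs.replication sets the block copy count</description>",
]

result = grep_count(sample_lines, r"dfs[a-z.]+")
for match, count in result:
    print(f"{count}\t{match}")
```

On a real installation the same work is split across map and reduce tasks and the results land in the output directory; this sketch only shows the shape of the computation on one machine.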

Advanced Topics
Writing your Mappers and Reducers
  – Check out the Map Reduce Tutorial here: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html
  – Code for several examples, including Word Count
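
To make the Mapper/Reducer split concrete, here is a rough Word Count sketch in plain Python. It is not the tutorial's Java code and does not use the Hadoop API: mapper, reducer, and run_job are hypothetical names, and the grouping done inside run_job stands in for the shuffle that the framework would normally perform between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Map: split one line on whitespace and emit (word, 1) for each token."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Reduce: sum all the counts collected for a single word."""
    return word, sum(counts)


def run_job(lines):
    """Tiny single-process stand-in for a job: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # group values by key (the "shuffle")
    # Reducers see keys in sorted order, one key at a time.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


word_counts = run_job(["Hello Hadoop", "Hello MapReduce"])
print(word_counts)
```

The point of the split is that mapper only ever sees one record and reducer only ever sees one key, which is what lets a framework run many copies of each in parallel.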

Other Hadoop ecosystem projects
HBase
  – Distributed data store modeled on Google's Bigtable
Hive
  – Built at Facebook, provides a SQL-like interface to data in HDFS
Chukwa
  – Log processing
Pig
  – Scientific data analysis language on top of M/R and HDFS
ZooKeeper
  – Distributed systems management

No releases in a while
Stick with 0.20.x

Wrapup
Lots more information at
  – http://hadoop.apache.org
  – http://hadoop.apache.org/mapreduce/
  – http://hadoop.apache.org/hdfs/
Project ideas
  – Implement a GIS or geometric algorithm in Map Reduce
  – Write a REST interface to control HDFS and M/R
  – Add new Writable input data formats
  – Integrate Solr and Hadoop

Acknowledgements
Material inspired by discussions and talks on the Apache mailing lists for Hadoop and by discussions with the rest of the Hadoop community