Published by Asher Mitchell. Modified over 9 years ago.
1
Actors and Actresses
2
Danger: Please be careful!
3
IMDb (I assume you all know?)
4
IMDb Dump: Not open/free!
5
The Question You Are Going to Answer: Which pair of actors/actresses has acted together the most times?
6
An Example: In how many movies have Al Pacino and Robert De Niro starred together, according to IMDb?
7
IMDb: Typical File
Log into the machine cluster.dcc.uchile.cl
Username: uhadoop
zcat /data/hadoop/hadoop/data/imdb/actors.list.gz | more
8
IMDb: Already Parsed
zcat /data/hadoop/hadoop/data/imdb/tsv/actpersons-to-movies.tsv.gz | more
How many theatrical movies was Uma Thurman in?
zcat /data/hadoop/hadoop/data/imdb/tsv/actresses-to-movies.tsv.gz | grep -e "^Thurman, Uma" | grep -e "THEATRICAL_MOVIE" | wc -l
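The same filter can be sketched in plain Java, which may help when the logic later moves into a mapper. This is only an illustration: the class name GrepCountSketch and the sample lines are hypothetical, and the real column layout of the TSV file is not assumed beyond what the grep pipeline itself relies on (the line starts with the name and contains the marker THEATRICAL_MOVIE somewhere).

```java
import java.util.Arrays;
import java.util.List;

public class GrepCountSketch {

    // Counts lines that start with the given prefix and contain the marker,
    // mirroring: grep -e "^prefix" | grep -e "marker" | wc -l
    static long countMatches(List<String> lines, String prefix, String marker) {
        long count = 0;
        for (String line : lines) {
            if (line.startsWith(prefix) && line.contains(marker)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the real file layout may differ.
        List<String> lines = Arrays.asList(
                "Thurman, Uma\tMovie A\tTHEATRICAL_MOVIE",
                "Thurman, Uma\tSeries B\tTV_SERIES",
                "Someone, Else\tMovie A\tTHEATRICAL_MOVIE");
        System.out.println(countMatches(lines, "Thurman, Uma", "THEATRICAL_MOVIE"));
    }
}
```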
9
The Question You Are Going to Answer: Which pair of actors/actresses has acted together the most times?
10
1. Download the project http://aidanhogan.com/teaching/cc5212-1/mdp-lab5.zip
11
2. Implement the Hadoop job(s)!
Adapt the WordCount example
– Refer to lab slides from last week
Can use a class file for each part of the task
Test on small file
– /uhadoop/imdb/actpersons-to-movies.100k.tsv
Run on big file
– /uhadoop/imdb/full/actpersons-to-movies.tsv
Write to your directory!!!
– /uhadoop/[username]
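One possible shape for the first part of the task, sketched in plain Java without the Hadoop API so it can be run locally: group people by movie (the role a map and shuffle would play), then emit every unordered pair from each cast (the role of a reduce). The class name CoStarPairsSketch, the (person, movie) record layout, and the "##" separator are all assumptions for illustration, not part of the lab project. Ordering each pair alphabetically ensures (A, B) and (B, A) end up under the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class CoStarPairsSketch {

    // "Map + shuffle" stand-in: collect the set of people per movie.
    // Each record is assumed to be {person, movie}.
    static Map<String, SortedSet<String>> groupByMovie(List<String[]> records) {
        Map<String, SortedSet<String>> castByMovie = new TreeMap<>();
        for (String[] record : records) {
            castByMovie.computeIfAbsent(record[1], k -> new TreeSet<>()).add(record[0]);
        }
        return castByMovie;
    }

    // "Reduce" stand-in: emit each unordered pair of one cast exactly once.
    // The sorted set guarantees the first name is alphabetically smaller.
    static List<String> pairsForCast(SortedSet<String> cast) {
        List<String> names = new ArrayList<>(cast);
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                pairs.add(names.get(i) + "##" + names.get(j));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical records, not taken from the IMDb data.
        List<String[]> records = Arrays.asList(
                new String[] {"Person B", "Movie 1"},
                new String[] {"Person A", "Movie 1"},
                new String[] {"Person A", "Movie 2"});
        for (Map.Entry<String, SortedSet<String>> e : groupByMovie(records).entrySet()) {
            System.out.println(e.getKey() + " -> " + pairsForCast(e.getValue()));
        }
    }
}
```

Note that a cast of n people yields n(n-1)/2 pairs, so very large casts can dominate the output; this is one reason to test on the small file first.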
12
3. Continuation
Count the pairs
– CountPairs.java
Sort the pairs
– SortPairs.java
Figure out the input
Figure out the map/reduce phase
Adapt a previous example
– WordCount or EmitPairs
– Change generics
– Implement new Map/Reduce
Run it!
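The counting and sorting steps can be tried out locally before writing the Hadoop classes. The sketch below is plain Java, not the contents of CountPairs.java or SortPairs.java: the class name CountAndSortSketch and the "##"-joined pair keys are assumptions. Counting works like WordCount with a pair as the "word"; sorting orders pairs by descending count. In an actual Hadoop job, one common approach (again an assumption about how you might structure it) is to emit the count as the key in a second job so that the shuffle performs the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountAndSortSketch {

    // Counting step: sum one occurrence per emitted pair, as WordCount does per word.
    static Map<String, Integer> countPairs(List<String> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String pair : pairs) {
            counts.merge(pair, 1, Integer::sum);
        }
        return counts;
    }

    // Sorting step: highest count first, ties broken by pair key for a stable result.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical pair keys, one per movie in which the pair appeared together.
        List<String> pairs = Arrays.asList("A##B", "A##C", "A##B", "B##C", "A##B");
        for (Map.Entry<String, Integer> e : sortByCountDesc(countPairs(pairs))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The first entry of the sorted list is the pair that answers the lab question for whatever input was counted.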