Actors and Actresses. Danger: Please be careful! IMDb (I assume you all know?)


1 Actors and Actresses

2 Danger: Please be careful!

3 IMDb (I assume you all know it?)

4 IMDb Dump: Not open/free!

5 The Question You Are Going to Answer: Which pair of actors/actresses has acted together the most times?

6 An Example: In how many movies have Al Pacino and Robert De Niro starred together, according to IMDb?

7 IMDb: Typical File
Log into machine cluster.dcc.uchile.cl
Username: uhadoop
zcat /data/hadoop/hadoop/data/imdb/actors.list.gz | more

8 IMDb: Already Parsed
zcat /data/hadoop/hadoop/data/imdb/tsv/actpersons-to-movies.tsv.gz | more
How many theatrical movies was Uma Thurman in?
zcat /data/hadoop/hadoop/data/imdb/tsv/actresses-to-movies.tsv.gz | grep -e "^Thurman, Uma" | grep -e "THEATRICAL_MOVIE" | wc -l
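The same filter-and-count idea can be sketched in plain Java, which may help before moving the logic into a Hadoop job. This is only an illustration: the class name TheatricalCountSketch, the sample rows, and the TV_SERIES label are hypothetical, and the real TSV column layout is not shown in these slides.

```java
import java.util.List;

class TheatricalCountSketch {
    // Counts lines that start with the given name and mention THEATRICAL_MOVIE,
    // mirroring the two grep filters followed by wc -l.
    static long countTheatrical(List<String> lines, String namePrefix) {
        return lines.stream()
                .filter(line -> line.startsWith(namePrefix))
                .filter(line -> line.contains("THEATRICAL_MOVIE"))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real file may use different columns.
        List<String> sample = List.of(
                "Thurman, Uma\tMovie A\tTHEATRICAL_MOVIE",
                "Thurman, Uma\tSeries B\tTV_SERIES",
                "Thurman, Uma\tMovie C\tTHEATRICAL_MOVIE",
                "Pacino, Al\tMovie D\tTHEATRICAL_MOVIE");
        System.out.println(countTheatrical(sample, "Thurman, Uma")); // prints 2
    }
}
```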

9 The Question You Are Going to Answer: Which pair of actors/actresses has acted together the most times?

10 1. Download the project http://aidanhogan.com/teaching/cc5212-1/mdp-lab5.zip

11 2. Implement the Hadoop job(s)!
Adapt the WordCount example
– Refer to the lab slides from last week
You can use one class file for each part of the task
Test on the small file
– /uhadoop/imdb/actpersons-to-movies.100k.tsv
Run on the big file
– /uhadoop/imdb/full/actpersons-to-movies.tsv
Write to your own directory!!!
– /uhadoop/[username]
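One possible reading of the first job is: group the rows by movie, then emit every pair of actors who share a movie. The sketch below simulates that map and reduce logic with plain JDK collections rather than the Hadoop API, so it can be tried locally. The class name EmitPairsSketch, the assumed column order (name first, movie second), and the "##" separator are all hypothetical choices, not taken from the lab code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class EmitPairsSketch {
    // "Map" step: key each row by its movie, keeping the actor name as the value.
    // "Reduce" step: for each movie, emit every unordered pair of its actors.
    static List<String> emitPairs(List<String> rows) {
        Map<String, TreeSet<String>> castByMovie = new TreeMap<>();
        for (String row : rows) {
            String[] cols = row.split("\t");
            if (cols.length < 2) {
                continue; // skip malformed rows
            }
            // Assumption: column 0 is the actor name, column 1 identifies the movie.
            castByMovie.computeIfAbsent(cols[1], k -> new TreeSet<>()).add(cols[0]);
        }
        List<String> pairs = new ArrayList<>();
        for (TreeSet<String> cast : castByMovie.values()) {
            List<String> names = new ArrayList<>(cast); // sorted, no duplicates
            for (int i = 0; i < names.size(); i++) {
                for (int j = i + 1; j < names.size(); j++) {
                    // Sorted names give each pair one canonical key.
                    pairs.add(names.get(i) + "##" + names.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical rows: two movies with overlapping casts.
        List<String> rows = List.of(
                "Actor A\tMovie 1", "Actor B\tMovie 1", "Actor C\tMovie 1",
                "Actor A\tMovie 2", "Actor B\tMovie 2");
        emitPairs(rows).forEach(System.out::println);
    }
}
```

Sorting the names before joining them means that (A, B) and (B, A) collapse into the same key, which matters once the pairs are counted.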

12 3. Continuation
Count the pairs
– CountPairs.java
Sort the pairs
– SortPairs.java
Figure out the input
Figure out the map/reduce phase
Adapt a previous example
– WordCount or EmitPairs
– Change the generics
– Implement the new Map/Reduce
Run it!
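The counting and sorting steps can be prototyped without Hadoop, assuming an earlier job has already produced one pair key per co-appearance. The sketch below is plain JDK code, not the lab's CountPairs.java or SortPairs.java. The class name CountAndSortPairsSketch and the "##" key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountAndSortPairsSketch {
    // Counting step: the same idea as WordCount, but the "word" is a pair key.
    static Map<String, Integer> countPairs(List<String> pairKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : pairKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Sorting step: highest count first; ties are broken by key for stable output.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical pair keys, one per shared movie.
        List<String> pairKeys = List.of("A##B", "A##C", "B##C", "A##B");
        for (Map.Entry<String, Integer> e : sortByCountDesc(countPairs(pairKeys))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In an actual Hadoop job the shuffle sorts by key, so a common approach for the sorting step is to emit the count as the key and the pair as the value, with a descending comparator. Whether the lab expects exactly that is an assumption.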

