CMU 15-505: Internet Search Technologies
15-505 Internet Search Technologies. Instructors: Alona Fyshe, Scott Larsen, Chris Monson, Kamal Nigam. http://www.cs.cmu.edu/~knigam/15-505
What does it take to build a world-class search engine and related services? Lots of computer science: massively parallel computation, special-purpose data storage, information retrieval, machine learning, language analysis, user interface design.
Study each of these topics in narrow but deep fashion. Format: small seminar, readings, interactive discussions, programming practicum. Grading: 55% programming homework, 30% reading response, 15% class participation.
What are reading responses? Practice for reading and thinking about computer science research papers. Meant to be open-ended and fairly short (1 page). Can be: a summary of the paper; a critique of its theory, experiments, or approach; suggestions for follow-on studies.
Collaboration and Cheating: Please collaborate on ideas, approaches, and diagnosing problems; use the mailing list. All words and code must be your own. Disclose all collaborations. Clarify any doubts.
What will make this class enjoyable? Interactivity. Flexibility to explore fun domains and data. Early feedback to us about what works and what doesn't.
Problems in Internet Search Technology: Huge Problems (e.g. what changed on the web since this time yesterday?). Classic Problems (e.g. sorting a gazillion numbers fast). New Problems (e.g. making sense of dynamic Cyrillic web pages). Practical Problems (e.g. how do we make both advertisers and consumers happier at the same time?). Non-practical Problems (e.g. what do you see if you zoom all the way in on the moon?). Beautiful Problems. And Fun Problems.
A Taste. Sorting: scale size up, scale time requirements down. Matrix Operations: thinking about the problem in a blend of old ways and new ways.
Classic Sorting Algorithms Quick Merge Selection Shell Heap Radix Bucket …. Ever heard of the Patience sort? Bozo sort?
Enlarge the Problem: 1,000x too many keys for a single machine; 1024 machines to use.
Sorting: Parallel How would you do it? Quick? Merge? Selection? Shell? Heap? Radix? Bucket? ….
Bitonic Sort: Batcher (1968). Bitonic sequence: a sequence <a_0, a_1, ..., a_{n-1}> for which there exists an i such that <a_0, ..., a_i> is monotonically increasing and <a_{i+1}, ..., a_{n-1}> is monotonically decreasing, or for which some cyclic shift of the indices satisfies that condition. E.g. <8, 9, 2, 1, 0, 4> is a bitonic sequence: shifted cyclically it becomes <0, 4, 8, 9, 2, 1>, which rises and then falls.
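The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names `is_up_then_down` and `is_bitonic` are made up for illustration, and "monotonically" is assumed to allow equal neighbors.

```python
def is_up_then_down(seq):
    """True if seq is non-decreasing up to some index, then non-increasing."""
    n = len(seq)
    i = 0
    # Walk up the increasing prefix as far as it goes.
    while i + 1 < n and seq[i] <= seq[i + 1]:
        i += 1
    # Walk down the decreasing suffix from there.
    while i + 1 < n and seq[i] >= seq[i + 1]:
        i += 1
    # Bitonic (without shifting) only if the two walks covered everything.
    return i >= n - 1


def is_bitonic(seq):
    """True if some cyclic shift of seq rises and then falls."""
    seq = list(seq)
    if len(seq) <= 2:
        return True
    return any(is_up_then_down(seq[s:] + seq[:s]) for s in range(len(seq)))
```

Trying every cyclic shift costs O(n^2) time, which is fine for checking small examples by hand but is not how a sorting network would use the property.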
Bitonic Merging Network Compliments of Dr. Quinn Snell, BYU
Bitonic Merge on a Hypercube
Bitonic Sort
Bitonic Sort
Procedure BitonicSort
  for i = 0 to d - 1
    for j = i downto 0
      if (i + 1)st bit of iproc <> jth bit of iproc
        comp_exchange_max(j, item)
      else
        comp_exchange_min(j, item)
      endif
    endfor
  endfor
comp_exchange_max and comp_exchange_min compare and exchange the item with the neighbor on the jth dimension; the processor keeps the larger or the smaller of the two items, respectively.
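The BitonicSort procedure can be simulated on a single machine to see it work. This is a sketch under stated assumptions, not the course's implementation: each of the 2^d simulated processors holds exactly one item, bits are numbered from 0 at the least significant end, and the name `bitonic_sort_hypercube` is invented here.

```python
def bitonic_sort_hypercube(items):
    """Simulate BitonicSort on a d-dimensional hypercube, one item per processor."""
    n = len(items)
    d = n.bit_length() - 1
    if n == 0 or (1 << d) != n:
        raise ValueError("number of items must be a power of two")
    items = list(items)
    for i in range(d):
        for j in range(i, -1, -1):
            # All processors act in lockstep, so read from the old state only.
            new_items = list(items)
            for iproc in range(n):
                neighbor = iproc ^ (1 << j)  # neighbor along dimension j
                bit_i1 = (iproc >> (i + 1)) & 1
                bit_j = (iproc >> j) & 1
                if bit_i1 != bit_j:
                    # comp_exchange_max: keep the larger of the pair
                    new_items[iproc] = max(items[iproc], items[neighbor])
                else:
                    # comp_exchange_min: keep the smaller of the pair
                    new_items[iproc] = min(items[iproc], items[neighbor])
            items = new_items
    return items
```

For example, `bitonic_sort_hypercube([3, 1, 2, 0])` passes through `[1, 3, 2, 0]` (a bitonic sequence) after the first stage and ends as `[0, 1, 2, 3]`. On a real cluster each processor would hold a block of keys rather than one item, and the compare-exchange would become a merge-and-split between neighbors.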
Bitonic Sort Demo http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm
Parallel Sort: Beauty or a Beast? What does it take to implement this?
Bitonic Sort: Why? O(n log^2 n) comparisons in total (O(log^2 n) parallel steps). Data independent. Resource needs are perfectly defined. Very parallel friendly.
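The comparison count follows directly from the loop structure of the BitonicSort procedure. As a short derivation, assuming n = 2^d items with one item per processor, so that each inner-loop step performs n/2 compare-exchanges:

```latex
\begin{align*}
\text{parallel steps} &= \sum_{i=0}^{d-1} (i+1) = \frac{d(d+1)}{2},
  \qquad d = \log_2 n \\
\text{compare-exchanges} &= \frac{n}{2} \cdot \frac{d(d+1)}{2}
  = \frac{n \log_2 n \,(\log_2 n + 1)}{4} = O(n \log^2 n)
\end{align*}
```

The step count depends only on n and never on the key values, which is what makes the resource needs predictable in advance.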
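Matrix Multiplication [worked numeric example; the matrix layout was lost in extraction. Surviving values, in extraction order: 0.75, 0.25, 0.0, 0.75, 0.25, 0.0, 0.5625, 0.375, 0.0625, 0.0, 0.1875, 0.675]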
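Matrix Pipeline [worked numeric example; the layout was lost in extraction]. One recoverable step: (0.75, 0.25, 0.0) · (0.75, 0.25, 0.0) = 0.5625 + 0.0625 + 0.0 = 0.625. Other surviving values: 0.375, 0.0, 0.1875, 0.75, 0.0625, 0.5625.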
Visualization [figures not preserved; author's note: add a “top down” slide with 4 rectangles and the image plane]
Matrix Multiplication: A cube of processors. Each does a chunk of the computation. Each needs different (and overlapping) portions of the input. Each passes intermediate results to certain neighbors. The result is stored across multiple machines. Seems kinda heavy for a simple algorithm! Look up Fox's algorithm and Cannon's algorithm. Very pretty at one level, gory at another level.
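To make the "cube of processors" idea concrete, here is a single-machine sketch of one possible decomposition. It is an illustration only, not Fox's or Cannon's algorithm: it assumes square n x n matrices, a p x p x p grid with n divisible by p, and hypothetical helper names (`matmul`, `block`, `cube_matmul`).

```python
def matmul(a, b):
    """Plain triple-loop product of two small dense matrices (lists of rows)."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def block(m, bi, bj, size):
    """Extract the (bi, bj) block of side `size` from matrix m."""
    return [row[bj * size:(bj + 1) * size]
            for row in m[bi * size:(bi + 1) * size]]


def cube_matmul(a, b, p):
    """Multiply n x n matrices as a p x p x p grid of processors would."""
    n = len(a)
    if n % p != 0:
        raise ValueError("matrix size must be divisible by p")
    size = n // p
    # Processor (i, j, k) needs block A[i][k] and block B[k][j]; neighboring
    # processors therefore need overlapping portions of the input.
    partial = {(i, j, k): matmul(block(a, i, k, size), block(b, k, j, size))
               for i in range(p) for j in range(p) for k in range(p)}
    # Intermediate results are passed along the k axis and summed, leaving
    # block (i, j) of the result spread across the i-j face of the cube.
    c = [[0] * n for _ in range(n)]
    for (i, j, k), chunk in partial.items():
        for r in range(size):
            for col in range(size):
                c[i * size + r][j * size + col] += chunk[r][col]
    return c
```

The dictionary `partial` stands in for the p^3 machines; in a real system the final loop would be a reduction over the network rather than an in-memory sum.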
A Different View Courtesy http://www.unrealtournament3.com/
Multiplication: multi-texturing [figure not preserved: two images combined with *]
Addition: blending [figure not preserved: two images combined with +]
Graphics Pipeline Multiply Multiply Add Image (Frame Buffer)
How the Algorithm Works [figures not preserved: multiply (*) and add (+) steps; author's notes: add a “top down” slide with 4 rectangles and the image plane; color all four planes in upper right image]
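Since the figures for these slides are not preserved, the following is only an assumed reading of how multiply (multi-texturing) and add (blending) can combine into a matrix product: for each index k, one "texture" repeats column k of A across the image, another repeats row k of B down the image, the two are multiplied per pixel, and the result is blended additively into the frame buffer. The function name `gpu_style_matmul` and the list-of-lists "textures" are illustrative stand-ins, not a real graphics API.

```python
def gpu_style_matmul(a, b):
    """Compute a @ b using only per-pixel multiplies and additive blending."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # The frame buffer starts out black (all zeros).
    frame_buffer = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):
        # Texture 1: column k of A, repeated across every column of the image.
        tex_a = [[a[r][k]] * cols for r in range(rows)]
        # Texture 2: row k of B, repeated down every row of the image.
        tex_b = [list(b[k]) for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Multi-texturing: multiply the two textures per pixel.
                pixel = tex_a[r][c] * tex_b[r][c]
                # Blending: add the pixel into the frame buffer.
                frame_buffer[r][c] += pixel
    return frame_buffer
```

With a = b = [[0.75, 0.25], [0.25, 0.75]], the top-left pixel accumulates 0.5625 + 0.0625 = 0.625. Each pass over k corresponds to drawing one full-screen rectangle, so an n x n product takes n passes.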
Performance
GPU Sorting
Problems in Internet Search Technology: Huge Problems Classic Problems New Problems Practical Problems Non-practical Problems Beautiful Problems Fun Problems
Questions? CMU 15-505: Internet Search Technologies. Kamal Nigam (knigam@google.com), Chris Monson (shiblon@google.com), Alona Fyshe (alonaf@google.com), Scott Larsen (esl@google.com)
Bitonic Rearranging (cycling)