1 MOVIE QUOTES SEARCH ENGINE
Students: Meytal Bialik, Zvi Cahana
Supervisors: Hayim Makabee, Oren Somekh
Technion – Israel Institute Of Technology, Computer Science Department
Industrial Project – Final Presentation
19.6.12 MQSE3

2 Introduction
The Movie Quotes Search Engine project builds a search engine that lets a user search for terms that appear in movie dialogue. The project consists of two main components:
• A web application used as a user interface to the search engine.
• A crawling engine used to maintain a searchable index and a content database.

• Introduction • Goals • Methodology • System Diagram • Achievements • Testing • Screenshots • Conclusions

3 Goals
• Relevant search results
• Modern UI design
• Rich search options
• Video play option
• Browser agnostic website
• Large-scale movies database
• Incremental, priority-based crawling

4 Methodology
• IMDb & OpenSubtitles.org dump files
• SRT subtitle files
• OpenSubtitles.org XML-RPC API
• SQLite database
• Apache Lucene
• Java Servlets / JSP
• HTML5 / CSS / JavaScript

5 System Diagram

6 Achievements
• Crawling
  • Command-line tool
  • Dump files parsing
  • OpenSubtitles.org API based
  • Subtitles downloading & indexing
  • Cover art downloading
  • Multithreaded pipelined execution
  • Priority based
  • Index recovery
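The slides do not say how crawl priority is computed. As a minimal sketch, assuming each movie carries a numeric priority (for example its popularity) and that already-indexed movies are skipped to keep the crawl incremental, the ordering could look like this in Python. The function name `crawl_order` and its arguments are hypothetical, and the multithreaded pipeline is not modelled.

```python
import heapq


def crawl_order(movies, already_indexed):
    """Yield movie ids, highest priority first, skipping indexed ones.

    movies: dict mapping movie id -> numeric priority (an assumption;
    the slides only say the crawler is priority based).
    already_indexed: set of movie ids that need no further crawling.
    """
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-priority, movie_id)
            for movie_id, priority in movies.items()
            if movie_id not in already_indexed]
    heapq.heapify(heap)
    while heap:
        _, movie_id = heapq.heappop(heap)
        yield movie_id
```

A later run can pass the ids collected so far as `already_indexed`, so only new or still-missing movies are visited.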

7 Achievements
• Storage
  • SQLite-based database
  • Movies metadata (popularity, rating, IMDb link...)
  • Cover art
  • ~20000 subtitles downloaded & indexed
  • Local videos repository

8 Achievements
• Indexing
  • SRT files parsing & validating
  • SRT files filtering
    • Translator comments
    • Hearing impaired comments
    • Format tags
  • Partitioning into overlapping search units
  • Indexing using Lucene core
    • Stemming
    • Stop words removal
    • Actual indexing of the search units
  • ~250ms per average SRT file
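The filtering and partitioning steps can be sketched as follows. This is a Python illustration, not the project's Java code: the regular expressions, the window size, and the overlap are all assumptions, since the slides name the filter categories but not the rules or the unit size. Translator comments are left out because no pattern for them is given.

```python
import re

# Hypothetical filter patterns; the slides name the categories, not the rules.
FORMAT_TAG = re.compile(r"</?[a-zA-Z][^>]*>|\{\\[^}]*\}")   # <i>..</i>, {\an8}
HEARING_IMPAIRED = re.compile(r"\[[^\]]*\]|\([^)]*\)")      # [door slams], (sighs)


def clean_cue(text):
    """Strip format tags and hearing-impaired annotations from one cue."""
    text = FORMAT_TAG.sub("", text)
    text = HEARING_IMPAIRED.sub("", text)
    return " ".join(text.split())


def parse_srt(srt):
    """Return (start_time, text) pairs for the non-empty cues of an SRT string."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        # A well-formed cue is: index line, timing line, one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start = lines[1].split("-->")[0].strip()
        text = clean_cue(" ".join(lines[2:]))
        if text:
            cues.append((start, text))
    return cues


def search_units(cues, size=4, overlap=2):
    """Group cues into windows of `size` cues; neighbours share `overlap` cues."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    units = []
    for i in range(0, len(cues), step):
        window = cues[i:i + size]
        units.append((window[0][0], " ".join(text for _, text in window)))
        if i + size >= len(cues):
            break  # the last window already reaches the end of the file
    return units
```

Overlapping windows mean a quote that straddles two cues still falls entirely inside at least one unit, at the cost of indexing some text more than once.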

9 Achievements
• Searching
  • Searching using Lucene core
    • Query parsing
    • Search operators support
    • Stemming
    • Stop words removal
    • Relevant buckets retrieval & ranking
  • Aggregating buckets to movies
    • Merging of overlapping buckets
  • Highlighting search words using Lucene core
  • Buckets trimming to most relevant text
  • Configurable weighted movie ranking
    • Lucene rank
    • Popularity
    • Rating
    • Year
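Bucket merging and weighted movie ranking can be illustrated with a short Python sketch. It is not the project's implementation: the weight values, the assumption that every signal is already normalised to [0, 1], the choice of the best bucket score as the movie's Lucene signal, and all function names are hypothetical. The slides only state that the ranking is configurable and combines Lucene rank, popularity, rating, and year.

```python
from collections import defaultdict

# Hypothetical weights; the slides give no values.
WEIGHTS = {"lucene": 0.6, "popularity": 0.2, "rating": 0.15, "year": 0.05}


def merge_buckets(buckets):
    """Merge (start, end, score) buckets of one movie whose ranges overlap."""
    merged = []
    for start, end, score in sorted(buckets):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged


def movie_score(best_hit, movie, weights=WEIGHTS):
    """Weighted sum of signals, each assumed to be normalised to [0, 1]."""
    return (weights["lucene"] * best_hit
            + weights["popularity"] * movie["popularity"]
            + weights["rating"] * movie["rating"]
            + weights["year"] * movie["year"])


def rank_movies(hits, movies, weights=WEIGHTS):
    """hits: iterable of (movie_id, start, end, score). Returns ranked ids."""
    per_movie = defaultdict(list)
    for movie_id, start, end, score in hits:
        per_movie[movie_id].append((start, end, score))
    scored = []
    for movie_id, buckets in per_movie.items():
        best = max(score for _, _, score in merge_buckets(buckets))
        scored.append((movie_score(best, movies[movie_id], weights), movie_id))
    return [movie_id for _, movie_id in sorted(scored, reverse=True)]
```

Passing a different `weights` dictionary changes how much text relevance counts against metadata, which is one way to realise a "configurable" ranking.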

10 Achievements
• Web Application
  • JSP/HTML5/CSS/JavaScript based
  • Full support for IE9
  • Modern UI design
  • Search results snippets
    • Multiple hits per movie
    • Paging
  • Video play option
    • Per result snippet
    • Relevant scene
    • Captions

11 Testing
A testing platform compares the quality of search results across different system configurations.
• In each test, the search engine is queried with a famous quote
• A test passes if the relevant movie is found in the top-K results
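The pass criterion above can be sketched in a few lines of Python. Here `search` is a hypothetical callable that takes a quote and returns movie titles in ranked order; it stands in for one system configuration and is not part of the original project.

```python
def passes(search, quote, expected_movie, k):
    """A test passes if the expected movie is among the top-k results."""
    return expected_movie in search(quote)[:k]


def pass_rate(search, cases, k):
    """Fraction of (quote, expected_movie) cases that pass for a given k."""
    if not cases:
        return 0.0
    passed = sum(passes(search, quote, movie, k) for quote, movie in cases)
    return passed / len(cases)
```

Calling `pass_rate` once per configuration with the same `cases` and `k` gives directly comparable numbers.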

12 Testing
We tested the system with a set of ~100 famous movie quotes. With a biased system configuration and K=9, we achieved a ~90% pass rate.

13 Screenshots

14 Screenshots

15 Conclusions
• Lucene is a powerful search platform
• Optimal search results are difficult to define
• Subtitles files from public sources should be further validated
• HTML5 video support is still limited & browser dependent
• Source control systems make life easier

