Published by Angelica Fisher. Modified over 9 years ago.

MOVIE QUOTES SEARCH ENGINE
Students: Meytal Bialik, Zvi Cahana
Supervisors: Hayim Makabee, Oren Somekh
Technion – Israel Institute of Technology, Computer Science Department
19.6.12 MQSE3
Industrial Project – Final Presentation

Introduction
The Movie Quotes Search Engine project focuses on building a search engine that lets a user search for terms that appear in the dialogue of a movie. The project consists of two main components:
- A web application that serves as the user interface to the search engine.
- A crawling engine that maintains a searchable index and a content database.

Outline: Introduction, Goals, Methodology, System Diagram, Achievements, Testing, Screenshots, Conclusions

Goals
- Relevant search results
- Modern UI design
- Rich search options
- Video play option
- Browser-agnostic website
- Large-scale movie database
- Incremental, priority-based crawling

Methodology
- IMDb & OpenSubtitles.org dump files
- SRT subtitle files
- OpenSubtitles.org XML-RPC API
- SQLite database
- Apache Lucene
- Java Servlets / JSP
- HTML5 / CSS / JavaScript

System Diagram
[Diagram not reproduced in this transcript.]

Achievements: Crawling
- Command-line tool
- Dump file parsing
- OpenSubtitles.org API based
- Subtitle downloading & indexing
- Cover art downloading
- Multithreaded, pipelined execution
- Priority based
- Index recovery
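The priority-based, incremental crawling listed above could be organized around a thread-safe priority queue. The following is a minimal sketch, not the project's actual code: the class `CrawlQueueSketch`, the `Task` type, and the use of a single numeric priority (for example, movie popularity) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical crawl queue: higher-priority movies are handed out first. */
class CrawlQueueSketch {

    /** Hypothetical crawl task: a movie id plus a priority such as popularity. */
    static final class Task {
        final String movieId;
        final double priority;

        Task(String movieId, double priority) {
            this.movieId = movieId;
            this.priority = priority;
        }
    }

    // PriorityBlockingQueue is thread-safe, so several worker threads
    // (download, parse, index) could poll from it concurrently.
    private final PriorityBlockingQueue<Task> queue =
            new PriorityBlockingQueue<>(
                    16,
                    Comparator.comparingDouble((Task t) -> t.priority).reversed());

    /** Adds a movie to the crawl queue. */
    void submit(String movieId, double priority) {
        queue.add(new Task(movieId, priority));
    }

    /**
     * Removes and returns the ids of up to n highest-priority tasks.
     * Calling this repeatedly gives an incremental crawl: each run
     * handles the most important pending movies first.
     */
    List<String> nextBatch(int n) {
        List<String> batch = new ArrayList<>();
        Task task;
        while (batch.size() < n && (task = queue.poll()) != null) {
            batch.add(task.movieId);
        }
        return batch;
    }
}
```

In a pipelined setup, each stage would typically run in its own thread pool and pass results to the next stage through another blocking queue; that part is omitted here to keep the sketch short.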

Achievements: Storage
- SQLite-based database
- Movie metadata (popularity, rating, IMDb link...)
- Cover art
- ~20000 subtitles downloaded & indexed
- Local video repository

Achievements: Indexing
- SRT file parsing & validation
- SRT file filtering
  - Translator comments
  - Hearing-impaired comments
  - Format tags
- Partitioning into overlapping search units
- Indexing using Lucene core
  - Stemming
  - Stop word removal
  - Actual indexing of the search units
- ~250 ms per average SRT file
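The filtering and partitioning steps above can be illustrated with plain Java, before any Lucene call is involved. This is a rough sketch under stated assumptions: the regular expressions are guesses at what format tags and hearing-impaired cues commonly look like in SRT files, the window size and step are hypothetical parameters, and translator comments are not handled because the slides do not say how they were detected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for cleaning subtitle text and building search units. */
class SubtitleUnitsSketch {

    /**
     * Strips format tags such as <i>...</i> or {\an8}, and bracketed or
     * parenthesized cues such as [DOOR SLAMS] or (sighs).
     * The patterns are assumptions, not the project's actual rules.
     */
    static String cleanLine(String line) {
        String s = line.replaceAll("<[^>]*>", "");   // HTML-like tags
        s = s.replaceAll("\\{[^}]*\\}", "");         // brace-style tags
        s = s.replaceAll("\\[[^\\]]*\\]", "");       // [SOUND CUE]
        s = s.replaceAll("\\([^)]*\\)", "");         // (sound cue)
        return s.replaceAll("\\s+", " ").trim();
    }

    /**
     * Groups consecutive subtitle lines into overlapping units.
     * Each unit holds up to `size` lines and starts `step` lines after
     * the previous one, so a quote that spans a unit boundary still
     * appears whole in some unit when step < size.
     */
    static List<String> partition(List<String> lines, int size, int step) {
        if (size <= 0 || step <= 0 || step > size) {
            throw new IllegalArgumentException("require 0 < step <= size");
        }
        List<String> units = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += step) {
            int end = Math.min(start + size, lines.size());
            units.add(String.join(" ", lines.subList(start, end)));
            if (end == lines.size()) {
                break; // the last line is covered; avoid redundant tail units
            }
        }
        return units;
    }
}
```

For five cleaned lines with `size = 3` and `step = 2`, `partition` produces two units that share the middle line. Each unit would then be handed to the indexer as one searchable document.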

Achievements: Searching
- Searching using Lucene core
  - Query parsing
  - Search operator support
  - Stemming
  - Stop word removal
- Relevant bucket retrieval & ranking
- Aggregating buckets to movies
  - Merging of overlapping buckets
- Highlighting search words using Lucene core
- Trimming buckets to the most relevant text
- Configurable weighted movie ranking
  - Lucene rank
  - Popularity
  - Rating
  - Year
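Two of the steps above, merging overlapping buckets and weighted movie ranking, lend themselves to a short illustration. The sketch below is an assumption about how they might work, not the project's implementation: the `Bucket` type, the choice to keep the higher score when merging, the linear weighted sum, and the expectation that all four factors are pre-normalized to the range 0 to 1 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of bucket merging and weighted movie scoring. */
class MovieRankingSketch {

    /** Hypothetical bucket: subtitle-line range [start, end] with a search score. */
    static final class Bucket {
        final int start;
        final int end;
        final double score;

        Bucket(int start, int end, double score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /**
     * Merges buckets of one movie whose line ranges overlap or touch,
     * keeping the higher score of each merged pair (an assumption).
     */
    static List<Bucket> mergeOverlapping(List<Bucket> hits) {
        List<Bucket> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Bucket b) -> b.start));
        List<Bucket> merged = new ArrayList<>();
        for (Bucket b : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && b.start <= merged.get(lastIndex).end) {
                Bucket last = merged.remove(lastIndex);
                merged.add(new Bucket(last.start,
                        Math.max(last.end, b.end),
                        Math.max(last.score, b.score)));
            } else {
                merged.add(b);
            }
        }
        return merged;
    }

    /**
     * Combines the four ranking factors with configurable weights,
     * in the order: Lucene rank, popularity, rating, year.
     * Inputs are assumed to be normalized to [0, 1] beforehand.
     */
    static double movieScore(double luceneRank, double popularity,
                             double rating, double year, double[] weights) {
        if (weights.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 weights");
        }
        return weights[0] * luceneRank
                + weights[1] * popularity
                + weights[2] * rating
                + weights[3] * year;
    }
}
```

Changing the weight vector is one way to get the "different system configurations" that the testing platform compares, for example a text-only ranking versus one that also favors popular movies.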

Achievements: Web Application
- JSP/HTML5/CSS/JavaScript based
- Full support for IE9
- Modern UI design
- Search result snippets
  - Multiple hits per movie
  - Paging
- Video play option
  - Per result snippet
  - Relevant scene
  - Captions

Testing
A testing platform enables comparing the "quality" of search results across different system configurations.
- In each test, the search engine is queried with a famous quote.
- A test passes if the relevant movie is found in the top-K results.
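The pass criterion described on this slide can be written down in a few lines. This is only a sketch of the idea: the class `TopKEvaluationSketch`, the representation of test cases as a quote-to-movie map, and the `search` function standing in for the real engine are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical top-K evaluation harness for a quote search engine. */
class TopKEvaluationSketch {

    /** A test passes if the expected movie is among the first k results. */
    static boolean passes(List<String> rankedMovies, String expected, int k) {
        int limit = Math.max(0, Math.min(k, rankedMovies.size()));
        return rankedMovies.subList(0, limit).contains(expected);
    }

    /**
     * Runs every quote through the search function and returns the
     * fraction of quotes whose expected movie appears in the top k.
     */
    static double passRate(Map<String, String> quoteToMovie,
                           Function<String, List<String>> search,
                           int k) {
        if (quoteToMovie.isEmpty()) {
            return 0.0;
        }
        int passed = 0;
        for (Map.Entry<String, String> testCase : quoteToMovie.entrySet()) {
            List<String> results = search.apply(testCase.getKey());
            if (passes(results, testCase.getValue(), k)) {
                passed++;
            }
        }
        return (double) passed / quoteToMovie.size();
    }
}
```

Running `passRate` once per configuration with the same quote set and the same k gives directly comparable numbers.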

Testing
We tested the system with a set of ~100 famous movie quotes. With a biased system configuration and K=9, we achieved a pass rate of ~90%.

Screenshots
[Two slides of screenshots; images not reproduced in this transcript.]

Conclusions
- Lucene is a powerful search platform
- Optimal search results are difficult to define
- Subtitle files from public sources should be further validated
- HTML5 video support is still limited & browser dependent
- Source control systems make life easier