MOVIE QUOTES SEARCH ENGINE


MQSE3 Industrial Project – Final Presentation
Students: Meytal Bialik, Zvi Cahana
Supervisors: Hayim Makabee, Oren Somekh
Technion – Israel Institute of Technology, Computer Science Department

Introduction
The Movie Quotes Search Engine project builds a search engine that lets a user search for terms that appear in movie dialogue. The project consists of two main components:
- A web application that serves as the user interface to the search engine.
- A crawling engine that maintains a searchable index and a content database.

Outline: Introduction, Goals, Methodology, System Diagram, Achievements, Testing, Screenshots, Conclusions

Goals
- Relevant search results
- Modern UI design
- Rich search options
- Video play option
- Browser-agnostic website
- Large-scale movie database
- Incremental, priority-based crawling

Methodology
- IMDb & OpenSubtitles.org dump files
- SRT subtitle files
- OpenSubtitles.org XML-RPC API
- SQLite database
- Apache Lucene
- Java Servlets / JSP
- HTML5 / CSS / JavaScript

System Diagram
[Diagram not reproduced in the transcript]

Achievements: Crawling
- Command-line tool
- Dump file parsing
- OpenSubtitles.org API based
- Subtitle downloading & indexing
- Cover art downloading
- Multithreaded pipelined execution
- Priority based
- Index recovery
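
The incremental, priority-based crawl order can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, and using a single integer priority (for example, derived from movie popularity) is an assumption. The multithreaded pipeline is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: names and the priority criterion are assumptions.
class CrawlQueueSketch {

    // A movie waiting to be crawled, with an assumed numeric priority.
    static final class CrawlTask {
        final String movieId;
        final int priority;

        CrawlTask(String movieId, int priority) {
            this.movieId = movieId;
            this.priority = priority;
        }
    }

    // Returns movie IDs in the order they would be crawled:
    // highest priority first, skipping movies that are already indexed.
    static List<String> crawlOrder(List<CrawlTask> candidates, Set<String> alreadyIndexed) {
        PriorityQueue<CrawlTask> queue = new PriorityQueue<>((a, b) -> {
            int byPriority = Integer.compare(b.priority, a.priority);
            // Break ties by ID so the order is deterministic.
            return byPriority != 0 ? byPriority : a.movieId.compareTo(b.movieId);
        });
        for (CrawlTask task : candidates) {
            // "Incremental": work done in an earlier run is not repeated.
            if (!alreadyIndexed.contains(task.movieId)) {
                queue.add(task);
            }
        }
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().movieId);
        }
        return order;
    }
}
```

In a multithreaded variant, `java.util.concurrent.PriorityBlockingQueue` could play the same role between pipeline stages.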

Achievements: Storage
- SQLite-based database
- Movie metadata (popularity, rating, IMDb link...)
- Cover art
- ~20,000 subtitles downloaded & indexed
- Local video repository

Achievements: Indexing
- SRT file parsing & validation
- SRT file filtering
  - Translator comments
  - Hearing-impaired comments
  - Format tags
- Partitioning into overlapping search units
- Indexing using Lucene core
  - Stemming
  - Stop word removal
  - Actual indexing of the search units
- ~250 ms per average SRT file
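
The filtering and partitioning steps can be sketched as follows. This is a hedged illustration in plain Java, not the project's implementation: the class name, the regular expressions, and the window parameters `size` and `step` are all assumptions. The idea behind overlapping units, as we read the slide, is that a quote spanning two adjacent subtitle lines still falls inside at least one search unit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names, regexes, and window sizes are assumptions.
class SubtitleUnitsSketch {

    // Removes format tags such as <i>...</i> and bracketed or parenthesized
    // cues such as [DOOR SLAMS], then normalizes whitespace.
    static String cleanLine(String text) {
        String noTags = text.replaceAll("<[^>]+>", "");
        String noCues = noTags.replaceAll("\\[[^\\]]*\\]|\\([^)]*\\)", "");
        return noCues.replaceAll("\\s+", " ").trim();
    }

    // Groups subtitle lines into windows of `size` lines that advance by
    // `step` lines, so consecutive units overlap when step < size.
    static List<String> overlappingUnits(List<String> lines, int size, int step) {
        if (size <= 0 || step <= 0) {
            throw new IllegalArgumentException("size and step must be positive");
        }
        List<String> units = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += step) {
            int end = Math.min(start + size, lines.size());
            units.add(String.join(" ", lines.subList(start, end)));
            if (end == lines.size()) {
                break; // the last window already reaches the end of the file
            }
        }
        return units;
    }
}
```

Each resulting unit would then be handed to the indexer as one searchable document; translator comments are not handled here because the slides do not say how they were detected.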

Achievements: Searching
- Searching using Lucene core
  - Query parsing
  - Search operator support
  - Stemming
  - Stop word removal
- Relevant bucket retrieval & ranking
- Aggregating buckets to movies
- Merging of overlapping buckets
- Highlighting search words using Lucene core
- Bucket trimming to most relevant text
- Configurable weighted movie ranking
  - Lucene rank
  - Popularity
  - Rating
  - Year
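
Two of the steps above, merging overlapping buckets and the weighted movie ranking, can be sketched as follows. The slides do not give the actual formulas, so this is only one plausible reading: buckets are modeled as inclusive `[start, end]` line ranges, and the movie score is assumed to be a linear combination of signals (Lucene rank, popularity, rating, year) that were normalized to [0, 1] beforehand. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: bucket representation and scoring formula are assumptions.
class MovieRankSketch {

    // Merges buckets given as {start, end} line ranges (inclusive) whenever
    // one range begins at or before the end of the previous one.
    static List<int[]> mergeOverlapping(List<int[]> buckets) {
        List<int[]> sorted = new ArrayList<>(buckets);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] bucket : sorted) {
            if (!merged.isEmpty() && bucket[0] <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], bucket[1]);
            } else {
                merged.add(new int[] {bucket[0], bucket[1]});
            }
        }
        return merged;
    }

    // Weighted sum of normalized signals; weights and signals must be given
    // in the same order, e.g. {luceneRank, popularity, rating, year}.
    static double weightedScore(double[] weights, double[] signals) {
        if (weights.length != signals.length) {
            throw new IllegalArgumentException("weights and signals differ in length");
        }
        double score = 0.0;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * signals[i];
        }
        return score;
    }
}
```

Keeping the weights in a configuration file rather than in code would match the "configurable" aspect mentioned on the slide.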

Achievements: Web Application
- JSP/HTML5/CSS/JavaScript based
- Full support for IE9
- Modern UI design
- Search result snippets
- Multiple hits per movie
- Paging
- Video play option
  - Per result snippet
  - Relevant scene
  - Captions

Testing
A testing platform makes it possible to compare the quality of search results across different system configurations.
- In each test, the search engine is queried with a famous quote.
- A test passes if the relevant movie is found in the top-K results.

Testing
We tested the system with a set of ~100 famous movie quotes. With a biased system configuration and K=9, we achieved a pass rate of ~90%.
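
The top-K pass criterion and the resulting pass rate can be sketched as follows. This is an illustration only: the class name and the `search` function, which stands in for a call to the search engine and returns movie titles in ranked order, are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: the search function stands in for the real engine.
class TopKTestSketch {

    // A test passes if the expected movie is among the first k ranked results.
    static boolean passes(List<String> rankedMovies, String expectedMovie, int k) {
        int limit = Math.max(0, Math.min(k, rankedMovies.size()));
        return rankedMovies.subList(0, limit).contains(expectedMovie);
    }

    // Fraction of quotes whose expected movie appears in the top-k results.
    static double passRate(Map<String, String> quoteToMovie,
                           Function<String, List<String>> search,
                           int k) {
        if (quoteToMovie.isEmpty()) {
            return 0.0;
        }
        int passed = 0;
        for (Map.Entry<String, String> entry : quoteToMovie.entrySet()) {
            if (passes(search.apply(entry.getKey()), entry.getValue(), k)) {
                passed++;
            }
        }
        return (double) passed / quoteToMovie.size();
    }
}
```

Running `passRate` once per system configuration over the same quote set gives directly comparable numbers.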

Screenshots
[Screenshots not reproduced in the transcript]


Conclusions
- Lucene is a powerful search platform
- Optimal search results are difficult to define
- Subtitle files from public sources should be further validated
- HTML5 video support is still limited & browser dependent
- Source control systems make life easier