Searching and browsing through fragments of TED Talks

Presentation transcript:

Searching and browsing through fragments of TED Talks
MARIELLA SABATINO, mariella.sabatino@eurecom.fr
11/01/2019

TED Talks
TED is a global set of conferences held throughout North America, Europe and Asia. TED Talks address a wide range of topics within the research and practice of science and culture. Speakers are given a maximum of 18 minutes to present their ideas in the most innovative and engaging way they can, often through storytelling.

Problem
Users are overwhelmed with audiovisual content.
Users browse fast, looking for topics of interest.
It is very difficult to find interesting documents.
Which fragments are potentially relevant, without having to watch the entire video?

Research questions
HOW TO:
1. detect segments of interest in a video?
2. recommend related media fragments within the same video collection?
3. design a web application that provides a rich environment for exploring a video collection?

HyperTED
Browsing and recommendation of Media Fragments of TED Talks, based on entities extracted from the subtitles.
Integration of the Media Fragments concept with the subtitle enrichment performed by NERD, on a Node.js server.

Research question 1
HOW TO: detect segments of interest in a video?

What is a NER task?
Named Entity Recognition (NER) aims to locate and classify elements of a textual document into pre-defined categories such as:
people's names;
organization names;
places;
temporal and numerical expressions.
The elements are called entities, and the categories are defined by an ontology.

For example…
“This is Nikita, a security guard from one of the bars in St. Petersburg.”
After NER: “Nikita” is tagged PERSON, “security guard” is tagged FUNCTION, and “St. Petersburg” is tagged LOCATION.
Natural Language Processing (NLP) task: linking an entity to a disambiguating URL in a knowledge base, e.g. http://dbpedia.org/resource/Saint_Petersburg.
Category: the type assigned in the NER task.
Example taken from the transcript of https://www.ted.com/talks/2089

NER extractors
Web tools that use NER algorithms.
Open APIs for research use.

NERD
Compares the performance of NER tools available on the web.
Unifies the results of NER extractors in a common output.
http://nerd.eurecom.fr/

NER extractors evaluation
DOCUMENTS ANALYZED: 5 short TED Talks
NUMBER OF EVALUATORS: 1
STEPS OF EVALUATION: selection of the meaningful concepts in the subtitles; run of each extractor; comparison of the results.
PRECISION: the fraction of retrieved items (here, extracted entities) that are relevant.
RECALL: the fraction of relevant items that are retrieved.
F-MEASURE: a single score that takes both Precision and Recall into account.
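
The three scores above could be computed for a single extractor roughly as follows. This is a minimal sketch, not the evaluation script used in the talk: it assumes that the evaluator's selected concepts and the extractor's output are compared as sets of strings, and that the F-measure is the balanced F1 (the harmonic mean of Precision and Recall), which the slides do not state explicitly. The function name and the sample data are hypothetical.

```python
def precision_recall_f1(extracted, relevant):
    """Return (precision, recall, F1) for two collections of entity labels."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumption: balanced F1, i.e. the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: concepts picked by the evaluator vs. one extractor's output.
gold = {"Nikita", "security guard", "St. Petersburg", "bars"}
found = {"Nikita", "St. Petersburg", "Russia"}

p, r, f = precision_recall_f1(found, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With this definition an extractor that returns many entities can reach a high Recall while its Precision stays low, which is the pattern visible in the "Combined" row of the results.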

NER extractors evaluation
EXTRACTOR                PRECISION  RECALL  F-MEASURE
AlchemyAPI               0,15       0,03    0,05147488928
DataTXT                  0,21       0,36    0,2652521588
DBpedia Spotlight        0,14       0,37    0,1994140988
Lupedia                  0,18       0,02    0,04389924763
OpenCalais               0,27       0,09    0,1347540544
Saplo                    0,00 (other values missing)
Textrazor                0,17       0,40    0,2416065311
THD                      0,12       0,05    0,07485426603
Wikimeta                 0,13       0,08    0,09514781377
Yahoo! Content Analysis  0,52; 0,202927267 (one value missing)
Zemanta                  0,44; 0,2511994999 (one value missing)
Combined                 0,11       0,54    0,1859774587

Media Fragments
A Media Fragment is a part of a multimedia object.
Temporal Fragments are sections along the time dimension of the media resource, with a start and an end point.
http://www.w3.org/TR/media-frags/
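
To make the idea of a temporal fragment concrete, the sketch below builds a URI with the `#t=start,end` form defined by the W3C Media Fragments specification, with times given in seconds. The helper names and the example URL are hypothetical and are not taken from HyperTED.

```python
def _format_seconds(value):
    """Format seconds with up to millisecond precision, without trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def temporal_fragment_uri(media_url, start, end):
    """Return media_url with a temporal fragment '#t=start,end' (times in seconds)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # Drop any fragment that is already present before appending the new one.
    base = media_url.split("#", 1)[0]
    return f"{base}#t={_format_seconds(start)},{_format_seconds(end)}"


# Hypothetical video URL; the fragment covers seconds 10 to 20 of the video.
print(temporal_fragment_uri("http://example.org/video.mp4", 10, 20))
```

A player that understands Media Fragments can use such a URI to start and stop playback at the given points, so a fragment can be shared or linked to like any other web resource.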

MF creation: chapters
TED Talks have paragraphs: a human-made subdivision of the subtitles.

MF creation: hot spots
Extraction of topics from TextRazor and entities from NERD.
Clustering of consecutive chapters that talk about similar topics.
Filtering of those fragments based on annotation relevance.
The Hot Spots are those fragments whose relative relevance falls under the first quarter of the final score distribution.
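
The steps above can be sketched as follows. This is only an illustration under several assumptions that the slides do not confirm: topic similarity is measured with Jaccard overlap and a fixed threshold, a fragment's score is the mean relevance of its annotations, and "the first quarter of the score distribution" is read as the best-scoring 25% of fragments. All names, thresholds and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: float        # seconds
    end: float          # seconds
    topics: set         # topic labels attached to the fragment
    relevances: list    # relevance values of the fragment's annotations


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_chapters(chapters, min_similarity=0.5):
    """Merge consecutive chapters whose topic sets are similar enough."""
    fragments = []
    for ch in chapters:
        last = fragments[-1] if fragments else None
        if last is not None and jaccard(last.topics, ch.topics) >= min_similarity:
            last.end = ch.end
            last.topics |= ch.topics
            last.relevances += ch.relevances
        else:
            # Copy the containers so the input chapters are not mutated.
            fragments.append(Fragment(ch.start, ch.end, set(ch.topics), list(ch.relevances)))
    return fragments


def score(fragment):
    """Mean annotation relevance (assumed scoring function)."""
    values = fragment.relevances
    return sum(values) / len(values) if values else 0.0


def hot_spots(fragments, keep_fraction=0.25):
    """Keep the best-scoring fraction of fragments, returned in time order."""
    ranked = sorted(fragments, key=score, reverse=True)
    keep = max(1, round(len(ranked) * keep_fraction))
    return sorted(ranked[:keep], key=lambda f: f.start)


# Hypothetical chapters of one talk.
chapters = [
    Fragment(0, 60, {"security", "russia"}, [0.4]),
    Fragment(60, 120, {"security", "russia", "bars"}, [0.6]),
    Fragment(120, 200, {"photography"}, [0.9]),
    Fragment(200, 260, {"music"}, [0.2]),
]
fragments = cluster_chapters(chapters)
hot = hot_spots(fragments)
print([(f.start, f.end) for f in hot])
```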

Research question 2
HOW TO: recommend related media fragments within the same video collection?

Search Engine indexing
A search engine is a system able to access information that was previously stored and indexed.
Search engine indexing is the process of collecting, parsing and storing data to make searches faster.
We use it to index the annotations in our database.
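
The reason indexing makes searches faster can be illustrated with a toy inverted index, which maps each word to the documents that contain it so that a query does not have to scan every document. This is a simplified sketch of the general idea, not the data structure used by the system described in the talk; the function names and sample annotations are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every token of the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical annotation labels keyed by annotation id.
annotations = {
    "a1": "Saint Petersburg",
    "a2": "security guard",
    "a3": "Saint Petersburg bars",
}
index = build_index(annotations)
print(sorted(search(index, "petersburg bars")))
```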

Annotation based index
WHY ANNOTATIONS?
Because they “contain” the meaning of the talk.
Because they contain some very useful attributes: timing references (startNPT and endNPT); uuid; relevance references.
WHICH ANNOTATIONS?
Entities and Topics.

ElasticSearch
ElasticSearch is an open-source search engine. It uses Apache Lucene™ for indexing. It aims to make full-text search easy by hiding the complexities of Lucene behind a simple RESTful API.

ElasticSearch: HOW TO MAKE A QUERY
ElasticSearch provides a full Query DSL based on JSON to define queries. In general, there are basic queries such as term or prefix.
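
As an illustration of the two basic query types named above, the sketch below builds the JSON bodies of a term query and a prefix query. It only constructs and prints the request bodies, which would normally be sent to the `_search` endpoint of an index. The field name `label` and the helper functions are assumptions made for the example; the actual mapping of the annotation index is not described in the slides.

```python
import json


def term_query(field, value):
    """Body of a term query: matches documents whose field holds exactly this term."""
    return {"query": {"term": {field: value}}}


def prefix_query(field, value):
    """Body of a prefix query: matches documents whose field has a term starting with value."""
    return {"query": {"prefix": {field: value}}}


# "label" is a hypothetical field holding the text of an entity or topic annotation.
print(json.dumps(term_query("label", "petersburg"), indent=2))
print(json.dumps(prefix_query("label", "peter"), indent=2))
```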

Recommendation
Interlinking through chapters and topics.
Interlinking to openCourseware and openUniversity.

Research question 3
HOW TO: design a web application that provides a rich environment for exploring a video collection?

Architecture

DEMO
http://linkedtv.eurecom.fr/mediafragmentplayer

Conclusions
Evaluation of NER tools in the context of TED Talks.
Hot Spot detection based on topics and entities.
Recommendation algorithm: hyperlinks between fragments of TED Talks, plus external education resources.
Nice and responsive UI.

Publications
HyperTED is one of the apps submitted to the LinkedUP Challenge - http://linkedup-challenge.org/
José Luis Redondo García, Mariella Sabatino, Pasquale Lisena and Raphaël Troncy. Detecting Hot Spots in Web Videos. In International Semantic Web Conference (ISWC’14), Demo.