EventCube Aviation Safety Data Analysis System Fangbo Tao, Xiao Yu, Jiawei Han 08/10/13.

The data we focus on: a huge collection of logs
Each document is a free-text report, for example:
"Following a normal approach and landing to runway 4 in roc; aircraft was taxied clear of the end of runway to the gate. Several ground snow removal vehicles were operating to left of aircraft so we moved to the right side of ramp. …"

Power of Text-Rich Data Cubes
- Hierarchical Data Cube
- Text Analysis

Power of Text-Rich Data Cubes
- Data Cube: Efficient Summarization
- Rich Text: Powerful Text Mining

Power of Text-Rich Data Cube

Other features
- Contextual Search
- Hierarchical Dimension Selection: supports multiple choices
- Similar Document Finding: based on Contextual Search
- Keyword Frequency Distribution
- Multi-gram Summarization

Contextual Search
- Motivation:
  - Every word/concept may have an equivalent word/concept
    - "SVM" = "Support Vector Machine", "Alt" = "Altitude"
  - Connections between words
    - "Kernel Method" and "SVM", "altitude" and "flight level"

Contextual Search
- We develop a contextual search framework that builds a word-net
- It contains 4 different relationships:
  - A "Use" B: equivalent terms; B is the more common one
  - A "RT" B: related terms, not hierarchical
  - A "BT" B: B is the broader term
  - A "NT" B: B is the narrower term
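The four relationships can be pictured as labelled edges in a small term graph. The sketch below is only an illustration: the `WordNet` class, its methods, the symmetry rules (RT is mutual, BT mirrors NT) and the "B-737 BT Boeing" edge are assumptions, not the system's actual data model.

```python
from collections import defaultdict

# Relationship labels come from the slide; the storage layout is an assumption.
RELATIONS = {"Use", "RT", "BT", "NT"}


class WordNet:
    """Toy term graph: edges[rel][a] is the set of terms b with 'a rel b'."""

    def __init__(self):
        self.edges = {rel: defaultdict(set) for rel in RELATIONS}

    def add(self, a, relation, b):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        a, b = a.lower(), b.lower()
        self.edges[relation][a].add(b)
        # Assumed symmetry rules: RT is mutual, BT and NT mirror each other.
        if relation == "RT":
            self.edges["RT"][b].add(a)
        elif relation == "BT":
            self.edges["NT"][b].add(a)
        elif relation == "NT":
            self.edges["BT"][b].add(a)

    def preferred(self, term):
        """Follow a 'Use' edge to the more common equivalent term, if any."""
        targets = self.edges["Use"].get(term.lower())
        return sorted(targets)[0] if targets else term.lower()

    def related(self, term):
        """Terms linked to `term` by the non-hierarchical RT relation."""
        return sorted(self.edges["RT"].get(term.lower(), set()))


net = WordNet()
net.add("Alt", "Use", "Altitude")          # example pair from the slides
net.add("altitude", "RT", "flight level")  # example pair from the slides
net.add("B-737", "BT", "Boeing")           # illustrative edge, not from the slides
```

With these edges, `net.preferred("alt")` resolves to the more common term and `net.related("flight level")` returns the terms connected to it.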

Contextual Search
- Step 1: Generate the word-net when a dataset is uploaded.
- Step 2: Return the related terms as the user types a query.
- Step 3: Automatically include the equivalent terms when searching.
- Step 4: Support the operators "AND" / "OR" / "NOT".
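Steps 3 and 4 can be sketched as a search that maps every term to its preferred form before applying the Boolean operators. This is a minimal sketch under stated assumptions: the `EQUIVALENT` table, the whitespace tokenizer, and the `all_of` / `any_of` / `none_of` parameters are hypothetical stand-ins for the real word-net and query syntax.

```python
# Hypothetical equivalence table (the "Use" relation): variant -> preferred term.
EQUIVALENT = {"alt": "altitude"}


def canonical(term):
    """Lower-case a term and map it to its preferred equivalent, if one exists."""
    term = term.lower()
    return EQUIVALENT.get(term, term)


def doc_terms(text):
    """Canonicalized set of single-word tokens in a document."""
    return {canonical(tok.strip(".,;:!?")) for tok in text.split()}


def search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of docs matching AND (all_of), OR (any_of), NOT (none_of)."""
    hits = []
    for doc_id, text in docs.items():
        terms = doc_terms(text)
        if not all(canonical(t) in terms for t in all_of):
            continue
        if any_of and not any(canonical(t) in terms for t in any_of):
            continue
        if any(canonical(t) in terms for t in none_of):
            continue
        hits.append(doc_id)
    return hits


docs = {
    1: "Pilot reported alt deviation during approach.",
    2: "Altitude was stable; snow on the ramp.",
    3: "Snow removal vehicles near the gate.",
}
```

Because both the query and the documents are canonicalized, `search(docs, all_of=["altitude"])` also finds the report that only says "alt".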

Hierarchical Dimension Support
- Multiple-choice support
- Each dimension can support several levels
- Powerful examples:
  - "B-737" vs. "B-747"
  - "Boeing" vs. "Airbus"
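A comparison at two levels of one dimension might look like the sketch below. The `MODEL_TO_MAKER` mapping, the "A-320" value, and the report records are made up for illustration; only the B-737 / B-747 and Boeing / Airbus examples come from the slide.

```python
from collections import Counter

# Hypothetical two-level hierarchy for the aircraft model dimension.
MODEL_TO_MAKER = {"B-737": "Boeing", "B-747": "Boeing", "A-320": "Airbus"}


def count_by_level(reports, level):
    """Count reports per dimension value at the 'model' or 'maker' level."""
    if level == "model":
        return Counter(r["model"] for r in reports)
    if level == "maker":
        # Roll each model up to its parent value in the hierarchy.
        return Counter(MODEL_TO_MAKER[r["model"]] for r in reports)
    raise ValueError(f"unknown level: {level}")


def compare(reports, level, values):
    """Multiple-choice selection: report counts for the chosen values only."""
    counts = count_by_level(reports, level)
    return {v: counts[v] for v in values}


reports = [
    {"model": "B-737"},
    {"model": "B-737"},
    {"model": "B-747"},
    {"model": "A-320"},
]
```

The same reports answer both questions: `compare(reports, "model", ["B-737", "B-747"])` stays at the fine level, while `compare(reports, "maker", ["Boeing", "Airbus"])` rolls up first.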

Document List Result
- Uses the default MySQL "natural language full-text search"
- Extracts the title from the most relevant part of the report
- Shows tags of the dimension values for the target dimensions
- Highlights the keywords
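A rough sketch of the retrieval and highlighting steps follows. MySQL's `MATCH ... AGAINST (... IN NATURAL LANGUAGE MODE)` syntax is real and needs a FULLTEXT index on the column; everything else (the `id` column, the table and column names passed in, the bracket markers) is a hypothetical placeholder. The `%s` placeholders follow the parameter style used by common Python MySQL drivers.

```python
import re


def build_fulltext_query(table, column, limit=10):
    """Return a parameterized MySQL natural-language full-text query string."""
    # Identifiers cannot be bound as parameters, so restrict them to simple names.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name}")
    return (
        f"SELECT id, {column}, "
        f"MATCH({column}) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        f"FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        f"ORDER BY score DESC LIMIT {int(limit)}"
    )


def highlight(text, keywords, left="[", right="]"):
    """Wrap each whole-word keyword occurrence in markers, case-insensitively."""
    if not keywords:
        return text
    pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.sub(pattern, lambda m: left + m.group(1) + right, text,
                  flags=re.IGNORECASE)


# Hypothetical table and column names.
query = build_fulltext_query("reports", "report_text", limit=5)
```

The query string would be executed with the user's keywords bound twice, once for the score and once for the filter.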

Similar Document
- Also based on contextual search
- Step 1: Extract meaningful terms from the original report.
- Step 2: Using these terms as input, conduct a contextual search.
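The two steps can be sketched as follows. The slides do not say how "meaningful terms" are chosen, so this sketch assumes the most frequent non-stopword tokens; the stopword list, the overlap score, and the sample reports are likewise illustrative.

```python
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "was", "were", "on", "in", "near"}


def tokenize(text):
    """Lower-cased tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t]


def key_terms(text, k=3):
    """Step 1 (assumed heuristic): the k most frequent non-stopword tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]


def similar(doc_id, docs, k=3):
    """Step 2: rank the other documents by how many key terms they contain."""
    terms = set(key_terms(docs[doc_id], k))
    scored = []
    for other_id, text in docs.items():
        if other_id == doc_id:
            continue
        overlap = len(terms & set(tokenize(text)))
        if overlap:
            scored.append((overlap, other_id))
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in ranked]


docs = {
    "a": "Snow removal vehicles on the ramp; snow on the runway; vehicles near the gate.",
    "b": "Snow and ice on the runway.",
    "c": "Engine failure after takeoff.",
    "d": "Ground vehicles blocked the gate; snow was heavy.",
}
```

In the real system the second step would go through the contextual search, so equivalent terms would also match; plain set overlap stands in for that here.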

Top Cells
- Search all the cells in the targeted dimensions and find the most relevant cells
- A multi-dimensional cell ranking
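One way to picture the ranking: treat every combination of values in the targeted dimensions as a cell and score it against the keyword. The relevance measure here (share of a cell's reports that mention the keyword) is an assumption for illustration; the slides do not specify the ranking function.

```python
from collections import defaultdict


def top_cells(reports, dimensions, keyword, k=2):
    """Rank cells (tuples of dimension values) by share of reports mentioning keyword."""
    total = defaultdict(int)
    matched = defaultdict(int)
    for r in reports:
        cell = tuple(r[d] for d in dimensions)
        total[cell] += 1
        if keyword.lower() in r["text"].lower():
            matched[cell] += 1
    scores = {cell: matched[cell] / total[cell] for cell in total}
    # Highest score first; ties broken by the cell's values for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up reports over two dimensions.
reports = [
    {"weather": "snow", "model": "B-737", "text": "snow on ramp"},
    {"weather": "snow", "model": "B-737", "text": "clear taxi"},
    {"weather": "rain", "model": "B-747", "text": "snow removal vehicles"},
    {"weather": "rain", "model": "B-737", "text": "normal landing"},
]
```

Calling `top_cells(reports, ["weather", "model"], "snow")` returns the two cells where the keyword is most concentrated.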

Single Dimension Distribution Based on Keywords

- Use an offline + online framework to calculate the distribution.
- If everything is computed offline:
  - The number of keyword combinations is exponential.
- If everything is computed online:
  - The whole corpus has to be retrieved for every query.
- Strategy:
  - Store the single-keyword distributions in the database. [Offline]
  - Combine the single-keyword distributions into a new distribution at query time. [Online]

Single Dimension Distribution Based on Keywords
- Offline process:
  - Step 1: Map equivalent terms into one.
  - Step 2: Build both a keyword reverse (inverted) index and a cell reverse index over the reports.
  - Step 3: Compare these two reverse indexes and calculate the single-term distribution.
- Online process (given a list of terms and dimensions):
  - Step 1: Map each term to its equivalent term.
  - Step 2: For each dimension, calculate the combined distribution under the independence assumption:
    $\mathrm{Val}(t_1, \dots, t_n) = 1 - \prod_{i=1}^{n} \bigl(1 - \mathrm{val}(t_i)\bigr)$
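The offline and online halves can be sketched as below. The combination rule is the one on the slide; the definition of the single-term value (share of a cell's reports that contain the term), the function names, and the sample reports are assumptions made for the example.

```python
from collections import defaultdict
from math import prod


def build_indexes(reports, dimension):
    """Offline step 2: keyword index and cell index, both mapping to report ids."""
    term_index = defaultdict(set)  # term -> ids of reports containing it
    cell_index = defaultdict(set)  # dimension value -> ids of reports in that cell
    for rid, r in enumerate(reports):
        cell_index[r[dimension]].add(rid)
        for tok in r["text"].lower().split():
            term_index[tok].add(rid)
    return term_index, cell_index


def single_term_distribution(term_index, cell_index):
    """Offline step 3: val(t, cell) = share of the cell's reports containing t (assumed)."""
    return {
        term: {cell: len(ids & rids) / len(rids) for cell, rids in cell_index.items()}
        for term, ids in term_index.items()
    }


def combined_distribution(dist, terms, cells):
    """Online: Val(t1..tn) = 1 - prod(1 - val(ti)) per cell, assuming independent terms."""
    return {
        cell: 1 - prod(1 - dist.get(t, {}).get(cell, 0.0) for t in terms)
        for cell in cells
    }


reports = [
    {"weather": "snow", "text": "snow ramp"},
    {"weather": "snow", "text": "ice ramp"},
    {"weather": "rain", "text": "wet runway"},
]
term_index, cell_index = build_indexes(reports, "weather")
dist = single_term_distribution(term_index, cell_index)
combined = combined_distribution(dist, ["snow", "ice"], cell_index.keys())
```

Only `dist` needs to be stored; any keyword list can then be combined at query time without touching the corpus. For the "snow" cell, each term has value 0.5, so the combined value is 1 - 0.5 × 0.5 = 0.75.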

Topic Distribution
- Based on Topic Cube
- Applies a topic model
- Supports comparison between different cells

Unigram/Multigram description
- Based on the paper by Qiaozhu Mei et al., "Automatic Labeling of Multinomial Topic Models"
- Find multi-gram candidates in the whole text
- Score each candidate based on its unigrams
- Adjust the score based on its length
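The three steps can be illustrated with a toy scorer. This is not the scoring from the cited paper: the candidate generation (all contiguous 2- and 3-grams), the length adjustment (dividing by `len ** alpha`), and the unigram scores are all assumptions chosen to keep the example small.

```python
def candidates(tokens, max_n=3):
    """All contiguous n-grams with 2 <= n <= max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(2, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def score(phrase, unigram_score, alpha=0.5):
    """Sum of unigram scores divided by len ** alpha (assumed length adjustment)."""
    total = sum(unigram_score.get(word, 0.0) for word in phrase)
    return total / (len(phrase) ** alpha)


def best_labels(tokens, unigram_score, k=1):
    """Return the k highest-scoring multi-gram candidates as strings."""
    ranked = sorted(candidates(tokens),
                    key=lambda p: (-score(p, unigram_score), p))
    return [" ".join(p) for p in ranked[:k]]


tokens = "snow removal vehicles near gate".split()
# Hypothetical per-word scores, e.g. word probabilities under a topic.
unigram_score = {"snow": 0.4, "removal": 0.3, "vehicles": 0.2}
```

With `alpha` below 1, a longer phrase still wins when its extra words carry enough weight, which is why "snow removal vehicles" outranks "snow removal" here; a larger `alpha` favours shorter labels.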

Thinking
- Data Cube:
  - Efficient summary
  - Highly structured data
- Rich Text:
  - Topic analysis, keyword search
  - Common: ASRS, IMDB, Publication-Net, News…
- Network (HIN):
  - Good at mining, contains structural information
  - No information loss

Motivation of EventCube
- Combine Data Cube with Rich Text.
- Combine Summary with Keyword Search.
- Build a general search/analysis system for rich-text cube data:
  1. Aviation Safety Reporting Data
     - Dimensions: Time, Weather, Location, Model… Text: Flight logs
  2. Publication Data
     - Dimensions: Author, Conf, Time, Field, Affiliation… Text: Abstract
  3. IMDB
     - Dimensions: Time, Country, Style, Director… Text: Description

Thanks