Published by Eugene Elliott. Modified over 9 years ago.
1
EventCube: Aviation Safety Data Analysis System
Fangbo Tao, Xiao Yu, Jiawei Han
08/10/13
2
The data we focus on: a huge collection of logs
Each document: "Following a normal approach and landing to runway 4 in roc; aircraft was taxied clear of the end of runway to the gate. several ground snow removal vehicles were operating to left of aircraft so we moved to the right side of ramp.…."
3
Power of Text-Rich Data Cubes
Hierarchical Data Cube
Text Analysis
4
Power of Text-Rich Data Cubes
Data Cube: Efficient Summarization
Rich Text: Powerful Text Mining
5
Power of Text-Rich Data Cube
6
Other features
Contextual Search
Hierarchical Dimension Selection: supports multiple choices
Similar Document Finding: based on Contextual Search
Keyword Frequency Distribution
Multi-gram Summarization
7
Contextual Search
Motivation:
Every word/concept may have an equivalent word/concept: “SVM” = “Support Vector Machine”, “Alt” = “Altitude”
There are connections between words: “Kernel Method” - “SVM”, “altitude” - “flight level”
8
Contextual Search
We develop a contextual search framework to build the word-net.
It contains 4 different relationships:
A “Use” B: equivalent terms; B is more common
A “RT” B: related terms, not hierarchical
A “BT” B: B is the broader word
A “NT” B: B is the narrower word
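The four relationships can be pictured as labelled edges between terms. The sketch below is only an illustration of that idea, not the EventCube implementation: the `WordNet` class, its method names, and the choice to store inverse links automatically are all assumptions, and the example terms are taken from the slides.

```python
from collections import defaultdict

# Relation codes as named on the slide.
USE, RT, BT, NT = "Use", "RT", "BT", "NT"


class WordNet:
    """Tiny in-memory thesaurus (hypothetical stand-in for the real word-net)."""

    def __init__(self):
        # relations[term][relation] -> set of target terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, a, relation, b):
        """Record 'a <relation> b' plus the implied inverse link, if any."""
        a, b = a.lower(), b.lower()
        self.relations[a][relation].add(b)
        if relation == RT:      # related terms are treated as symmetric
            self.relations[b][RT].add(a)
        elif relation == BT:    # b is broader than a, so a is narrower than b
            self.relations[b][NT].add(a)
        elif relation == NT:    # b is narrower than a, so a is broader than b
            self.relations[b][BT].add(a)

    def preferred(self, term):
        """Follow 'Use' links to the more common equivalent term."""
        term = term.lower()
        seen = set()            # guards against accidental cycles
        while term not in seen and self.relations[term][USE]:
            seen.add(term)
            term = sorted(self.relations[term][USE])[0]
        return term


wn = WordNet()
wn.add("Alt", USE, "Altitude")
wn.add("SVM", USE, "Support Vector Machine")
wn.add("altitude", RT, "flight level")
wn.add("B-737", BT, "Boeing")
print(wn.preferred("Alt"))
print(sorted(wn.relations["boeing"][NT]))
```

Storing the inverse link at insert time keeps lookups cheap in both directions, at the cost of a slightly larger structure.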
9
Contextual Search
Step 1: Generate the word-net when the dataset is uploaded.
Step 2: Return the related terms while the user is entering a query.
Step 3: Automatically include the equivalent terms when searching.
Step 4: Support the operators “AND”/“OR”/“NOT”.
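Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual query engine: the `EQUIVALENTS` and `RELATED` tables, the sample `DOCS`, and the `all_of`/`any_of`/`none_of` parameters (standing in for AND/OR/NOT) are all hypothetical, and matching is done with a simple whole-word regular expression.

```python
import re

# Hypothetical equivalence groups ("Use" relations from the word-net).
EQUIVALENTS = [
    {"alt", "altitude"},
    {"svm", "support vector machine"},
]

# Hypothetical related-term table ("RT" relations), used for suggestions.
RELATED = {
    "altitude": {"flight level"},
    "svm": {"kernel method"},
}

# Hypothetical sample reports.
DOCS = [
    "Aircraft deviated from assigned altitude during climb",
    "Snow removal vehicles were operating near the ramp",
    "Alt alert sounded on approach in snow",
]


def expand(term):
    """Step 3: return the term together with all of its equivalent terms."""
    term = term.lower()
    for group in EQUIVALENTS:
        if term in group:
            return set(group)
    return {term}


def suggest(term):
    """Step 2: return related terms to show while the user types."""
    related = set()
    for variant in expand(term):
        related |= RELATED.get(variant, set())
    return related


def matches(text, term):
    """True if the text contains the term or any equivalent as whole words."""
    text = text.lower()
    return any(
        re.search(r"\b" + re.escape(variant) + r"\b", text)
        for variant in expand(term)
    )


def search(docs, all_of=(), any_of=(), none_of=()):
    """Step 4: filter docs with AND (all_of), OR (any_of), NOT (none_of)."""
    hits = []
    for doc in docs:
        if not all(matches(doc, t) for t in all_of):
            continue
        if any_of and not any(matches(doc, t) for t in any_of):
            continue
        if any(matches(doc, t) for t in none_of):
            continue
        hits.append(doc)
    return hits


print(suggest("alt"))
print(search(DOCS, all_of=["alt"], none_of=["snow"]))
```

Because "alt" expands to "altitude", the query finds the first report even though the abbreviation never appears in it.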
10
Hierarchical Dimension Support
Multiple choice support
Each dimension can support several levels
Powerful examples: “B-737” vs. “B-747”; “Boeing” vs. “Airbus”
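The two comparisons above sit at different levels of the same dimension. A minimal sketch of that idea, assuming a two-level aircraft hierarchy (model, then manufacturer); the `MODEL_TO_MAKER` mapping, the sample `REPORTS`, and the "A-320" value are hypothetical illustration data:

```python
from collections import Counter

# Hypothetical two-level hierarchy: model (fine) -> manufacturer (coarse).
MODEL_TO_MAKER = {
    "B-737": "Boeing",
    "B-747": "Boeing",
    "A-320": "Airbus",
}

# Hypothetical reports, each tagged with a model-level dimension value.
REPORTS = [
    {"id": 1, "model": "B-737"},
    {"id": 2, "model": "B-747"},
    {"id": 3, "model": "B-737"},
    {"id": 4, "model": "A-320"},
]


def value_at(report, level):
    """Return the report's dimension value at the requested level."""
    model = report["model"]
    return model if level == "model" else MODEL_TO_MAKER[model]


def compare(reports, level, choices):
    """Count reports per selected value at one level (multiple choices)."""
    counts = Counter(value_at(r, level) for r in reports)
    return {choice: counts.get(choice, 0) for choice in choices}


print(compare(REPORTS, "model", ["B-737", "B-747"]))
print(compare(REPORTS, "manufacturer", ["Boeing", "Airbus"]))
```

The same reports answer both questions; only the level at which values are read changes.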
11
Document List Result
Uses the default MySQL “natural language full-text search”
Extracts the title from the most relevant part of the document
Shows tags of dimension values for the target dimensions
Highlights the keywords
12
Similar Document
Also uses contextual search
Step 1: Extract meaningful terms from the original report.
Step 2: Using these terms as input, conduct a contextual search.
13
Top Cells
Search all the cells in the targeted dimensions and find the most relevant cells
A multi-dimensional cell ranking
14
Single Dimension Distribution Based on Keywords
15
Using an offline + online framework to calculate the distribution.
If fully offline: the number of keyword combinations is exponential.
If fully online: the whole corpus must be retrieved every time.
Strategy:
[Offline] Store the single-keyword distributions in the database.
[Online] Combine the single-keyword distributions into a new distribution.
16
Single Dimension Distribution Based on Keywords
Offline process:
Step 1: Map equivalent terms into one.
Step 2: Build both a keyword reverse (inverted) index and a cell reverse (inverted) index over the reports.
Step 3: Compare these two indexes and calculate the single-term distribution.
Online process [with a list of terms and dimensions]:
Step 1: Map each term to its equivalent term.
Step 2: For each dimension, calculate the combined distribution under the independence assumption: $\mathrm{Val}(t_1, \dots, t_n) = 1 - \prod_{i=1}^{n} \bigl(1 - \mathrm{Val}(t_i)\bigr)$
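The offline and online halves can be sketched together as below. The slides do not define the single-term value, so this sketch assumes it is the share of a cell's reports that contain the term; the `CANONICAL` map, the sample `REPORTS`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical equivalent-term map (offline Step 1).
CANONICAL = {"alt": "altitude"}

# Hypothetical reports: (id, value of one dimension, text).
REPORTS = [
    (1, "B-737", "altitude deviation during climb"),
    (2, "B-737", "snow on the runway"),
    (3, "B-747", "alt alert and snow on approach"),
    (4, "B-747", "snow at the gate"),
]


def canonical(term):
    """Map a term to its equivalent (preferred) form."""
    return CANONICAL.get(term.lower(), term.lower())


def build_offline(reports):
    """Offline: build both indexes, then the single-term distributions."""
    term_index = defaultdict(set)   # term -> report ids
    cell_index = defaultdict(set)   # dimension value (cell) -> report ids
    for rid, cell, text in reports:
        cell_index[cell].add(rid)
        for word in text.split():
            term_index[canonical(word)].add(rid)
    # ASSUMPTION: val(term, cell) = share of the cell's reports with the term.
    return {
        term: {cell: len(rids & cell_rids) / len(cell_rids)
               for cell, cell_rids in cell_index.items()}
        for term, rids in term_index.items()
    }


def combine_online(single, terms):
    """Online: Val(t1..tn) = 1 - prod(1 - val(ti)) for each cell."""
    cells = {cell for dist in single.values() for cell in dist}
    combined = {}
    for cell in sorted(cells):
        miss = 1.0                  # probability that no term occurs
        for term in terms:
            miss *= 1.0 - single.get(canonical(term), {}).get(cell, 0.0)
        combined[cell] = 1.0 - miss
    return combined


single = build_offline(REPORTS)
print(combine_online(single, ["alt", "snow"]))
```

In this toy data every B-737 report contains one of the two terms, yet the combined value for B-737 is 0.75, which shows that the independence assumption trades exactness for not having to rescan the corpus.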
17
Topic Distribution Based on Topic Cube
Applies a topic model
Supports comparison between different cells
18
Unigram/Multigram description
Based on Qiaozhu Mei’s paper, “Automatic Labeling of Multinomial Topic Models”
Find multi-gram candidates from the whole text
Score each candidate based on its unigrams
Adjust the score based on its length
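The three steps above might look roughly like the following. This is a simplified sketch and not the scoring from the cited paper: the candidate filter (`min_count`), the use of a plain sum of unigram scores, and the length adjustment `len ** alpha` are all assumptions chosen for illustration, as are the sample documents and scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield all contiguous n-token tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def label_candidates(docs, unigram_score, max_n=3, min_count=2, alpha=0.5):
    """Rank multi-gram candidates by length-adjusted unigram scores."""
    # Step 1: collect multi-gram candidates from the whole text.
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(2, max_n + 1):
            counts.update(ngrams(tokens, n))
    scored = []
    for gram, count in counts.items():
        if count < min_count:       # ASSUMPTION: drop rare candidates
            continue
        # Step 2: score the candidate from its unigram scores.
        total = sum(unigram_score.get(word, 0.0) for word in gram)
        # Step 3: adjust by length (ASSUMPTION: divide by len ** alpha).
        scored.append((total / len(gram) ** alpha, " ".join(gram)))
    return sorted(scored, reverse=True)


# Hypothetical inputs: a few reports and per-word topic scores.
docs = [
    "runway incursion at night",
    "runway incursion reported",
    "night landing",
]
unigram_score = {"runway": 0.4, "incursion": 0.3, "night": 0.1}
labels = label_candidates(docs, unigram_score)
print(labels)
```

With `alpha` below 1 longer phrases are penalised less than a plain average would penalise them; `alpha = 1` reduces to the mean unigram score.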
19
Thinking
Data Cube: efficient summary; highly structured data.
Rich Text: topic analysis, keyword search.
Common: ASRS, IMDB, Publication-Net, News…
Network (HIN): good at mining; contains structural information; no information loss.
20
Motivation of EventCube
Combine Data Cube with Rich Text.
Combine Summary with Keyword Search.
Build a general search/analysis system for rich text cube data.
1. Aviation Safety Reporting Data. Dimensions: Time, Weather, Location, Model… Text: flight logs
2. Publication Data. Dimensions: Author, Conf, Time, Field, Affiliation… Text: abstract
3. IMDB. Dimensions: Time, Country, Style, Director… Text: description
21
Thanks