Published by Leonard Perkins. Modified over 9 years ago.
1
Understanding Text Corpora with Multiple Facets
Lei Shi, Furu Wei, Shixia Liu, Xiaoxiao Lian, Li Tan and Michelle X. Zhou
IBM Research
2
Emergency Room Records
3
Hotel Reviews
4
Intelligence Reports
5
Email Documents
6
Financial News/Blogs/Message Boards
7
Outline
Problem & Related Work
Multi-Facet Text Data Model and Text Processing
–Data model
–Text pre-processing
–Content summarization
Visualization
–Metaphor
–Creation algorithm
–Interactions
Video Demo
8
Problem & Related Work
Building a visual analytics tool that explains multi-faceted text corpora is challenging:
–How to combine the raw text data with rich text analytics results for visualization?
–Which visual metaphors effectively illustrate text content, evolution and facet correlations?
–How to customize interactions to assist users in data navigation and other visual analytics tasks?
Related work
–Text trend visualization
  ThemeRiver, NameVoyager, etc.
–Text content visualization
  Tag cloud, Wordle, PhraseNet, etc.
–Text entity pattern visualization
  TileBars, Jigsaw, FeatureLens, Takmi, etc.
–Text visualization in specific domains
  Themail@email, TileBars@search,
9
Multi-Facet Data Model and Text Pre-Processing
Multi-Facet Data Model for Text Corpora
–Time Facet
  Explicit field or extracted from raw text
–Category Facet
  Topic modeling by Latent Dirichlet Allocation (LDA, Blei et al. 2003)
  Category labels from document classification/clustering
  Other nominal structured information (hotel names, countries, etc.)
–Unstructured (Content) Facets
  Inherent multiple text fields
  Multiple facets from NE extraction (people, location, organization) or POS parsing (nouns, verbs, adjectives)
–Structured Facets
  Categorical, numerical or nominal data fields
  Other calculated categorical values (sentiment orientations, average ratings)
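The four facet types above can be pictured as fields of a single document record. The following is a minimal sketch, not the authors' schema: the class name `FacetedDocument`, its field names and the sample hotel review are all hypothetical and invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class FacetedDocument:
    """One document described by the four facet types (hypothetical schema)."""
    time: date                      # time facet: explicit field or extracted from text
    category: str                   # category facet: topic label, class label, hotel name, ...
    # unstructured facets: facet name -> keywords (e.g. from NE extraction or POS parsing)
    content: dict[str, list[str]] = field(default_factory=dict)
    # structured facets: field name -> categorical/numerical value
    structured: dict[str, object] = field(default_factory=dict)


# Made-up example record, shaped like a hotel review.
review = FacetedDocument(
    time=date(2009, 7, 1),
    category="Hotel A",
    content={"noun": ["room", "breakfast"], "adjective": ["clean", "noisy"]},
    structured={"rating": 4, "sentiment": "positive"},
)
```

Keeping the unstructured facets as a name-to-keywords mapping lets one record carry any number of content facets without changing the schema.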
10
Content Facet Summarization
–A set of topics $\{T_1, \dots, T_i, \dots, T_N\}$
–A set of keywords $\{W_1, \dots, W_j, \dots, W_M\}$
–A set of topic probabilities $\{\dots, P(T_i \mid D_k), \dots\}$, where $D_k$ is the $k$th document in the collection
–A set of word probabilities $\{\dots, P(W_j \mid T_i), \dots\}$
Rank the topics to present the most valuable ones first
Select a keyword subset for each time segment as its content summary: $\{\dots\}_{t-1}, \{\dots, W_j, \dots\}_t, \{\dots\}_{t+1}, \dots$
11
Content Facet Summarization
Topic/category re-ranking by topic coverage and variance: find the most active topics with significant variety
–Topic coverage:
–Topic variance:
–Balancing two metrics:
Keyword re-ranking
–Topic keyword re-ranking:
–Time-sensitive keyword re-ranking: preserve completeness and distinctiveness
  Completeness: cover the original keywords of a topic
  Distinctiveness: distinguish one time segment from another
(Formula annotations: doc-topic distribution, doc length, doc number; topic-keyword distribution, topic number)
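The formulas for topic coverage, topic variance and their balance are not given in this text. The sketch below therefore rests on assumptions: coverage is taken as the document-length-weighted mean of P(T_i | D_k), variance as the length-weighted standard deviation around that mean, and the two are balanced by a product of powers. The function name `rank_topics` and the exponents `lam_cov` and `lam_var` are hypothetical.

```python
from math import sqrt


def rank_topics(doc_topic, doc_len, lam_cov=1.0, lam_var=1.0):
    """Rank topics by an assumed coverage/variance score.

    doc_topic: list of rows, one per document; row[i] is P(T_i | D_k).
    doc_len:   list of document lengths, used as weights.
    Returns (topic indices from best to worst, score per topic).
    """
    total = float(sum(doc_len))
    weights = [n / total for n in doc_len]
    n_topics = len(doc_topic[0])
    scores = []
    for i in range(n_topics):
        column = [row[i] for row in doc_topic]
        # Assumed coverage: length-weighted mean topic probability.
        coverage = sum(w * p for w, p in zip(weights, column))
        # Assumed variance term: length-weighted standard deviation.
        spread = sqrt(sum(w * (p - coverage) ** 2 for w, p in zip(weights, column)))
        # Assumed balance: product of powers, so a flat topic scores near zero.
        scores.append(coverage ** lam_cov * spread ** lam_var)
    order = sorted(range(n_topics), key=lambda i: scores[i], reverse=True)
    return order, scores


# Made-up doc-topic distribution: three documents, three topics.
order, scores = rank_topics(
    doc_topic=[[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]],
    doc_len=[100, 100, 100],
)
```

In this toy input the third topic has the same probability in every document, so its spread is essentially zero and it ranks last, which matches the stated goal of preferring active topics with significant variety.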
12
System Architecture
(Diagram labels: Text collection; Text Preprocessing; Text content + meta data; Text Summarization; Summarization results; Visualization; User Interaction)
13
Visualization Metaphors
Multi-stack trend visualization + time-sensitive tag clouds
–Vis-data mappings:
  time facet: x (time) axis
  category facet: stack
  unstructured facets: tag clouds
  structured facet: keyword style (color/font)
–Other mappings:
  document count: y axis
  re-ranked occurrence count: keyword size
(Figure labels: Category Facet, Time, Unstructured Facets, Structured Facets)
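The mapping of category facet to stack and document count to y axis can be illustrated with a small stacking routine. This is a sketch under a simplifying assumption: layers are stacked on a zero baseline, which may differ from the baseline the actual system uses. The function name `stack_layers` and the sample counts are hypothetical.

```python
def stack_layers(counts):
    """Turn per-category document counts into stacked layer boundaries.

    counts: {category: [document count per time bin]}, all lists the same length.
    Returns {category: [(y_bottom, y_top) per time bin]}, stacked in dict order.
    """
    n_bins = len(next(iter(counts.values())))
    baseline = [0] * n_bins  # assumed zero baseline
    layers = {}
    for category, series in counts.items():
        top = [b + c for b, c in zip(baseline, series)]
        layers[category] = list(zip(baseline, top))
        baseline = top  # the next category sits on top of this one
    return layers


# Made-up counts for two categories over two time bins.
layers = stack_layers({"A": [1, 2], "B": [3, 1]})
```

Each layer's height at a time bin equals that category's document count, and the region between `y_bottom` and `y_top` is the space available for that category's tag cloud.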
14
Keywords Layout
Keyword layout with the sweep-line greedy algorithm
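The text names the algorithm but gives no details, so the following is only a simplified one-row illustration of the sweep-line greedy idea, not the authors' method. It assumes a trend band discretized into columns with a known available height, and keywords given as boxes with positive integer width and height. The name `layout_keywords` and the sample data are hypothetical.

```python
def layout_keywords(band_height, words):
    """Greedy left-to-right placement of keyword boxes inside one trend band.

    band_height: available height of the band at each x column.
    words: list of (text, width, height) with positive integer width.
    Returns a list of (text, x) for the keywords that could be placed.
    """
    # Try large keywords first at every sweep position.
    remaining = sorted(words, key=lambda w: w[1] * w[2], reverse=True)
    placed = []
    x = 0  # the sweep line
    while x < len(band_height) and remaining:
        for word in remaining:
            text, width, height = word
            span = band_height[x:x + width]
            # The box fits if it stays inside the band over its full width.
            if len(span) == width and min(span) >= height:
                placed.append((text, x))
                remaining.remove(word)
                x += width  # move the sweep line past the placed box
                break
        else:
            x += 1  # nothing fits here; advance the sweep line by one column
    return placed


# Made-up band profile and keyword boxes.
placed = layout_keywords(
    band_height=[1, 1, 3, 3, 3, 3, 2, 2],
    words=[("fever", 4, 3), ("pain", 2, 2), ("cough", 2, 1)],
)
```

In this toy band the largest keyword lands where the band is tallest, and smaller keywords fill the thinner ends, which is the behaviour a time-sensitive tag cloud inside a trend needs.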
15
Interactions
Temporal zooming for time facet navigation
Topic editing for category facet navigation
Unstructured facet navigation panel
Structured facet mapping
Other customized interactions: topic focus-in-context view
16
Focus-In-Context View Calculation
Constraints for detailed trend view
–Contour-preserving
–Flexible space control
–All topic trends as undistorted as possible
1D fisheye distortion
–Height calculation for expanded trend
–Order-preserving height adjustment
–Apply fisheye distortion from the center line of selected topic
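The 1D fisheye step can be sketched with the classic Sarkar-Brown distortion function g(x) = (d + 1)x / (dx + 1), applied on each side of a focus line. Whether the system uses this exact function is an assumption; the function name `fisheye_1d` and the distortion factor `d` are hypothetical.

```python
def fisheye_1d(y, focus, lo, hi, d=3.0):
    """Distort position y in [lo, hi] away from the focus line.

    Uses g(x) = (d + 1) * x / (d * x + 1) on the normalized distance from
    the focus. g is monotonic with g(0) = 0 and g(1) = 1, so the order of
    positions and the outer boundaries lo and hi are preserved.
    """
    if y == focus:
        return focus
    # Distance from the focus to the boundary on the side where y lies.
    reach = (hi - focus) if y > focus else (focus - lo)
    x = abs(y - focus) / reach            # normalized distance in (0, 1]
    g = (d + 1) * x / (d * x + 1)         # magnifies the area near the focus
    return focus + g * reach if y > focus else focus - g * reach


# Boundaries of stacked layers before and after distortion around y = 0.5.
before = [0.0, 0.25, 0.5, 0.75, 1.0]
after = [fisheye_1d(y, focus=0.5, lo=0.0, hi=1.0) for y in before]
```

Because the mapping is monotonic, layers above and below the selected topic keep their order while being compressed toward the edges, which is consistent with the order-preserving constraint listed above.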
17
Video Demo
Visual Analytics for Emergency Room Record
18
Thank You
Merci, Grazie, Gracias, Obrigado, Danke
(Slide labels: Japanese, English, French, Russian, German, Italian, Spanish, Brazilian Portuguese, Arabic, Traditional Chinese, Simplified Chinese, Hindi, Tamil, Thai, Korean)