Published by William Simon. Modified over 9 years ago.
1
Why and how is this a “related document”?: Semantics-based analysis of and navigation through heterogeneous text corpora
Bettina Berendt & Daniel Trümper (KU Leuven / HU Berlin)
Blaž Fortuna, Marko Grobelnik & Dunja Mladenić (JSI Ljubljana)
www.cs.kuleuven.be/~berendt
2
ICT motivation: Global + local interaction; beyond “similar documents”. Similar with respect to what?
3
Application motivation: Beyond dedicated search engines
1. News and blogs (Lloyd et al., Proc. CAAW 2006; Berendt et al., Kommunikation, Partizipation und Wirkungen im Social Web, 2008; Berendt, Fortuna et al., in prep.)
2. Multilingual sources: good results in semi-automatic ontology learning based on simple machine translation
4
PASCAL motivation: Re-use Textgarden's bread-and-butter and advanced tools
- Text to bag-of-words (http://www.textmining.net)
- Ontogen (http://ontogen.ijs.si/)
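To make the "text to bag-of-words" step concrete, here is a minimal sketch in plain Python. It is not the Textgarden implementation: the function name `text_to_bow`, the tokenization rule, and the tiny stop-word list are assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not Textgarden's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def text_to_bow(text: str) -> Counter:
    """Hypothetical helper: map a text to a bag-of-words (term -> count)."""
    # Lower-case the text and keep runs of Unicode letters as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Drop stop words and count the remaining terms.
    return Counter(t for t in tokens if t not in STOP_WORDS)


bow = text_to_bow("The blog cites the news, and the news cites the blog.")
```

Word order is discarded; only term counts remain, which is what makes documents from different sources comparable as vectors.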
5
Solution vision: PORPOISE – Sailing the Internet
Global analysis, search, local analysis
6
Solution approach: Architecture & states overview
[Diagram. Phases: global analysis, search, local analysis. Legend: data / tool; external; Textgarden tool; user action; * = created in this project]
Elements: Web; Specify sources & filters *; Retrieval & preprocessing *; Source documents database *; Ontology learning (Ontogen); Import ontology *; Build ontology; Search; Aspect-based similarity search *; Select document *; Construct composite-similarity neighbourhood *; Select neighbourhood *; Refocus *
7
Retrieval and preprocessing (from the Web to the source documents database; * = created in this project)
- Crawler / wrapper * (uses Blogdigger)
- Translator * (uses Babelfish)
- Preprocessing (Txt2Bow)
- NER (GATE)
- Similarity computation *
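The slides do not say which measure the similarity computation uses. As a sketch under that caveat, cosine similarity over bag-of-words vectors is a common choice; the function name and the two example documents below are hypothetical.

```python
import math
from typing import Mapping


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up term counts for two documents.
doc_a = {"blog": 2, "news": 1}
doc_b = {"news": 1, "election": 2}
score = cosine_similarity(doc_a, doc_b)
```

Because the measure is normalized by vector length, a long news article and a short blog post can still score as similar if they use the same terms in similar proportions.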
8
Ontology learning (1)
9
Ontology learning (2)
10
Ontology learning (3)
11
Inspection of ontology and instances
12
Inspection of documents
13
More on documents
14
The neighbourhood of a document
15
Constructing the similarity measure & neighbourhood (I)
16
Constructing the similarity measure & neighbourhood (II)
17
Constructing the similarity measure & neighbourhood (III)
[Figure annotations: a news source; a German-language blog; “Most neighbours are blogs”; “Most neighbours are English-language blogs”. Legend: English blog, German blog, English news]
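The transcript keeps only the titles of these three slides, so the exact construction is not recoverable. One plausible reading of a "composite" similarity, sketched here purely as an assumption, is a weighted average of per-aspect similarity scores, with the neighbourhood taken as the k highest-scoring documents. The aspect names (`text`, `entities`), the weights, and all scores below are invented for illustration.

```python
from typing import Dict, List, Tuple


def composite_similarity(aspect_scores: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Weighted average of per-aspect similarities (assumed combination rule)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    # Aspects missing for a document count as similarity 0.
    weighted = sum(w * aspect_scores.get(aspect, 0.0)
                   for aspect, w in weights.items())
    return weighted / total


def neighbourhood(candidates: Dict[str, Dict[str, float]],
                  weights: Dict[str, float],
                  k: int) -> List[Tuple[str, float]]:
    """Return the k candidates most similar to the selected document."""
    scored = [(doc_id, composite_similarity(scores, weights))
              for doc_id, scores in candidates.items()]
    # Highest score first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Made-up per-aspect similarities to one selected document.
candidates = {
    "english_blog": {"text": 0.8, "entities": 0.4},
    "german_blog": {"text": 0.3, "entities": 0.9},
    "english_news": {"text": 0.6, "entities": 0.6},
}
top = neighbourhood(candidates, weights={"text": 1.0, "entities": 3.0}, k=2)
```

Raising the weight of one aspect changes which documents enter the neighbourhood: with the weights above, the German-language blog ranks first because it shares named entities with the selected document despite low textual overlap.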
18
Comparing documents
19
Comparing documents; utilizing multilingual sources
20
Refocusing
21
Structuring a neighbourhood
22
Example: Finding a “story”
Evaluation? User studies!
23
“Pump-priming”: PORPOISE as catalyst
- Using PASCAL software for analyzing social-media documents
- Using PASCAL software for analyzing multilingual social-media documents
- Analyzing blogs and news
- PORPOISE
- PORPOISE+: More fine-grained sailing
- STORYGROWTH: Tracking concept and community evolution
- Supporting constructive search
- DM4E: “More constructive search”
24
Finally... could I express it better?
Mood: presentation