Seeing Things in the Clouds: browsing semi-structured data over concept lattices with tag clouds
Bernd Fischer
Stellenbosch Computer Science
[Title-slide tag cloud: object, attribute, context, table, relation, Galois connection, knowledge discovery, mining software repositories, join, focus, visualization, information retrieval, meet, navigation]

How do you find stuff on the Internet?
[Diagram labels across the slide builds: query; "Yikes! results!"; concept-based browsing; lattice]

How do you find stuff you didn’t look for?
Retrieval: extract objects that satisfy a pre-defined criterion
  – query describes criterion
  – main operation is matching: check satisfaction against query
  – main goal is precision: show only relevant objects
Browsing: spontaneously explore a collection
  – focus describes current position and selection
  – main operation is navigation: change the focus
  – main goal is recall: show all relevant objects
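The two goals named on this slide can be made concrete with a small calculation. The sketch below uses the standard definitions of precision and recall; the object identifiers and the two result sets are made up for illustration and do not come from the talk.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for a set of shown objects.

    precision: share of shown objects that are relevant
    recall:    share of relevant objects that are shown
    """
    hits = len(shown & relevant)
    precision = hits / len(shown) if shown else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical data: which objects the user would actually care about.
relevant = {"08", "42"}

# A narrow, retrieval-style answer: only relevant objects, but not all of them.
narrow = {"08"}
# A broad, browsing-style view: all relevant objects, plus something extra.
broad = {"08", "15", "42"}

print(precision_recall(narrow, relevant))  # precision 1.0, recall 0.5
print(precision_recall(broad, relevant))   # precision 2/3, recall 1.0
```

The narrow answer maximizes precision at the cost of recall; the broad view does the opposite, which is the trade-off the slide attributes to browsing.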

How do you browse?
[Figure labels: (hierarchical) navigation structure, focus, selection]

How do you browse semi-structured data?
What is semi-structured data? What is structured data?
Structured data
  – has a very high degree of regularity
  – has an explicit, tight format (schema)
Typical examples:
  – spreadsheets
  – relational databases (SQL: structured query language)

How do you browse semi-structured data?
What is semi-structured data?
Semi-structured data
  – contains both free-text and formatted fields
  – has large structural variance
  – is implicitly formatted
Typical examples:
  – product reviews
  – newspaper articles + meta-data
  – revision control logs

How do you browse semi-structured data?
Approach:
  – find a suitable abstract data representation
      – bag-of-words, graphs, binary relations, RDF triples, XML, ...
  – find a suitable hierarchy
      – metric spaces, graphs, concept lattices, ...
  – find a suitable visual representation
      – lists, graphs, tag clouds, city scapes, ...
  – find a navigation algorithm

How do you represent data?
Structured data is represented by n-ary relations or tables:
  – each object becomes a row
  – each column represents an attribute type
  – text remains unstructured
  – set-valued attributes require normalization

Without normalization, the two-author paper occupies two rows (year values were lost in extraction):

author   | title                        | year | venue
Fischer  | Specification-based browsing |      | J. ASE
van Zijl | Supernondeterministic finite |      | CIAA
Greene   | ConceptCloud: A Tag-cloud    |      | FSE
Fischer  | ConceptCloud: A Tag-cloud    |      | FSE

Normalized into two tables:

id | title                        | year | venue
08 | Specification-based browsing |      | J. ASE
15 | Supernondeterministic finite |      | CIAA
42 | ConceptCloud: A Tag-cloud    |      | FSE

id | author
08 | Fischer
15 | van Zijl
42 | Greene
42 | Fischer

Semi-structured data can be represented by binary relations:
  – text is split into words
  – each occurring value and word becomes an attribute
  – build context table: add cross if attribute applies to object
      – word appears in document, meta-data, references...

Context table (each row has one further cross in a column whose header was lost in extraction; later slides use the year attributes ’00 for 08 and ’14 for 42):

id | Fischer | Greene | van Zijl | browsing | tag
08 |    ×    |        |          |    ×     |
15 |         |        |    ×     |          |
42 |    ×    |   ×    |          |    ×     |  ×
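The context-table construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual extraction code: the record layout (`RECORDS`), the helper names, and the word-splitting rule are hypothetical, titles are used as truncated in the transcript, and years are omitted because they did not survive. There is no stemming, so the resulting crosses will not match the slide's table exactly.

```python
import re

# Hypothetical records: formatted fields (authors, venue) plus free text (title).
RECORDS = {
    "08": {"title": "Specification-based browsing", "venue": "J. ASE",
           "authors": ["Fischer"]},
    "15": {"title": "Supernondeterministic finite", "venue": "CIAA",
           "authors": ["van Zijl"]},
    "42": {"title": "ConceptCloud: A Tag-cloud", "venue": "FSE",
           "authors": ["Greene", "Fischer"]},
}


def attributes_of(record):
    """Turn one record into a set of attributes.

    Formatted fields contribute their values as they are; free text is
    split into lower-cased words, each of which becomes an attribute.
    """
    attrs = set(record["authors"])      # set-valued field: no normalization needed
    attrs.add(record["venue"])
    attrs.update(word.lower() for word in re.findall(r"[A-Za-z]+", record["title"]))
    return attrs


# The binary relation: object id -> set of attributes that apply to it.
context = {oid: attributes_of(rec) for oid, rec in RECORDS.items()}


def cross_table(context):
    """Render the relation as a plain-text cross table."""
    columns = sorted(set().union(*context.values()))
    lines = ["id  " + "  ".join(columns)]
    for oid in sorted(context):
        cells = [("×" if col in context[oid] else " ").center(len(col))
                 for col in columns]
        lines.append(oid + "  " + "  ".join(cells))
    return "\n".join(lines)


print(cross_table(context))
```

Note that the two authors of object 42 simply become two attributes of the same object, so the binary relation needs no separate author table.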

How do you find hierarchy in relations?
Formal concept analysis:
  – formal context: (O, A, ~)
  – common attributes: α(O) = { a ∈ A | ∀ o ∈ O : o ~ a }
  – common objects: ω(A) = { o ∈ O | ∀ a ∈ A : o ~ a }
  – concept: (O, A) s.t. α(O) = A ∧ ω(A) = O, with extent O and intent A
  – sub-concept ordering: (O₁, A₁) ≤ (O₂, A₂) iff O₁ ⊆ O₂ iff A₁ ⊇ A₂
  – concept lattice: concepts of a context form a complete lattice

Context table (each row has one further cross in a column whose header was lost in extraction; the examples below use the year attributes ’00 for 08 and ’14 for 42):

id | Fischer | Greene | van Zijl | browsing | tag
08 |    ×    |        |          |    ×     |
15 |         |        |    ×     |          |
42 |    ×    |   ×    |          |    ×     |  ×

Example:
  α({08, 42}) = {Fischer, browsing}
  ω({Fischer, browsing}) = {08, 42}
so ({08, 42}, {Fischer, browsing}) is a concept.

Candidate (extent, intent) pairs, with F = Fischer and G = Greene:
  ({08}, {F, browsing, ’00})          concept
  ({42}, {F, G, browsing, tag, ’14})  concept
  ({08, 42}, {F, browsing})           concept
  ({42}, {tag})                       not a concept: α({42}) = {F, G, browsing, tag, ’14} ≠ {tag}
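The definitions of α, ω, and concepts translate almost literally into code. The sketch below is a naive illustration, not an efficient FCA algorithm: it enumerates concepts by closing every subset of objects, which is exponential in the number of objects. The context uses the attributes visible in the slides; the second attribute of object 15 was lost in extraction, so `"finite"` (a word from its title) is an assumption.

```python
from itertools import combinations

# Example context: object id -> attributes ("finite" for 15 is assumed).
CONTEXT = {
    "08": {"Fischer", "browsing", "'00"},
    "15": {"van Zijl", "finite"},
    "42": {"Fischer", "Greene", "browsing", "tag", "'14"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = set().union(*CONTEXT.values())


def alpha(objs):
    """Common attributes: all attributes shared by every object in objs."""
    return {a for a in ATTRIBUTES if all(a in CONTEXT[o] for o in objs)}


def omega(attrs):
    """Common objects: all objects that carry every attribute in attrs."""
    return {o for o in OBJECTS if attrs <= CONTEXT[o]}


def is_concept(extent, intent):
    """A pair is a concept iff each component determines the other."""
    return alpha(extent) == intent and omega(intent) == extent


def all_concepts():
    """Naive enumeration: close every subset of objects (exponential)."""
    found = set()
    for k in range(len(OBJECTS) + 1):
        for objs in combinations(sorted(OBJECTS), k):
            intent = alpha(set(objs))
            extent = omega(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found


def is_subconcept(c1, c2):
    """Sub-concept ordering: smaller extent (equivalently, larger intent)."""
    return c1[0] <= c2[0]


print(alpha({"08", "42"}))                              # Fischer and browsing
print(omega({"Fischer", "browsing"}))                   # 08 and 42
print(is_concept({"08", "42"}, {"Fischer", "browsing"}))
print(is_concept({"42"}, {"tag"}))
for extent, intent in sorted(all_concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

For this three-object context the enumeration yields six concepts, including the bottom concept with empty extent and the top concept with empty intent.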

Are we there yet? Nope. Concept lattices induce enough structure for navigation but too much to show directly!
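To see why a lattice can be too large to show directly, consider a standard worst case from formal concept analysis: a context with n objects and n attributes in which object i has every attribute except attribute i. Such a context has 2^n concepts. The sketch below counts them by brute force; the function names are illustrative and the example is not taken from the talk.

```python
from itertools import combinations


def count_concepts(context, attributes):
    """Count concepts by collecting the distinct intents of all object subsets."""
    objects = sorted(context)
    intents = set()
    for k in range(len(objects) + 1):
        for objs in combinations(objects, k):
            intents.add(frozenset(
                a for a in attributes if all(a in context[o] for o in objs)
            ))
    return len(intents)


def worst_case(n):
    """Object i has every attribute except attribute i."""
    attributes = set(range(n))
    context = {i: attributes - {i} for i in range(n)}
    return context, attributes


for n in range(1, 7):
    context, attributes = worst_case(n)
    print(n, count_concepts(context, attributes))  # doubles with each step
```

Real data sets are usually far from this worst case, but the example shows that the number of concepts is not bounded by the size of the context table.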

How do you visualize concept lattices?
Approach:
  – don’t show the lattice
  – use concepts as focus
  – visualize only focus concept
      – but in relation to lattice
  – use extent to derive tag cloud

How do you build tag clouds for concepts?
What is a tag cloud?
  – visual representation of text data
      – summarize large data set
      – emphasize important tags
  – single words or short phrases
  – importance reflected as size
      – frequency in document
      – number of tagged items
      – number of page hits
  – different layout methods
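"Importance reflected as size" can be illustrated by mapping tag weights to font sizes. The sketch below assumes simple linear scaling between a minimum and a maximum point size; the function name, the size range, and the sample weights are illustrative choices, and real tag clouds often use other scalings (for example logarithmic).

```python
def font_sizes(weights, min_pt=10, max_pt=36):
    """Map each tag's weight to a font size by linear interpolation."""
    lo, hi = min(weights.values()), max(weights.values())
    sizes = {}
    for tag, weight in weights.items():
        # If all weights are equal, every tag gets the maximum size.
        share = 1.0 if hi == lo else (weight - lo) / (hi - lo)
        sizes[tag] = round(min_pt + share * (max_pt - min_pt))
    return sizes


# Hypothetical weights, e.g. the number of items carrying each tag.
weights = {"browsing": 2, "Fischer": 2, "Greene": 1, "tag": 1}
print(font_sizes(weights))
```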

How do you build tag clouds for concepts?
  – intent looks like tag cloud but is common to all objects ⇒ all tags same size
  – instead: collect all attributes from all objects in extent
      – can be expressed in concept lattice [expression not preserved in this transcript]
      – also add extent via object identifiers
  – intent shown as largest tags
      – smaller tags are related information
Example: for the focus concept ({08, 42}, {Fischer, browsing}), the cloud shows browsing and Fischer (the intent) as the largest tags, and Greene and tag as smaller ones.
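The construction on this slide amounts to counting, for every attribute, how many objects in the extent carry it. The sketch below is a minimal illustration under assumptions: the context repeats the example from the slides (the attribute `"finite"` for object 15 is assumed), the function names are hypothetical, and object identifiers are left out of the cloud for brevity.

```python
from collections import Counter

# Example context: object id -> attributes ("finite" for 15 is assumed).
CONTEXT = {
    "08": {"Fischer", "browsing", "'00"},
    "15": {"van Zijl", "finite"},
    "42": {"Fischer", "Greene", "browsing", "tag", "'14"},
}


def tag_cloud(extent):
    """Weight of a tag = number of objects in the extent that carry it."""
    return Counter(attr for obj in extent for attr in CONTEXT[obj])


def intent_tags(cloud, extent):
    """Tags carried by every object in the extent: these get the largest size."""
    return {tag for tag, count in cloud.items() if count == len(extent)}


extent = {"08", "42"}
cloud = tag_cloud(extent)
for tag, count in cloud.most_common():
    marker = "intent" if tag in intent_tags(cloud, extent) else "related"
    print(f"{tag:10} weight {count}  ({marker})")
```

Tags with the maximum weight are exactly the intent of the focus concept; tags with smaller weights are the related information that only some objects in the extent share.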

The ConceptCloud Browser
by: Gillian Greene, US
[Screenshot callouts: file, message, date, author, controls]

The ConceptCloud Browser
[Screenshot callout: most prolific contributor]

How do you navigate with tag clouds?
Navigation modes:
  – refinement: narrow the selection
      – select a new tag
  – widening: extend the selection
      – remove a selected tag

How do you navigate with concept lattices?
Navigation modes:
  – refinement: narrow the selection
      – select a new tag: f’ = f ∧ δ(t)
  – widening: extend the selection
      – remove a selected tag: f’ = ∧ { δ(i) | i ∈ π(f) \ {t} }
      – join-based widening, f’ = f ∨ δ(t), can be useful as well
Tag concept δ(t):
  δ(t) = (ω({t}), α(ω({t})))   if t ∈ A
  δ(t) = (ω(α({t})), α({t}))   if t ∈ O
[Lattice diagrams: focus concept and tag concept before and after each step]
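The navigation operations can be sketched on the small example context. This is a hedged illustration, not the ConceptCloud implementation: π(f) is modeled as an explicit set of selected tags (`selected`), the empty selection is taken to correspond to the top concept, the tag concept of an object identifier is built from that object's own attributes, and the attribute `"finite"` for object 15 is assumed.

```python
# Example context: object id -> attributes ("finite" for 15 is assumed).
CONTEXT = {
    "08": {"Fischer", "browsing", "'00"},
    "15": {"van Zijl", "finite"},
    "42": {"Fischer", "Greene", "browsing", "tag", "'14"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = set().union(*CONTEXT.values())


def alpha(objs):
    """Common attributes of a set of objects."""
    return {a for a in ATTRIBUTES if all(a in CONTEXT[o] for o in objs)}


def omega(attrs):
    """Common objects of a set of attributes."""
    return {o for o in OBJECTS if attrs <= CONTEXT[o]}


def tag_concept(t):
    """delta(t): the concept generated by a single tag (attribute or object id)."""
    if t in ATTRIBUTES:
        extent = omega({t})
        return frozenset(extent), frozenset(alpha(extent))
    intent = alpha({t})
    return frozenset(omega(intent)), frozenset(intent)


def meet(c1, c2):
    """Greatest common sub-concept: intersect extents, recompute the intent."""
    extent = c1[0] & c2[0]
    return frozenset(extent), frozenset(alpha(extent))


def join(c1, c2):
    """Least common super-concept: intersect intents, recompute the extent."""
    intent = c1[1] & c2[1]
    return frozenset(omega(intent)), frozenset(intent)


TOP = (frozenset(OBJECTS), frozenset(alpha(OBJECTS)))


def focus_of(selected):
    """Focus concept for a selection: meet of all selected tag concepts."""
    focus = TOP
    for t in selected:
        focus = meet(focus, tag_concept(t))
    return focus


selected = {"Fischer"}
print(focus_of(selected))                    # extent 08 and 42

selected = selected | {"tag"}                # refinement: select a new tag
print(focus_of(selected))                    # extent narrows to 42

selected = selected - {"tag"}                # widening: remove a selected tag
print(focus_of(selected))                    # back to extent 08 and 42

# Join-based widening: move up to the least concept covering both.
print(join(focus_of({"Fischer", "tag"}), tag_concept("van Zijl")))
```

Recomputing the focus from the remaining tags makes widening the exact inverse of refinement, whereas the join can jump to a concept that no selection of tags was needed to reach.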

Navigation in the ConceptCloud Browser

The Percept Browser by: Carl Kritzinger, Fireworks

Conclusions & Future Work
  – Semi-structured data is common but hard to analyze
  – Tag clouds are a good visualization approach, and the combination with concept lattices makes it easy to navigate and find related information
  – Flexible approach, generic tool
      – different data sets
      – different types of contexts (⇒ different types of analysis)
  – Scalability
      – DBLP, IMDb, Wikipedia?
  – Customizability
      – context extraction
      – tool scripting
