IAT 355 Text
SCHOOL OF INTERACTIVE ARTS + TECHNOLOGY [SIAT]


Jun 30, 2014
Text is Everywhere
We use documents as primary information artifact in our lives
Our access to documents has grown tremendously in recent years due to networking infrastructure
  –WWW
  –Digital libraries
  –...

How Can InfoVis Help?
Example Specific Tasks
  –Which documents contain text on topic XYZ?
  –Which documents are of interest to me?
  –Are there other documents that might be close enough to be worthwhile?
  –What are the main themes of a document?
  –How are certain words or themes distributed through a document?

Related Fields
Information Retrieval
  –Active search process that brings back particular entities
InfoVis
  –Perhaps not sure precisely what you’re looking for
  –Browsing task more than search

Challenge
Text is nominal data
  –Does not seem to map to geometric presentation as easily as ordinal and quantitative data
The “Raw data --> Data Table” mapping now becomes more important

Document Collections
How to present document themes or contents without reading docs?
Who cares?
  –Researchers
  –News people
  –CSIS
  –Market researchers

Vector Space Analysis
How does one compare the similarity of two documents?
One model
  –Make list of each unique word in document
      Throw out common words (a, an, the, …)
      Make different forms the same (bake, bakes, baked)
  –Store count of how many times each word appeared
  –Alphabetize, make into a vector
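The model above can be sketched in a few lines of Python. This is only an illustration: the stop-word set `STOP_WORDS`, the lookup table `BASE_FORMS` (a stand-in for a real stemmer), and the two sample sentences are all assumptions made for the example, not part of the original slides.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "the", "and", "she"}

# Toy lookup table standing in for a real stemmer or lemmatiser.
BASE_FORMS = {"bakes": "bake", "baked": "bake"}

def term_counts(text):
    """Count the normalised, non-common words of one document."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [BASE_FORMS.get(w, w) for w in words if w not in STOP_WORDS]
    return Counter(kept)

def to_vector(counts, vocabulary):
    """Lay the counts out in the order given by the vocabulary."""
    return [counts.get(term, 0) for term in vocabulary]

docs = ["The baker bakes a pie", "She baked the pie and a cake"]
counts = [term_counts(d) for d in docs]

# Alphabetised list of every unique term seen in the collection.
vocabulary = sorted(set().union(*counts))
vectors = [to_vector(c, vocabulary) for c in counts]

print(vocabulary)  # ['bake', 'baker', 'cake', 'pie']
print(vectors)     # [[1, 1, 0, 1], [1, 0, 1, 1]]
```

Sharing one alphabetised vocabulary across all documents is what makes the vectors comparable position by position.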

Vectors, Inner Products
A: The quick brown fox jumped over the lazy dog
B: The fox found his way into the henhouse
C: The fox and the henhouse are both words

     quick brown fox jump lazy dog find his way henhouse both word SUM
A    1     1     1   1    1    1   0    0   0   0        0    0
B    0     0     1   0    0    0   1    1   1   1        0    0
A.B  0     0     1   0    0    0   0    0   0   0        0    0    1
C    0     0     1   0    0    0   0    0   0   1        1    1
B.C  0     0     1   0    0    0   0    0   0   1        0    0    2

Vector A · Vector B = 1
Vector B · Vector C = 2
Thus B and C are most similar
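The inner products on this slide can be checked with a short Python sketch. The vectors are written out by hand from the three example sentences, assuming that "the", "over", "into", "and" and "are" are discarded as common words and that "jumped", "found" and "words" are reduced to "jump", "find" and "word"; the function name `inner_product` is just an illustrative choice.

```python
# Column order follows the term list on the slide.
VOCABULARY = ["quick", "brown", "fox", "jump", "lazy", "dog",
              "find", "his", "way", "henhouse", "both", "word"]

A = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # The quick brown fox jumped over the lazy dog
B = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]  # The fox found his way into the henhouse
C = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # The fox and the henhouse are both words

def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

print(inner_product(A, B))  # 1  (shared term: fox)
print(inner_product(B, C))  # 2  (shared terms: fox, henhouse)
print(inner_product(A, C))  # 1  (shared term: fox)
```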

Vector Space Analysis
Model (continued)
  –Want to see how closely two vectors go in same direction, inner product
  –Can get similarity of each document to every other one
  –Use a mass-spring layout algorithm to position representations of each document
Similar to how search engines work
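One way to measure "how closely two vectors go in the same direction" is to divide the inner product by the two vector lengths, which gives the cosine of the angle between them. The sketch below uses that normalisation as an assumption (the slides only name the inner product) and builds the document-to-document similarity matrix that a mass-spring layout could then take as input. The layout step itself is not shown, and the sample vectors are made up.

```python
import math

def cosine(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors):
    """Similarity of every document vector to every other one."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Made-up term-count vectors for three documents.
vectors = [[1, 0], [0, 1], [1, 1]]
sim = similarity_matrix(vectors)

for row in sim:
    print([round(s, 3) for s in row])
```

Documents with no terms in common score 0, and a document compared with itself scores 1, regardless of document length.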

Some adjustments
Not all terms or words are equally useful
Often apply TF/IDF
  –Term Frequency / Inverse Document Frequency
Weight of a word goes up if it appears often in a document, but not often in the collection
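A minimal sketch of TF/IDF weighting follows. It assumes one common variant, weight = (count of the term in the document) × log(number of documents / number of documents containing the term); the slides do not say which variant is meant, and the token lists are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        weighted.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return weighted

docs = [["fox", "fox", "dog"], ["fox", "henhouse"], ["fox", "word"]]
weights = tf_idf(docs)

print(weights[0]["fox"])  # 0.0, because "fox" occurs in every document
print(weights[0]["dog"])  # log(3), because "dog" occurs in one of three documents
```

A term that appears everywhere in the collection gets weight zero no matter how often it occurs in a single document, which is the adjustment the slide describes.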

Process

Smart System
Uses vector space model for documents
  –May break document into chapters and sections and deal with those as atoms
Plot document atoms on circumference of circle
Draw line between items if their similarity exceeds some threshold value
  »Salton et al., Science ’95
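The circle-and-threshold idea can be sketched as follows. This is not the Smart system's actual code: the even spacing around the circle, the function names, the similarity values, and the threshold of 0.3 are all assumptions chosen for illustration.

```python
import math

def circle_positions(n, radius=1.0):
    """Place n document atoms evenly on the circumference of a circle."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def relation_edges(similarity, threshold):
    """Return (i, j, similarity) for each pair whose similarity exceeds the threshold."""
    n = len(similarity)
    return [(i, j, similarity[i][j])
            for i in range(n) for j in range(i + 1, n)
            if similarity[i][j] > threshold]

# Hypothetical similarity matrix for three document atoms.
similarity = [[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.4],
              [0.1, 0.4, 1.0]]

positions = circle_positions(len(similarity))
edges = relation_edges(similarity, threshold=0.3)

print(edges)  # [(0, 1, 0.6), (1, 2, 0.4)]
```

A drawing routine would then connect `positions[i]` and `positions[j]` for each returned edge.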

Nov 20, Fall 2007

Text Relation Maps
Label on line can indicate similarity value
Items spaced by length of section

Text Themes
Look for sets of regions in a document (or sets of documents) that all have common theme
  –Closely related to each other, but different from rest
Need to run clustering process

VIBE System
Smaller sets of documents than whole library
Example: Set of 100 documents retrieved from a web search
Idea is to understand how contents of documents relate to each other
  »Olsen et al., Info Process & Mgmt ’93

Focus
Points of Interest
  –Terms or keywords that are of interest to user
      Example: cooking, pies, apples
Want to visualize a document collection where each document’s relation to points of interest is shown
Also visualize how documents are similar or different

Technique
Represent points of interest as vertices on convex polygon
Documents are small points inside the polygon
How close a point is to a vertex represents how strong that term is within the document
[Figure: polygon with vertices Term1, Term2, Term3]
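One simple way to get this behaviour is to place each document at the weighted average of the vertex positions, using the document's term strengths as weights. The sketch below assumes that placement rule; the function name `place_document`, the triangle coordinates, and the weights are hypothetical, and the rule is not claimed to be VIBE's exact formula.

```python
def place_document(term_weights, vertices):
    """Weighted average of the vertex positions, weighted by term strength."""
    total = sum(term_weights)
    if total == 0:
        # Assumption: a document matching no point of interest goes to the origin.
        return (0.0, 0.0)
    x = sum(w * vx for w, (vx, _) in zip(term_weights, vertices)) / total
    y = sum(w * vy for w, (_, vy) in zip(term_weights, vertices)) / total
    return (x, y)

# Three points of interest as the vertices of a triangle (hypothetical coordinates).
vertices = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

print(place_document([1, 0, 0], vertices))  # (0.0, 0.0): only Term1, so it sits on that vertex
print(place_document([1, 1, 2], vertices))  # (1.0, 2.0): pulled toward Term3
```

Because the result is a convex combination of the vertices, every document lands inside the polygon, and a stronger term pulls the point closer to its vertex.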

Example Visualization
[Figure; points of interest: laser, plasma, fusion]

VIBE Pros and Cons
Pros
  –Effectively communicates relationships
  –Straightforward methodology and vis are easy to follow
  –Can show relatively large collections
Cons
  –Not showing much about a document
  –Single items lose “detail” in the presentation
  –Starts to break down with large number of terms (e.g. 8 terms: octagon)

InSpire
Clusters documents by word vectors
K-means clustering method:
  1) Select K random docs (cluster centers)
  2) For each remaining document: assign it to the closest of the above K docs (creates K clusters)
  3) For each cluster, compute “average” cluster center
Repeat 2 and 3 until every doc stops moving from cluster to cluster
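The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not InSpire's implementation: distance is taken to be squared Euclidean distance, the `seed` and `max_iter` parameters are added for reproducibility and safety, and the sample vectors are made up.

```python
import random

def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(vectors, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)      # step 1: K random docs as centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every document to its closest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_assignment == assignment:  # no document changed cluster: done
            break
        assignment = new_assignment
        # Step 3: recompute each center as the average of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = mean(members)
    return assignment, centers

# Two obvious groups of made-up 2-D "document vectors".
vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
assignment, centers = k_means(vectors, k=2)
print(assignment)
```

Which cluster gets index 0 depends on the random starting centers, so only the grouping is meaningful, not the labels.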

K-means
Thanks, Wikipedia!

InSpire

InSpire
Clusters docs, then reports the most common TF/IDF words
Presents docs in “Galaxy” display
  –Projects from high-dimensional space to 2D

Thanks: John Stasko