Published by Cynthia McDaniel. Modified over 9 years ago.
1
Pascal Visualization Challenge
Blaž Fortuna, IJS; Marko Grobelnik, IJS; Steve Gunn, US
2
Part I: Challenge details
3
ePrints
Database of around 1600 papers published by Pascal members.
Papers are described with:
- Authors (unique Pascal Id)
- Title
- Abstract (most papers)
- Publish date (some papers only have the year)
4
Challenge Goal
Two main goals:
- to test and compare different text visualization methods, ideas and algorithms on a common dataset,
- to contribute to the Pascal dissemination and promotion activities by using data about scientific publications from Pascal’s EPrints server.
5
Task
Visualize and present the Pascal ePrints data in a novel way which enables:
- discovering the main areas covered by the papers and people in Pascal,
- discovering how areas and people develop through time,
- recommending to researchers which papers to read,
- finding the right reviewers for new papers.
6
Data
Raw XML file from the Pascal ePrints server.
Processed data for easier use:
- Bag-of-words (TextGarden, Matlab)
- Graph (Matlab, Pajek)
Data is processed for different possible scenarios.
7
Raw XML file
Cleaned data from the Pascal ePrints server. Data is given as a list of papers; each paper is described by:
- Title
- Abstract
- Year of publication
- List of authors
Each author is described by a unique Pascal Id and institution.
Example record (excerpt): Synthesis of Maximum… In this presentation… Computati… Learning… Theory … Sandor Szedmak John Shawe-Taylor Universit…
8
Bag-of-words
Covered scenarios:
- Document == Paper
- Document == Author
- Document == Institution
Available formats:
- TextGarden: text file where one line equals one document
- Matlab: data available in the form of a sparse term-document matrix
TextGarden (www.textmining.net):
Format: Document_name !Subject DocumentList
Example: Support_Vector_Machine_to_synthesise_kernels !Machine_Vision !Theory_and_Algorithms Support Vector Machine to synthesise kernels -- Suppose we are given two sets of …
Matlab:
Sparse matrix saved in a text file; it can be read into Matlab with:
X = spconvert(load('papers.dat'));
Documents are columns in the matrix. Names of columns (document names) and rows (words) are provided.
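The one-line-per-document TextGarden layout described on this slide can be read with a few lines of code. The following is a minimal Python sketch, not TextGarden's own reader: it assumes that fields are whitespace-separated, that the first token is the document name, and that every following token starting with "!" is a subject label. The names Document and parse_line are hypothetical.

```python
from typing import List, NamedTuple


class Document(NamedTuple):
    name: str
    subjects: List[str]
    text: str


def parse_line(line: str) -> Document:
    """Split one line into name, '!'-prefixed subjects, and the remaining text."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    name, rest = tokens[0], tokens[1:]
    subjects = []
    i = 0
    # Assumption: subject labels directly follow the name and start with '!'.
    while i < len(rest) and rest[i].startswith("!"):
        subjects.append(rest[i][1:])
        i += 1
    return Document(name, subjects, " ".join(rest[i:]))


# The example line from the slide (its text is truncated in the source).
sample = (
    "Support_Vector_Machine_to_synthesise_kernels !Machine_Vision "
    "!Theory_and_Algorithms Support Vector Machine to synthesise kernels "
    "-- Suppose we are given two sets of ..."
)
doc = parse_line(sample)
print(doc.name)
print(doc.subjects)
```

Reading a whole file would then amount to calling parse_line on each non-empty line.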
9
Graph
Covered scenarios:
- Vertex == Word, Edge == Co-Appearance
- Vertex == Author, Edge == Co-Authors
- Vertex == Institution, Edge == Collaboration
Available formats:
- Matlab: data available in the form of a sparse adjacency matrix
- Pajek: software for network analysis
Matlab:
Sparse matrix saved in a text file; it can be read into Matlab with:
X = spconvert(load('words.dat'));
Names of vertices (words, authors, institutions) are provided.
Pajek:
Can be downloaded from: vlado.fmf.uni-lj.si/pub/networks/pajek
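For readers without Matlab, the sparse text files can be read in other environments as well. Matlab's spconvert expects rows of the form "i j v" with 1-based indices and sums repeated entries; the Python sketch below mirrors that behaviour under the assumption that the challenge files follow this layout. The function name load_triplets and the toy edge list are hypothetical.

```python
from collections import defaultdict


def load_triplets(lines):
    """Build {row: {col: value}} from 'i j v' lines with 1-based indices."""
    matrix = defaultdict(dict)
    n_rows = n_cols = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        # Indices may be written as floats (e.g. 1.0e+00), so go through float.
        i, j, v = int(float(parts[0])), int(float(parts[1])), float(parts[2])
        n_rows, n_cols = max(n_rows, i), max(n_cols, j)
        if v != 0.0:
            # Repeated (i, j) entries are summed, as spconvert does.
            matrix[i][j] = matrix[i].get(j, 0.0) + v
    return dict(matrix), (n_rows, n_cols)


# Toy symmetric adjacency list standing in for a file such as words.dat.
toy = ["1 2 3", "2 1 3", "2 3 1", "3 2 1"]
adj, shape = load_triplets(toy)
print(shape)
print(sorted(adj[2]))  # neighbours of vertex 2
```

With a real file, the same function could be called as load_triplets(open("words.dat")), since a file object iterates over its lines.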
10
Submissions
The results can be images, movies, Web sites, VRML files, executables (Windows, Linux), etc.
For an interactive tool, also provide a video showing the use of the tool on the Pascal ePrints data.
11
Evaluation
- Usability of visualization: the goal is to assess the usability of a particular visualization in different practical contexts.
- Innovativeness: the goal is to estimate how innovative the ideas used for the visualization are.
- Aesthetics of the image: here we aim to identify the "nicest" images from the challenge.
- General Pascal researchers’ voting over the web about "who likes what".
Since all the criteria are subjective, we will hire experts to judge the quality. Each of the criteria will generate a separate ranking.
12
Part II: Examples
13
Visualization example 1/2: Document Atlas
Bag-of-words approach: Document == Author
An author is described by the sum of all the abstracts from the papers he or she co-authored.
We construct separate profiles for papers from 2004 and papers from 2005.
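The per-author, per-year profiles described on this slide can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the Document Atlas implementation: the toy paper records, the author ids, and the simple lowercase tokenizer are all hypothetical, and a real pipeline would likely add stop-word removal and term weighting.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records; the real data comes from the ePrints XML file.
papers = [
    {"year": 2004, "authors": ["a1", "a2"], "abstract": "kernel methods for text"},
    {"year": 2005, "authors": ["a1"], "abstract": "text visualization with kernel maps"},
    {"year": 2005, "authors": ["a2"], "abstract": "graph kernel"},
]


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def author_profiles(papers):
    """Return {(author, year): Counter} summing word counts over co-authored abstracts."""
    profiles = defaultdict(Counter)
    for paper in papers:
        counts = Counter(tokenize(paper["abstract"]))
        for author in paper["authors"]:
            profiles[(author, paper["year"])] += counts
    return profiles


profiles = author_profiles(papers)
print(profiles[("a1", 2004)])
print(profiles[("a1", 2005)])
```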
14
Dimensionality reduction
Documents are mapped from bag-of-words space to two dimensions in two steps:
- Latent Semantic Indexing: 13,000 dim => 110 dim
- Multidimensional Scaling: 110 dim => 2 dim
The background reflects the density of documents.
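The two steps can be written down in a standard form. The slides do not say which MDS variant or which dissimilarity measure is used, so the stress formulation and the symbols below (X, U_k, Sigma_k, V_k, delta_ij, y_i) are assumptions chosen for illustration.

```latex
% Step 1: Latent Semantic Indexing via a truncated SVD of the
% term-document matrix X (m terms, n documents; here m \approx 13{,}000, k = 110).
X \approx U_k \Sigma_k V_k^{\top}, \qquad
d_j = U_k^{\top} x_j = \Sigma_k V_k^{\top} e_j \in \mathbb{R}^{k}

% Step 2: Multidimensional Scaling places each document at y_j \in \mathbb{R}^2
% so that map distances approximate the dissimilarities \delta_{ij}
% between d_i and d_j (assumed stress criterion).
\min_{y_1, \dots, y_n \in \mathbb{R}^2} \;
\sum_{i < j} \bigl( \lVert y_i - y_j \rVert - \delta_{ij} \bigr)^2
```

Here x_j is the j-th column of X and e_j the j-th unit vector, so d_j is the 110-dimensional representation of document j that the second step maps to the plane.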
15
Background words
Each part of the map is assigned the keyword that is most representative of the documents in the area. We get a “map” of the topics covered within the documents. In the case of the Pascal ePrints data, areas on the map correspond to the areas covered within the Pascal Network.
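One way to pick a representative keyword for a map location is to sum the term weights of nearby documents, weighting each document by its distance to that location. The slides do not describe how Document Atlas does this; the Gaussian weighting, the bandwidth value, the toy documents, and the name keyword_at in the Python sketch below are all assumptions.

```python
import math
from collections import defaultdict

# Hypothetical documents: a 2-D map position and a dict of term weights.
docs = [
    ((0.1, 0.1), {"kernel": 2.0, "svm": 1.0}),
    ((0.2, 0.1), {"kernel": 1.0, "margin": 1.0}),
    ((0.9, 0.8), {"text": 2.0, "mining": 1.5}),
]


def keyword_at(point, docs, bandwidth=0.2):
    """Return the term with the largest distance-weighted total weight near `point`."""
    totals = defaultdict(float)
    for (x, y), terms in docs:
        dist_sq = (x - point[0]) ** 2 + (y - point[1]) ** 2
        # Gaussian kernel: nearby documents contribute more (assumed weighting).
        w = math.exp(-dist_sq / (2 * bandwidth ** 2))
        for term, weight in terms.items():
            totals[term] += w * weight
    return max(totals, key=totals.get)


print(keyword_at((0.15, 0.1), docs))
print(keyword_at((0.9, 0.8), docs))
```

Evaluating keyword_at on a regular grid of points would give one label per map region.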
16
Time dynamics
For each author we have a profile for the years 2004 and 2005. By showing the difference we can see how authors’ research focus developed between 2004 and 2005.
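The difference between two yearly profiles can also be inspected at the level of individual terms. The Python sketch below is only an illustration: the slides show the change on the map, whereas this compares normalized word frequencies directly, and the toy counts and the function names normalize and profile_shift are hypothetical.

```python
from collections import Counter


def normalize(counts):
    """Turn raw word counts into relative frequencies."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def profile_shift(old, new):
    """Per-term change in relative frequency from the old to the new profile."""
    p, q = normalize(old), normalize(new)
    return {w: q.get(w, 0.0) - p.get(w, 0.0) for w in set(p) | set(q)}


# Hypothetical profiles of one author for 2004 and 2005.
profile_2004 = Counter({"kernel": 3, "text": 1})
profile_2005 = Counter({"kernel": 1, "visualization": 3})

shift = profile_shift(profile_2004, profile_2005)
rising = max(shift, key=shift.get)
falling = min(shift, key=shift.get)
print(rising, falling)
```

Normalizing first keeps an author who simply published more in one year from appearing to have shifted focus.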
17
Co-Authorships
18
Live Demo
19
Visualization example 2/2: IST World
Web portal developed within the IST World EU project.
Uses search and visualization methods to:
- discover the main research areas and collaborations within the PASCAL organizations,
- produce recommendations on which papers to read (e.g. papers on image recognition, or the kernel trick),
- find the right reviewers for a new paper (e.g. a paper on "brain computer interface") and assess their competence.
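Both paper recommendation and reviewer search can be framed as ranking bag-of-words profiles by their similarity to a query. The Python sketch below uses plain cosine similarity over raw word counts; this is an assumed, simplified stand-in, not the method used by the IST World portal, and the profile texts and author names are hypothetical.

```python
import math
import re
from collections import Counter


def bag(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, profiles):
    """Return profile names with non-zero similarity to the query, best first."""
    q = bag(query)
    scored = [(cosine(q, bag(text)), name) for name, text in profiles.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical author profiles (e.g. concatenated abstracts).
profiles = {
    "author_a": "brain computer interface signal classification",
    "author_b": "kernel methods for text mining",
    "author_c": "image recognition with kernel trick",
}

print(rank("a paper on brain computer interface", profiles))
print(rank("kernel trick", profiles))
```

Using paper abstracts instead of author profiles as the ranked items turns the same function into a simple paper recommender.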
20
Research areas
Institutions are placed on the map of research areas from the Pascal Network. The example shows which areas are closely related to JSI.
21
Collaborations
- Collaboration of institutions
- Collaboration of authors working on “text mining”
22
Paper Recommendation
23
Competence Search
24
Live Demo
25
Thank you!