AstroTag MUG meeting, STScI December 2014

Data Tagging
Storing associations between data sets and tags (words/phrases)
  – IPPPSSOOT {w_1, w_2, …, w_n}
Tags have meaning
  – Identify types of objects within a data set
    E.g. dwarf irregular galaxies
  – Identify data related to a particular area of study
    E.g. galaxy interactions, galaxy formation
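The association on this slide is a mapping from a data set identifier to a set of tags, and it is usually queried in both directions. A minimal sketch, assuming an in-memory store; the class name `TagStore` and the identifiers `ipppssoot_a` and `ipppssoot_b` are placeholders, not names from the talk.

```python
from collections import defaultdict


class TagStore:
    """Two-way index between data set IDs and free-text tags (illustrative only)."""

    def __init__(self):
        self._tags_by_dataset = defaultdict(set)
        self._datasets_by_tag = defaultdict(set)

    def add(self, dataset_id, tag):
        # Normalise case and whitespace so "Galaxy formation" and
        # "galaxy formation" are stored as the same tag.
        tag = tag.strip().lower()
        self._tags_by_dataset[dataset_id].add(tag)
        self._datasets_by_tag[tag].add(dataset_id)

    def tags(self, dataset_id):
        return set(self._tags_by_dataset.get(dataset_id, ()))

    def datasets(self, tag):
        return set(self._datasets_by_tag.get(tag.strip().lower(), ()))


store = TagStore()
# Placeholder identifiers standing in for real IPPPSSOOT data set names.
store.add("ipppssoot_a", "Dwarf irregular galaxies")
store.add("ipppssoot_a", "Galaxy formation")
store.add("ipppssoot_b", "galaxy formation")
```

Keeping both indexes in step makes "which tags does this data set have?" and "which data sets carry this tag?" equally cheap, at the cost of storing each association twice.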

Tagging from a Thesaurus
Tagging with concepts
  – Multiple labels in multiple languages can be used to express the same concept
  – Concept labels can change without changing concept/data set relations (& v.v.)
Relationships between concepts add additional meaning
  – Broader term (BT), narrower term (NT), related term (RT), others
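The point that labels can change without touching the concept/data set relations follows from tagging with concept identifiers rather than label strings. A minimal sketch, assuming a simple in-memory model; the `Concept` class, the IDs `c1` and `c2`, the alternative label, and the renamed label are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    pref_label: dict                                # language code -> preferred label
    alt_labels: set = field(default_factory=set)
    broader: set = field(default_factory=set)       # BT: IDs of broader concepts
    narrower: set = field(default_factory=set)      # NT: IDs of narrower concepts
    related: set = field(default_factory=set)       # RT: IDs of related concepts


# Hypothetical two-concept thesaurus fragment.
concepts = {
    "c1": Concept("c1", {"en": "Galaxies"}, narrower={"c2"}),
    "c2": Concept("c2", {"en": "Dwarf irregular galaxies"},
                  alt_labels={"dIrr galaxies"}, broader={"c1"}),
}

# Data sets are tagged with concept IDs, never with label text.
dataset_concepts = {"ipppssoot_a": {"c2"}}


def labels_for(dataset_id, lang="en"):
    """Resolve a data set's concept IDs to their current preferred labels."""
    ids = dataset_concepts.get(dataset_id, ())
    return sorted(concepts[c].pref_label[lang] for c in ids)


before = labels_for("ipppssoot_a")
concepts["c2"].pref_label["en"] = "Dwarf irregulars"   # relabel the concept
after = labels_for("ipppssoot_a")                      # association is untouched
```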

Generating Associations
Concept labels → Papers → Datasets
  – Automatic matching of labels with papers/proposals
  – Manual association of papers with data sets
Catalogs
  – Type, morphology, etc.
User input (APT, etc.)
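The labels-to-papers-to-data-sets chain can be sketched in two steps: match concept labels against paper text, then carry the matched concepts over the manually curated paper/data set links. This is only an assumed approach (whole-phrase matching on normalised text); the talk does not say how the matching was actually done, and all IDs and the sample abstract are invented.

```python
import re


def _normalise(text):
    # Lower-case and keep only alphanumeric tokens, joined by single spaces.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def match_labels(text, labels_by_concept):
    """Return IDs of concepts with at least one label occurring as a whole phrase."""
    padded = f" {_normalise(text)} "
    hits = set()
    for concept_id, labels in labels_by_concept.items():
        for label in labels:
            phrase = _normalise(label)
            if phrase and f" {phrase} " in padded:
                hits.add(concept_id)
                break
    return hits


# Hypothetical concept IDs and labels.
labels_by_concept = {
    "c_interactions": ["galaxy interactions", "interacting galaxies"],
    "c_formation": ["galaxy formation"],
    "c_xrb": ["low mass x-ray binary star"],
}
abstract = "We study Interacting Galaxies and their role in galaxy formation."
found = match_labels(abstract, labels_by_concept)

# Step 2: propagate concepts through manually curated paper -> data set links.
paper_concepts = {"paper_1": found}
paper_datasets = {"paper_1": {"ipppssoot_a"}}
dataset_concepts = {}
for paper, datasets in paper_datasets.items():
    for dataset_id in datasets:
        dataset_concepts.setdefault(dataset_id, set()).update(
            paper_concepts.get(paper, ()))
```

Padding both strings with spaces keeps the match on word boundaries, so "galaxy formation" does not fire on "protogalaxy formations".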

Thesaurus-enabled Features
Hierarchical browsing
Search (with browsing)
Result filtering
Breadcrumbs
Tag clouds
* Customized data delivery
* Topic ranking for data sets through citation mining
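Breadcrumbs fall out of the broader-term relation: walk from a concept up to a top-level term and reverse the path. A minimal sketch, assuming an acyclic hierarchy in which a concept may have more than one broader term; the four-concept hierarchy below is illustrative and not taken from any real thesaurus.

```python
def breadcrumbs(concept_id, broader, labels):
    """Return every top-level-to-concept path as a list of labels.

    Assumes the broader-term graph has no cycles.
    """
    parents = broader.get(concept_id, ())
    if not parents:
        return [[labels[concept_id]]]
    paths = []
    for parent in sorted(parents):          # sorted for a stable order
        for path in breadcrumbs(parent, broader, labels):
            paths.append(path + [labels[concept_id]])
    return paths


# Hypothetical hierarchy: one concept with two broader terms.
labels = {
    "galaxies": "Galaxies",
    "dwarf": "Dwarf galaxies",
    "irregular": "Irregular galaxies",
    "dirr": "Dwarf irregular galaxies",
}
broader = {
    "dwarf": {"galaxies"},
    "irregular": {"galaxies"},
    "dirr": {"dwarf", "irregular"},
}

trails = [" > ".join(p) for p in breadcrumbs("dirr", broader, labels)]
```

With a poly-hierarchy a concept has several valid trails, so the interface has to either show all of them or pick the one the user actually browsed through.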

Choosing a Tag Set
Shape of tree:
  – Top-level terms
  – Depth
  – Poly-hierarchy
Are the right concepts with the right labels available?
  – Astroparticle physics vs. Particle astrophysics
What level of the concept tree do we tag at?
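The three "shape of tree" properties can be measured directly from the broader-term relation when comparing candidate tag sets. A minimal sketch, assuming an acyclic hierarchy held as a dict of concept ID to broader concept IDs; the function name `tree_shape` and the sample concepts are hypothetical.

```python
def tree_shape(concept_ids, broader):
    """Summarise top-level terms, maximum depth, and poly-hierarchy."""
    top_level = sorted(c for c in concept_ids if not broader.get(c))
    poly = sorted(c for c in concept_ids if len(broader.get(c, ())) > 1)

    depth_cache = {}

    def depth(concept_id):
        # Depth 1 for a top-level term; otherwise one more than the deepest parent.
        if concept_id not in depth_cache:
            parents = broader.get(concept_id, ())
            depth_cache[concept_id] = 1 + max(
                (depth(p) for p in parents), default=0)
        return depth_cache[concept_id]

    return {
        "top_level": top_level,
        "max_depth": max(depth(c) for c in concept_ids),
        "poly_hierarchy": poly,
    }


# Hypothetical five-concept thesaurus fragment.
concept_ids = ["galaxies", "dwarf", "irregular", "dirr", "stars"]
broader = {
    "dwarf": {"galaxies"},
    "irregular": {"galaxies"},
    "dirr": {"dwarf", "irregular"},
}
shape = tree_shape(concept_ids, broader)
```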

The UAT
Unified Astronomy Thesaurus (astrothesaurus.org)
Community authored/edited thesaurus; previously maintained by CfA, now maintained by AAS
Combines IVOAT, PACS, journal keywords
Creative Commons licensed

UAT Continued
Pros
  – 15 top-level terms (easy to browse)
  – Community buy-in
  – Web standards (SKOS/RDF)
Caveats
  – Not necessarily designed with tagging and search in mind
  – Sometimes consensus means slow to change
  – Combining RDF and relational data can be tricky

Thesaurus Evaluation

                              UAT                           IVOAT
# concepts/labels             1909 /                        / 3531
Max label length              8                             6
Found papers/progs            99.1% / 87.3%                 ~100% / 99.2%
Not found papers/progs        458 /                         / 74
Not found concepts/labels     433 (22.7%) / 969 (32.1%)     548 (20%) / 745 (21.1%)

Longest labels: UAT “unidentified sources of radiation outside the solar system”; IVOAT “low mass x-ray binary star”
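The "not found" percentages appear to be the share of concepts or labels that matched no paper or proposal. A quick check, assuming each percentage is the not-found count divided by the corresponding total in the first row, and using only the two pairs for which both numbers survive in this transcript; the function name is hypothetical.

```python
def not_found_share(total, not_found):
    """Percentage of items with no match, rounded to one decimal place."""
    return round(100 * not_found / total, 1)


# UAT concepts: 433 of 1909 not found.
uat_concepts_missing = not_found_share(1909, 433)

# IVOAT labels: 745 of 3531 not found.
ivoat_labels_missing = not_found_share(3531, 745)
```

Both values agree with the figures on the slide (22.7% and 21.1%), which supports that reading of the table.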

Next Steps
Expanding vocabulary for search
  – Focusing on missing labels, not structure
  – How best to share data, collaborate/merge with UAT
Tagging and search model

Open Questions
Accuracy
  – Human / machine error will create bad associations. How harmful is a misidentification of a type X object?
Completeness
  – Not everything will be tagged. How useful is a partial list of type X objects?
Provenance
  – How best to provide the source of tags for a data set? E.g. the list of papers containing the tags, and the data set associations.
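One way to approach the provenance question is to store the source alongside every association, so a user can ask why a data set carries a given tag. A minimal sketch, assuming a flat list of records; the `TagAssociation` fields, source type names, and identifiers are hypothetical and not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagAssociation:
    dataset_id: str
    concept_id: str
    source_type: str   # e.g. "paper", "catalog", "user"
    source_ref: str    # identifier of the paper, catalog, or submission


def provenance(associations, dataset_id, concept_id):
    """List the (source_type, source_ref) pairs behind one data set/concept tag."""
    return sorted(
        (a.source_type, a.source_ref)
        for a in associations
        if a.dataset_id == dataset_id and a.concept_id == concept_id
    )


# Placeholder records; none of these identifiers refer to real papers or data.
associations = [
    TagAssociation("ipppssoot_a", "c_formation", "paper", "paper_1"),
    TagAssociation("ipppssoot_a", "c_formation", "paper", "paper_2"),
    TagAssociation("ipppssoot_a", "c_formation", "user", "proposal_form"),
    TagAssociation("ipppssoot_b", "c_formation", "catalog", "catalog_x"),
]

why = provenance(associations, "ipppssoot_a", "c_formation")
```

Keeping one record per source rather than one per tag also gives a rough confidence signal: a tag backed by several independent sources is less likely to be a misidentification.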

Thanks!