By Borys Omelayenko, Ph.D. Data and vocabularies: data is isolated; vocabularies link it to the world and give it extra meaning to build new applications.

Presentation transcript:

By Borys Omelayenko, Ph.D.

Data and vocabularies
Data is isolated. Vocabularies link it to the world and give it extra meaning, so that new applications can be built.
Tetracycline works by stopping the growth of bacteria.

What it is all about

Data records (inside a company or a private cloud):
Title    Category   Location
Pot      Cooking    Paris
Axe      Tools      Lyon
Church   Building   Sofia

Vocabulary terms:
Label    Ppl.
Paris    2.1m
Sofia    1.2m

Vocabulary terms:
Label    Descr.
Tools    Hammer
Food     Meat

What are they, practically?

Vocabulary
Linguistics: “all the words known and used by a particular person” Source:
Computing: a vocabulary is a database of terms, known and used by your system

Term ‘Human’

Terms ‘Paris’ and ‘Sofia’

Source: Chemical

Source: Drug

Vocabulary
Is an additional database, made by somebody else. It is focused and specialized: it covers a certain aspect of the terms with a handful of relations, yet it may have millions of terms.
You want to use it: it can quickly add to your data, drag your data out of isolation, and bring added value to your customers.
Next: three use cases

How to improve recall

Search results

Object page: женщина париж (Russian for ‘woman Paris’)
How did we find it?

Enriched records in SOLR
Where: Paris, Paris 04, Paris, Île-de-France, France. Documents from Paris don’t need to mention ‘France’; they will still be found on a query for ‘France’.
What: Woman, with the broader term for ‘woman’: Population structure.
Each term carries multilingual labels: France / Франция / Frankrijk; Paris / Париж / Parijs; Orphelinat des postes …; Woman / Женщина / Vrouw; Population structure / Структура населения.
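A minimal sketch of the enrichment idea in Python. It is not the SOLR setup from the slides: the `BROADER` and `LABELS` dictionaries and the helper names are assumptions made up for illustration, using only the terms shown above. Each record is indexed with the labels of its linked terms and of all their broader terms, so a query in another language or for a broader place still matches.

```python
# Hypothetical mini-vocabulary: term -> broader term, and term -> labels.
BROADER = {
    "Paris 04": "Paris",
    "Paris": "Île-de-France",
    "Île-de-France": "France",
    "Woman": "Population structure",
}
LABELS = {
    "France": ["France", "Франция", "Frankrijk"],
    "Paris": ["Paris", "Париж", "Parijs"],
    "Woman": ["Woman", "Женщина", "Vrouw"],
    "Population structure": ["Population structure", "Структура населения"],
}


def broader_chain(term):
    """Return the term followed by all of its broader terms."""
    chain = []
    while term is not None and term not in chain:
        chain.append(term)
        term = BROADER.get(term)
    return chain


def enrich(record_terms):
    """Collect lowercase labels of every linked term and its broader terms."""
    index = set()
    for term in record_terms:
        for t in broader_chain(term):
            # Terms without known labels are indexed under their own name.
            for label in LABELS.get(t, [t]):
                index.add(label.lower())
    return index


def matches(query, index):
    """True if every word of the query appears in the enriched index."""
    return all(word in index for word in query.lower().split())


record = enrich(["Paris 04", "Woman"])
print(matches("женщина париж", record))
print(matches("France", record))
```

The record mentions neither ‘France’ nor any Russian word, yet both queries match because the labels were added at indexing time. Matching whole words is a simplification; multi-word labels would need phrase matching.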

How to improve precision

Museums
MultiMedian: a Dutch R&D project, completed in 2006. It links together a dozen museum databases and a dozen vocabularies.
Source: e-culture.multimedian.nl (June 2015)

And try to navigate
Autocomplete searches these databases, groups artefacts (data documents) and terms, and combines them into an informative autocomplete.
Source: e-culture.multimedian.nl (June 2015)

Graph scheme
Node types, with examples: Person (Derain), Artefact (Portrait of Matisse), Style (Modern), Place (Paris), Event.
Edge labels: author, title, label, name (label), made at, made in, worked in, worked at, held at, associated with, participated at.
More-or-less the AAT structure.

Let’s search: Derain
Clustered results. An interesting pattern: results without Derain, but very relevant. They would never be found with text search.
Source: e-culture.multimedian.nl (June 2015)
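How a graph can surface artefacts that never mention the search term can be sketched in Python as follows. This is an illustration, not the MultiMedian implementation: the triples, ‘Artefact X’, ‘Artefact Y’ and the function names are assumptions; only Derain, Portrait of Matisse, Modern and Paris come from the slides. Artefacts are grouped by the path of edge labels that connects them to the term.

```python
from collections import defaultdict

# Hypothetical triples (subject, edge label, object) for illustration only.
TRIPLES = [
    ("Portrait of Matisse", "author", "Derain"),
    ("Portrait of Matisse", "style", "Modern"),
    ("Derain", "worked in", "Paris"),
    ("Artefact X", "made in", "Paris"),
    ("Artefact X", "style", "Modern"),
    ("Artefact Y", "made in", "Lyon"),
]
ARTEFACTS = {"Portrait of Matisse", "Artefact X", "Artefact Y"}


def neighbours(triples):
    """Build an undirected adjacency map: node -> {(edge label, other node)}."""
    adj = defaultdict(set)
    for subject, label, obj in triples:
        adj[subject].add((label, obj))
        adj[obj].add((label, subject))
    return adj


def clustered_results(term, triples, artefacts, max_hops=2):
    """Group artefacts within max_hops of the term by their edge-label path."""
    adj = neighbours(triples)
    clusters = defaultdict(set)
    frontier = [(term, ())]
    seen = {term}
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for label, other in sorted(adj[node]):
                if other in seen:
                    continue
                seen.add(other)
                new_path = path + (label,)
                if other in artefacts:
                    clusters[new_path].add(other)
                next_frontier.append((other, new_path))
        frontier = next_frontier
    return dict(clusters)


for path, found in clustered_results("Derain", TRIPLES, ARTEFACTS).items():
    print(" / ".join(path), "->", sorted(found))
```

In this toy graph ‘Artefact X’ has no link to Derain and would not be found by text search, but it appears in the cluster ‘worked in / made in’ because it was made where Derain worked.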

How to link documents to terms?

Let’s search for terms
Paris: geonames.org/123
Looks simple: search for ‘Paris’ on geonames.org

Bring me to Paris!
How many? Which one?
Source: (June 2015)

Disambiguation tip: where
Population
Paris, Paris 04, Paris: all nested. Choose the most specific one.
London would become Westminster.

Disambiguation tip: where
Athens: Greece or Georgia (Georgia in the US, not the country Georgia)? Choose by the country of the data, or use time constraints (antique).
Colonization: duplicates in place names.
Filtering: drop what you don’t need. Use administrative units for museums and skip parks, rivers, etc. Keep parks for a hiking web site.
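The ‘where’ tips above can be sketched in Python as a small filter-and-rank step. Everything here is an assumption for illustration: the `Place` fields, the `kind` and `depth` values, and the population figures are invented and are not the Geonames schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str
    kind: str        # hypothetical feature type: "region", "city", "park", ...
    population: int  # illustrative figures only
    depth: int       # nesting level; a larger value means a more specific unit


CANDIDATES = [
    Place("Athens", "GR", "city", 600_000, 3),
    Place("Athens", "US", "city", 100_000, 3),
    Place("Paris", "FR", "region", 2_100_000, 2),
    Place("Paris", "FR", "city", 2_100_000, 3),
    Place("Paris", "US", "city", 25_000, 3),
    Place("Paris", "FR", "park", 0, 4),
]

ADMIN_KINDS = ("region", "city", "district")


def disambiguate(name, candidates, country=None, keep_kinds=ADMIN_KINDS):
    """Pick one place: filter by kind and country, then prefer the most
    specific unit, breaking ties by population."""
    pool = [p for p in candidates if p.name == name and p.kind in keep_kinds]
    if country is not None:
        pool = [p for p in pool if p.country == country]
    if not pool:
        return None
    return max(pool, key=lambda p: (p.depth, p.population))


print(disambiguate("Athens", CANDIDATES, country="GR"))
print(disambiguate("Paris", CANDIDATES, country="FR"))
print(disambiguate("Paris", CANDIDATES, country="FR", keep_kinds=("park",)))
```

The `keep_kinds` parameter reflects the filtering tip: a museum keeps administrative units, while a hiking site would pass `("park",)` instead.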

Disambiguation tip: what
In culture they often use geoCultural origin: ‘Ancient Greece’ is limited in time and space. But they mix up ‘what’ and ‘where’.
A village in the middle of France: 20 residents and a million artefacts. It was called ‘Roman’.
Tip: use the new links to put records on a map.

Disambiguation tip: who
There are millions of people involved in culture, and names are often ambiguous.
Tips: compare the year of the painting to the birth and death years; look at the ‘death’ field.
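The date check can be sketched in Python like this. The `Person` records, the URIs and the `min_age` threshold are assumptions invented for the example; the idea is only that a candidate creator must have been alive, and old enough, in the year the work was made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    uri: str              # hypothetical identifier
    name: str
    born: int
    died: Optional[int]   # None when the 'death' field is empty


# Two made-up people who share a name.
PEOPLE = [
    Person("who:1", "J. Smith", 1600, 1660),
    Person("who:2", "J. Smith", 1850, 1920),
    Person("who:3", "A. Jones", 1950, None),
]


def plausible_creators(name, work_year, people, min_age=10):
    """Keep namesakes who were alive and at least min_age in work_year."""
    result = []
    for person in people:
        if person.name != name:
            continue
        if work_year < person.born + min_age:
            continue  # not yet born, or too young
        if person.died is not None and work_year > person.died:
            continue  # already dead
        result.append(person)
    return result


print(plausible_creators("J. Smith", 1640, PEOPLE))
print(plausible_creators("J. Smith", 1700, PEOPLE))
```

A painting from 1640 resolves to the first J. Smith, while a work dated 1700 matches neither and is better left unlinked than linked wrongly.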

Disambiguation tip: when
Maybe the easiest: a fairly limited area, and one-dimensional.
Many numerical values, like ‘early 13th century’.
Just stay in the past: a museum was dating objects with future dates.
Tip: use the new links to put your records on a timeline.
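A minimal Python sketch of turning such values into year ranges and rejecting future dates. The reading of ‘early’, ‘mid’ and ‘late’ as thirds of a century is an assumption, as are the function names; the time vocabulary used in the project may define periods differently.

```python
import re

# Matches e.g. "13th century", "early 13 th century", "late 2nd century".
PATTERN = re.compile(r"^(?:(early|mid|late)\s+)?(\d{1,2})\s*(?:st|nd|rd|th)\s+century$")


def century_range(text):
    """Return (first_year, last_year) for a century expression, or None."""
    match = PATTERN.match(text.strip().lower())
    if match is None:
        return None
    part, century = match.group(1), int(match.group(2))
    start = (century - 1) * 100 + 1   # the 13th century starts in 1201
    end = century * 100               # and ends in 1300
    # Assumption: early / mid / late split the century into thirds.
    if part == "early":
        end = start + 32
    elif part == "mid":
        start, end = start + 33, start + 65
    elif part == "late":
        start = start + 66
    return start, end


def is_in_past(year_range, current_year):
    """True if the whole range lies no later than current_year."""
    return year_range[1] <= current_year


print(century_range("early 13 th century"))
print(is_in_past(century_range("22nd century"), 2015))
```

Once every date is a pair of years, records can be placed on a timeline and any range that ends in the future can be flagged for review.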

How far can we go on the cheap
Fully automatic tagging of Europeana: 18.7 m records (2009).
In total, 11.2 m out of 18.7 m records (about 60%) get at least one link.
borys.name/blog/semantic_tagging_of_europeana_data.html

        Vocabulary                Size      SOLR field   Links, m
Who     painters from Wikipedia   10,000    creator      0.01
What    GEMET                     10,000    subject      2.4
Where   Geonames                  140,000   coverage     2.8
When    Semium Time               2,500     date         7.9

Each link is 20+ multilingual synonyms.
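Fully automatic tagging of this kind can be sketched in Python as a per-field label lookup. This is an assumption about the general approach, not the Europeana pipeline: the vocabulary entries, term identifiers and sample records are invented, and real tagging would add the disambiguation steps from the earlier slides.

```python
# Hypothetical vocabularies, one per SOLR field: lowercase label -> term id.
VOCABULARIES = {
    "creator": {"andré derain": "who:derain"},
    "subject": {"woman": "what:woman", "женщина": "what:woman"},
    "coverage": {"paris": "where:paris", "париж": "where:paris"},
}


def tag_record(record):
    """Link each field value to a vocabulary term by exact label match."""
    links = {}
    for field, vocabulary in VOCABULARIES.items():
        for value in record.get(field, []):
            term = vocabulary.get(value.strip().lower())
            if term is not None:
                links.setdefault(field, []).append(term)
    return links


def linked_share(records):
    """Fraction of records that received at least one link."""
    tagged = sum(1 for record in records if tag_record(record))
    return tagged / len(records)


RECORDS = [
    {"coverage": ["Париж"], "subject": ["Женщина"]},
    {"creator": ["Unknown"]},
]

print(tag_record(RECORDS[0]))
print(linked_share(RECORDS))
```

The share computed by `linked_share` is the same kind of figure as the 11.2 m out of 18.7 m reported on the slide, here over two toy records.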

Future
Database theory gave us modern databases. It’s time for graph theory.