Where the Web Went Wrong. Hamish Cunningham, Dept. Computer Science, University of Sheffield.


Where the Web Went Wrong
Hamish Cunningham, Dept. Computer Science, University of Sheffield
Graz, May 2004

2(21) Contents
The Web, presentation, and syndication
A Semantic Web for eCulture
– annoy half the audience
– annoy the other half
eCulture, metadata and human language
– motivation
– Information Extraction: quantified language computing
– MUMIS, GATE, ...
Cultural memory is not a luxury

3(21) Syndication and Mediation
The web promotes diversity, but also fragmentation.
Original web: separate content and presentation ("this is a header", not "set in 20 point bold font").
Now: many incompatible or inaccessible interfaces.
Memory institutions (museums, libraries, archives) need to:
– pool their impact: syndication in networked communities
– support repurposable content
Therefore data must be presentation independent.
Candidate technologies: DC, CIDOC, XML, RSS, RDF, OWL ("semantic web"), ...
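What "presentation independent" buys can be sketched in a few lines: keep one record as pure data and derive each presentation from it, so a web page and a syndication feed never drift apart. This is a minimal illustration only; the record, its values and the two render functions are hypothetical, with field names loosely following Dublin Core elements (title, creator, date).

```python
from html import escape

# Hypothetical catalogue record held as pure data; field names loosely follow
# Dublin Core elements (title, creator, date). No presentation decisions here.
record = {"title": "Portrait of a Lady", "creator": "Unknown artist", "date": "1650"}


def to_html(rec: dict) -> str:
    """One presentation: an HTML fragment for a web page."""
    return (f"<h2>{escape(rec['title'])}</h2>"
            f"<p>{escape(rec['creator'])}, {escape(rec['date'])}</p>")


def to_rss_item(rec: dict) -> str:
    """Another presentation of the same data: an RSS 2.0 <item> for syndication."""
    return (f"<item><title>{escape(rec['title'])}</title>"
            f"<description>{escape(rec['creator'])}, {escape(rec['date'])}</description>"
            f"</item>")


print(to_html(record))
print(to_rss_item(record))
```

Changing the record updates every presentation at once; adding a new channel means adding a renderer, not re-keying content.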

4(21) Semantic Web (1)
Memory institutions (museums, libraries, archives) host massively diverse content.
"Fortunately, the differences are primarily at the level of data structure and syntax. Significant conceptual overlaps exist between the descriptive schema used by memory institutions; elemental concepts such as objects, people, places, events, and the interrelationships between them are almost universal."
(Building semantic bridges between museums, libraries and archives: The CIDOC Conceptual Reference Model, T. Gill, April 2004)
Therefore we can add a semantic metadata layer to provide generalised inter-institution resource location.
Syndication and mediation for free!
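A toy version of such a semantic metadata layer: each institution keeps its own field names, and a per-institution mapping lifts records onto a few shared concepts (person, object, place) so that one query spans both collections. The field names, mappings and records below are hypothetical, and the three concepts are a drastic simplification, not the CIDOC CRM vocabulary.

```python
# Hypothetical mappings from two local schemas onto shared concepts.
MUSEUM_MAP = {"maker": "person", "obj_name": "object", "findspot": "place"}
LIBRARY_MAP = {"author": "person", "item": "object", "pub_place": "place"}


def to_shared(record: dict, mapping: dict) -> dict:
    """Re-key a local record onto the shared concepts, dropping unmapped fields."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def find(records: list[dict], concept: str, value: str) -> list[dict]:
    """One query over the shared layer, whichever institution a record came from."""
    return [r for r in records if r.get(concept) == value]


museum = to_shared({"maker": "J. Smith", "obj_name": "Vase", "findspot": "Graz"},
                   MUSEUM_MAP)
library = to_shared({"author": "J. Smith", "item": "Letters", "pub_place": "Vienna",
                     "shelfmark": "X1"}, LIBRARY_MAP)
hits = find([museum, library], "person", "J. Smith")
print(len(hits))
```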

5(21) Semantic Web (2): good news and bad news
The good news: the SW is a focus of AI and metadata work.
The bad news: AI always fails.
How does the machine tell the difference between "Mother Teresa is a saint" and "Tony Blair is a saint"? (Or: who tells Google which statement is important?)
Other web users do, by linking (cf. also Amazon).
Two solutions to the AI problem:
– allow curators and users to build their own models (simple, specific models can succeed, but the cost may be too high)
– use recommender systems to make the user a curator's assistant (researchers and students may barter for access)
Any route to searchable content!
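"Other web users do, by linking" can be made concrete with PageRank-style link analysis, one well-known way of turning links into importance scores; the slide does not name an algorithm, so treat this as an illustrative swap-in. A minimal power-iteration sketch, assuming a small in-memory link graph with hypothetical page names:

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power iteration: each page shares its score equally among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its score over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: three pages link to "x", only one also links to "y".
ranks = pagerank({"p1": ["x"], "p2": ["x"], "p3": ["x", "y"]})
print(sorted(ranks, key=ranks.get, reverse=True)[0])
```

The page that many users point to ends up on top, without the machine understanding either statement.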

6(21) IT context: the Knowledge Economy and Human Language
Gartner, December 2002:
– taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications
– through 2012, more than 95% of human-to-computer information input will involve textual language
A contradiction:
– to deal with the information deluge we need formal knowledge in semantics-based systems
– our archived history is in informal and ambiguous natural language
The challenge: to reconcile these two phenomena.

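7(21) HLT: Closing the Loop
[Diagram: a loop between Human Language and Formal Knowledge (ontologies and instance bases). Information Extraction, i.e. (A)IE, CLIE and OIE, leads from language to knowledge; (M)NLG leads from knowledge back to language; Controlled Language also appears in the loop. Applications: Semantic Web; Semantic Grid; Semantic Web Services.]
Key: MNLG: Multilingual Natural Language Generation; OIE: Ontology-aware Information Extraction; AIE: Adaptive IE; CLIE: Controlled Language IE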

8(21) Information Extraction
Information Extraction (IE) pulls facts and structured information from the content of large text collections.
Contrast IE with Information Retrieval (IR): IR finds relevant documents, IE extracts facts from them.
NLP history: from NLU to IE.
Progress driven by quantitative measures:
– MUC: Message Understanding Conferences
– ACE: Automatic Content Extraction

9(21) IE Example
"The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc."
– ST (scenario template): rocket launch event with various participants
– NE (named entities): "rocket", "Tuesday", "Dr. Head", "We Build Rockets"
– CO (coreference): "it" = rocket; "Dr. Head" = "Dr. Big Head"
– TE (template elements): the rocket is "shiny red" and Head's "brainchild"
– TR (template relations): Dr. Head works for We Build Rockets Inc.
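One way to hold the output of these five tasks as data: entities gather their coreferring mentions (NE + CO) and descriptive attributes (TE); relations (TR) and the scenario (ST) refer to entities by key. The class, the type labels (`artifact`, `person`, ...) and the role names are hypothetical choices for illustration, not a MUC template format.

```python
from dataclasses import dataclass, field

TEXT = ("The shiny red rocket was fired on Tuesday. It is the brainchild of "
        "Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.")


@dataclass
class Entity:
    key: str
    kind: str                                            # hypothetical type label
    mentions: list[str]                                  # coreferring strings (NE + CO)
    attributes: list[str] = field(default_factory=list)  # descriptions (TE)


entities = {
    "rocket": Entity("rocket", "artifact", ["The shiny red rocket", "It"],
                     ["shiny red", "brainchild of Dr. Head"]),
    "head": Entity("head", "person", ["Dr. Big Head", "Dr. Head"]),
    "wbr": Entity("wbr", "organisation", ["We Build Rockets Inc."]),
    "tuesday": Entity("tuesday", "date", ["Tuesday"]),
}
relations = [("head", "employee_of", "wbr")]                                   # TR
scenario = {"event": "rocket_launch", "vehicle": "rocket", "date": "tuesday"}  # ST


def unsupported_mentions(text: str, ents: dict) -> list[str]:
    """Sanity check: every mention string must literally occur in the text."""
    return [m for e in ents.values() for m in e.mentions if m not in text]


print(unsupported_mentions(TEXT, entities))
```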

10(21) Performance levels
Extensive quantitative evaluation since the early '90s, mainly on text and ASR output; now also video OCR.
Performance varies according to text type, domain, scenario and language:
– NE: up to 97% (tested in English, Spanish, Japanese, Chinese, others)
– CO: 60-70% resolution
– TE: 80%
– TR: 75-80%
– ST: 60% (but human performance may be only 80%)
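The slide does not say which metric lies behind these percentages; IE evaluations such as MUC commonly report precision, recall and their combination, the F-measure. A minimal sketch with exact-match scoring over hypothetical (string, type) pairs; real evaluations add rules for partial matches.

```python
def prf(gold: set, predicted: set, beta: float = 1.0) -> tuple[float, float, float]:
    """Precision, recall and F-beta with exact-match scoring."""
    tp = len(gold & predicted)                        # correct system answers
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical NE key and system response for the rocket example.
gold = {("Dr. Head", "Person"), ("We Build Rockets", "Organisation"), ("Tuesday", "Date")}
pred = {("Dr. Head", "Person"), ("We Build Rockets", "Organisation"), ("rocket", "Person")}
p, r, f = prf(gold, pred)   # two of three correct on each side
```

With beta = 1 precision and recall weigh equally; beta > 1 favours recall.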

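11(21) Ontology-based IE
Text: "XYZ was established on 03 November 1978 in London. It opened a plant in Bulgaria in …"
[Diagram: the Ontology & KB populated from this text. Classes: Company, Location, City, Country (City and Country shown as kinds of Location). Instances and relations: XYZ (type Company, establOn "03/11/1978", HQ London); London (type City, partOf UK); UK and Bulgaria (Country).]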
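The benefit of extracting into an ontology rather than into flat fields is inference. A minimal sketch, assuming the facts are held as (subject, predicate, object) triples with predicate names taken from the slide's labels (type, establOn, HQ, partOf); the two-entry class hierarchy and the helper functions are hypothetical. It answers "in which country is XYZ headquartered?", which the text never states directly.

```python
# Hypothetical triples for the slide's example.
KB = {
    ("XYZ", "type", "Company"),
    ("XYZ", "establOn", "03/11/1978"),
    ("XYZ", "HQ", "London"),
    ("London", "type", "City"),
    ("London", "partOf", "UK"),
    ("UK", "type", "Country"),
    ("Bulgaria", "type", "Country"),
}
SUBCLASS_OF = {"City": "Location", "Country": "Location"}   # child -> parent


def objects(subject: str, predicate: str) -> list[str]:
    return sorted(o for s, p, o in KB if s == subject and p == predicate)


def types_of(entity: str) -> set[str]:
    """Direct types plus all superclasses, so London is also a Location."""
    result = set()
    for t in objects(entity, "type"):
        while t is not None:
            result.add(t)
            t = SUBCLASS_OF.get(t)
    return result


def containers(entity: str) -> list[str]:
    """Follow partOf links transitively (assumes no cycles)."""
    chain, current = [], entity
    while parents := objects(current, "partOf"):
        current = parents[0]
        chain.append(current)
    return chain


hq = objects("XYZ", "HQ")[0]
hq_country = [c for c in containers(hq) if "Country" in types_of(c)]
print(hq, hq_country)   # London ['UK']
```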

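12(21) Classes, instances & metadata
[Diagram: an ontology fragment (Entity, Person, …; Job-title: president, chancellor, minister, …) with an existing instance G.Brown ("Classes+instances before"). The text "Gordon Brown met George Bush during his two day visit." is annotated in a document (…1.html): "Gordon Brown" at offsets 0-12 is linked to class …#Person and its instance; "George Bush" is linked to class …#Person and instance …#Person67890 ("Classes+instances after").]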
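The metadata on this slide is stand-off: each annotation records a document, character offsets, an ontology class and a knowledge-base instance, and an instance is created when the KB does not yet know the entity. A minimal sketch under those assumptions; the namespaces and the instance-naming scheme are hypothetical, and the hand-supplied name list stands in for an NE recogniser.

```python
from dataclasses import dataclass

ONT = "http://example.org/ontology#"   # hypothetical ontology namespace
INST = "http://example.org/kb#"        # hypothetical instance namespace


@dataclass(frozen=True)
class Annotation:
    doc: str        # document identifier
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive
    cls: str        # ontology class URI
    instance: str   # knowledge-base instance URI


def annotate(doc, text, names, cls, kb):
    """Link each name to cls and to a KB instance, minting a new instance
    (hypothetical id scheme) when the KB does not know the name yet."""
    anns = []
    for name in names:
        start = text.find(name)
        if start < 0:
            continue
        if name not in kb:
            kb[name] = f"{INST}{cls.rsplit('#', 1)[1]}{len(kb) + 1}"
        anns.append(Annotation(doc, start, start + len(name), cls, kb[name]))
    return anns


text = "Gordon Brown met George Bush during his two day visit."
kb = {"Gordon Brown": INST + "G.Brown"}   # instances before processing
anns = annotate("1.html", text, ["Gordon Brown", "George Bush"], ONT + "Person", kb)
for a in anns:
    print(a.start, a.end, text[a.start:a.end], a.instance)
```

As on the slide, "Gordon Brown" spans offsets 0 to 12 and reuses the known instance, while "George Bush" gets a newly minted one.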

13(21) An example: the MUMIS project
Multimedia Indexing and Searching Environment:
– composite index of a multimedia programme from multiple sources in different languages
– ASR, video processing, Information Extraction (Dutch, English, German), merging, user interface
– University of Twente/CTIT, University of Sheffield, University of Nijmegen, DFKI, MPI, ESTEAM AB, VDA
An important experimental result: multiple sources for the same events can improve extraction quality.
– PrestoSpace applications in news and sports archiving
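Why multiple sources help can be illustrated with simple voting: events extracted independently from several sources (say, commentaries in different languages) are kept only when enough sources agree, which filters out one-off extraction errors. This is a hypothetical scheme with made-up data, not the MUMIS merging algorithm; real merging also has to align events whose times or names differ slightly.

```python
from collections import Counter


def merge_events(sources: list[list[tuple]], min_votes: int = 2) -> list[tuple]:
    """Keep events reported by at least min_votes distinct sources."""
    votes = Counter()
    for events in sources:
        votes.update(set(events))   # at most one vote per source per event
    return sorted(e for e, n in votes.items() if n >= min_votes)


# Hypothetical (event type, player, minute) extractions from three sources.
en = [("goal", "Player A", 12), ("foul", "Player B", 30)]
de = [("goal", "Player A", 12), ("goal", "Player B", 30)]   # one extraction error
nl = [("goal", "Player A", 12), ("foul", "Player B", 30)]
merged = merge_events([en, de, nl])
print(merged)
```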

14(21) Semantic Query
Not "goal Beckham" (which also matches, e.g., missed goals, or "this was not a goal").
Instead: "goal events with scorer David Beckham".
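The contrast can be shown directly: a keyword query matches every caption mentioning both words, while a query over extracted events matches only goals actually scored by Beckham. The captions, event records and field names below are hypothetical.

```python
captions = [
    "Goal! Beckham scores from a free kick",
    "Beckham shoots wide, a missed goal chance",
    "Beckham celebrates, but this was not a goal",
]
events = [
    {"type": "goal", "scorer": "David Beckham", "minute": 10},
    {"type": "missed_goal", "player": "David Beckham", "minute": 25},
    {"type": "disallowed_goal", "player": "David Beckham", "minute": 40},
]

# Keyword query "goal Beckham": any caption containing both words.
keyword_hits = [c for c in captions
                if all(w in c.lower() for w in ("goal", "beckham"))]

# Semantic query: goal events whose scorer is David Beckham.
semantic_hits = [e for e in events
                 if e["type"] == "goal" and e.get("scorer") == "David Beckham"]

print(len(keyword_hits), len(semantic_hits))
```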

15(21) The results: England win!

16(21) GATE, a General Architecture for Text Engineering, is...
– An architecture: a macro-level organisational picture for LE software systems.
– A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
– A development environment: for language engineers, a graphical development environment.
GATE comes with...
– free components, and wrappers for other people's stuff
– tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.
Free software (LGPL) at
Used by thousands of people at hundreds of sites.
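The architecture idea of pluggable components that run in sequence over a document and share stand-off annotations (start, end, type, features) can be sketched independently of GATE itself. GATE is a Java class library and this is not its API: the `Document` class, the component functions and the feature names are hypothetical, with the annotation type names `Token` and `Lookup` merely echoing GATE terminology.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    text: str
    # Stand-off annotations: (start, end, type, features) over the text.
    annotations: list = field(default_factory=list)


# A component is anything that reads a Document and adds annotations to it.
Component = Callable[[Document], None]


def tokeniser(doc: Document) -> None:
    """Mark words and punctuation as Token annotations."""
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append((m.start(), m.end(), "Token", {"string": m.group()}))


def make_gazetteer(entries: dict[str, str]) -> Component:
    """Build a list-lookup component from phrase -> category entries."""
    def gazetteer(doc: Document) -> None:
        for phrase, category in entries.items():
            for m in re.finditer(re.escape(phrase), doc.text):
                doc.annotations.append((m.start(), m.end(), "Lookup", {"kind": category}))
    return gazetteer


def run_pipeline(doc: Document, components: list[Component]) -> Document:
    """Apply components in order; later ones can read earlier annotations."""
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(Document("Dr. Head works in London."),
                   [tokeniser, make_gazetteer({"London": "location"})])
```

Because components only communicate through annotations, a rule-based or machine-learning component can be swapped in without touching the others.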

17(21) A bit of a nuisance (GATE users)
GATE team projects:
Past:
– Conceptual indexing: MUMIS: automatic semantic indices for sports video
– MUSE: cross-genre entity finder
– HSL: health-and-safety IE
– Old Bailey: collaboration with HRI on 17th-century court reports
– Multiflora: plant taxonomy text analysis for biodiversity research e-science
– ACE / TIDES: Arabic, Chinese NE
– JHU summer workshop on semantic tagging
– EMILLE: South Asian languages corpus
– hTechSight: chemical engineering knowledge portal
Present:
– Advanced Knowledge Technologies: €12m UK five-site collaborative project
– SEKT: Semantic Knowledge Technology
– PrestoSpace: multimedia preservation/access
– KnowledgeWeb: Semantic Web
Future:
– new eContent project LIRICS
Thousands of users at hundreds of sites. A representative sample:
– the American National Corpus project
– the Perseus Digital Library project, Tufts University, US
– Longman Pearson publishing, UK
– Merck KGaA, Germany
– Canon Europe, UK
– Knight Ridder, US
– BBN (leading HLT research lab), US
– SMEs inc. Sirma AI Ltd., Bulgaria
– Stanford, Imperial College London, the University of Manchester, UMIST, the University of Karlsruhe, Vassar College, the University of Southern California and a large number of other UK, US and EU universities
– UK and EU projects inc. MyGrid, CLEF, dotkom, AMITIES, Cub Reporter, EMILLE, Poesia...

18(21) GATE: infrastructure for semantic metadata extraction
– combines learning and rule-based methods (new work on mixed-initiative learning)
– allows combination of IE and IR
– enables use of large-scale linguistic resources for IE, such as WordNet
– supports ontologies as part of IE applications: Ontology-Based IE
– supports languages from Hindi to Chinese, Italian to German

19(21) PrestoSpace Semantics Architecture
[Diagram: text sources in several languages (EN, IT, ...) each pass through IE; AV signals pass through ASR, etc., yielding signal metadata and transcriptions. Merging combines the resulting annotations into final annotations, held as ontology-based metadata that supports applications such as multilingual conceptual Q & A.]

20(21) Memory is not a luxury
C21st: all the C20th mistakes, but bigger & better?
If you don't know where you've been, how can you know where you're going?
Archives: ammunition in the war on ignorance.
Ammunition is useless if you can't find it: new technology must make our history accessible to all, for all our futures.

21(21) Links This talk: Related projects: