Semantics, Syndication and Social Networks: Mechanisms for Future Structured Information Spaces Hamish Cunningham (University of Sheffield) Werner Haas.

Slide 1: Semantics, Syndication and Social Networks: Mechanisms for Future Structured Information Spaces
Hamish Cunningham (University of Sheffield), Werner Haas (Joanneum Research), Ant Miller (BBC), Libby Miller (University of Bristol), Ralph Traphoener (Empolis / Bertelsmann), Paul Warren (British Telecom)

Slide 2: What’s the difference between Mother Teresa and Tony Bliar?
http://gate.ac.uk/
http://nlp.shef.ac.uk/
Hamish Cunningham, Dept. of Computer Science, University of Sheffield

Slide 3: Why semantic metadata?
1. Different types of metadata allow different types of search (but also incur different costs and have different limits):
- full text: "find me Nevsky in Bulgaria"
- taxonomy / thesaurus / semantic annotation / ontology: "find me churches in Eastern Europe"
- e.g. the BBC's INFAX taxonomic system: 66% of searches would fail if only full-text search were available
2. The web promotes diversity but also fragmentation; there is too much of it, and curated data has less and less impact. In the face of this, cultural memory institutions need:
- syndication and mediation (to pool outlets and multiply impact); this means presentation-independent, multipurpose content
- users as assistants (to cut the cost of metadata); this can mean shared conceptualisations of content
How do we get there?
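To make the difference between the two kinds of query concrete, here is a minimal sketch, assuming a tiny hypothetical archive in which each record carries free text plus two semantic annotations (a type and a location), and a hypothetical partOf hierarchy of places. None of these records, names or functions come from the talk; they only illustrate why a full-text match on "churches" finds nothing while a type-plus-region query over the hierarchy does.

```python
# Hypothetical mini-archive: free text plus semantic annotations per record.
RECORDS = {
    "rec1": {"text": "Interior of the Alexander Nevsky Cathedral, Sofia",
             "type": "Church", "located_in": "Sofia"},
    "rec2": {"text": "St. Basil's Cathedral, Moscow",
             "type": "Church", "located_in": "Moscow"},
    "rec3": {"text": "Tower Bridge at dusk",
             "type": "Bridge", "located_in": "London"},
}

# Hypothetical partOf hierarchy (child -> parent).
PART_OF = {
    "Sofia": "Bulgaria", "Bulgaria": "Eastern Europe",
    "Moscow": "Russia", "Russia": "Eastern Europe",
    "London": "UK", "UK": "Western Europe",
    "Eastern Europe": "Europe", "Western Europe": "Europe",
}

def ancestors(place):
    """Yield the place itself and every region it is transitively part of."""
    while place is not None:
        yield place
        place = PART_OF.get(place)

def full_text_search(query):
    """Return records whose text contains every query word (case-insensitive substring)."""
    words = query.lower().split()
    return sorted(rid for rid, r in RECORDS.items()
                  if all(w in r["text"].lower() for w in words))

def semantic_search(obj_type, region):
    """Return records of the given type located anywhere inside the region."""
    return sorted(rid for rid, r in RECORDS.items()
                  if r["type"] == obj_type and region in ancestors(r["located_in"]))

print(full_text_search("nevsky"))                    # ['rec1']
print(full_text_search("churches eastern europe"))   # []
print(semantic_search("Church", "Eastern Europe"))   # ['rec1', 'rec2']
```

The design choice here is that the cost moves from query time to annotation time: someone (a curator, a user, or an IE system) has to attach `type` and `located_in` to each record, which is exactly the expense the following slides discuss.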

Slide 4: The semantic web and why you can't have it (yet)
The semantic web is about a semantic layer for interoperability, machine-readability and inference: ideal for semantic libraries?
Problems:
1. Construction and maintenance of shared taxonomies, terminologies and ontologies is expensive.
2. Annotation of content relative to them is very expensive.
3. How does a machine tell the difference between "Mother Teresa is a Saint" and "Tony Blair is a Saint"? (Beyond the shallow and the general we get into typical AI problems: the contextual and shifting nature of meaning, etc.)

Slide 5: Four promising directions
1. Use recommender systems to make users into curators' assistants (who tells Google which page is important? Other web users do, by linking; Amazon is another example)
2. Allow curators and users to build (DIY) simple, specific ontologies and knowledge bases, as targeted adjuncts to general models like CIDOC
3. Use Information Extraction (IE) to populate semantic models
4. Ride the next wave of social software and online communities (wikis, blogs, online social networks (OSN), file sharing / P2P, RSS/Atom)
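The "who tells Google which page is important?" point refers to link analysis. A minimal sketch of the idea, assuming the classic PageRank formulation (damping factor 0.85, power iteration, dangling pages spreading their rank evenly); this is the textbook algorithm, not a claim about how Google or Amazon rank things today, and the page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # "random jump" share
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # each outgoing link passes on an equal share of this page's rank
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page (no outgoing links): spread its rank over every page
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical archive pages: two pages both link to the same catalogue entry.
links = {"exhibition": ["catalogue"], "blog_post": ["catalogue"], "catalogue": ["exhibition"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))  # ['catalogue', 'exhibition', 'blog_post']
```

Every link a user makes acts as a small, free piece of metadata ("this is worth looking at"), which is the sense in which users become curators' assistants.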

Slide 6: IT context: the Knowledge Economy and Human Language
Gartner, December 2002:
- taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications
- through 2012, more than 95% of human-to-computer information input will involve textual language
A contradiction:
- to deal with the information deluge we need formal knowledge in semantics-based systems
- but our archived history is in informal and ambiguous natural language
The challenge: to reconcile these two phenomena.

Slide 7: HLT (Human Language Technology): Closing the Loop
[Diagram: a loop linking Human Language and Formal Knowledge (ontologies and instance bases) via (A)IE, OIE, CLIE, (M)NLG and Controlled Language; applications: Semantic Web, Semantic Grid, Semantic Web Services.]
Key:
- MNLG: Multilingual Natural Language Generation
- OIE: Ontology-aware Information Extraction
- AIE: Adaptive IE
- CLIE: Controlled Language IE
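The return leg of the loop, from formal knowledge back to human language, is natural language generation. A minimal template-based sketch, assuming facts stored as (subject, property, object) triples like those in the XYZ example on slide 10, and hypothetical per-language templates; real (M)NLG systems handle aggregation, agreement and word order far more carefully than string templates can.

```python
# Hypothetical verbalisation templates: one per (language, ontology property).
TEMPLATES = {
    "en": {
        "type": "{s} is a {o}.",
        "HQ": "{s} has its headquarters in {o}.",
        "establOn": "{s} was established on {o}.",
        "partOf": "{s} is part of {o}.",
    },
    "de": {
        "type": "{s} ist vom Typ {o}.",
        "HQ": "{s} hat seinen Hauptsitz in {o}.",
        "establOn": "{s} wurde am {o} gegründet.",
        "partOf": "{s} ist Teil von {o}.",
    },
}

def generate(triples, lang="en"):
    """Render (subject, property, object) triples as sentences in the given language."""
    templates = TEMPLATES[lang]
    sentences = []
    for s, p, o in triples:
        template = templates.get(p)
        if template is None:
            continue  # no template for this property: skip rather than emit garbled text
        sentences.append(template.format(s=s, o=o))
    return " ".join(sentences)

facts = [("XYZ", "HQ", "London"), ("XYZ", "establOn", "03/11/1978")]
print(generate(facts))             # XYZ has its headquarters in London. XYZ was established on 03/11/1978.
print(generate(facts, lang="de"))  # XYZ hat seinen Hauptsitz in London. XYZ wurde am 03/11/1978 gegründet.
```

Because the same triples drive every language, the knowledge base, not any one text, is the presentation-independent resource.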

Slide 8: Information Extraction
Information Extraction (IE) pulls facts and structured information from the content of large text collections.
- Contrast IE and Information Retrieval (IR): IR finds relevant documents, IE pulls the facts out of them
- NLP history: from natural language understanding (NLU) to IE
- Progress driven by quantitative measures:
  - MUC: Message Understanding Conferences
  - ACE: Automatic Content Extraction
- General Architecture for Text Engineering (GATE): http://gate.ac.uk/
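The quantitative measures used in evaluations such as MUC and ACE are built on precision, recall and their harmonic mean (F-measure). A minimal sketch, assuming exact matching of (text, type) pairs; the official scorers are more elaborate (partial matches, slot-level credit), and the gold and predicted annotations below are hypothetical, drawn from the example text on slide 9.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and balanced F-measure over annotation sets."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # true positives
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

gold = {("Tuesday", "Date"), ("Dr. Big Head", "Person"),
        ("We Build Rockets Inc.", "Organization")}
predicted = {("Tuesday", "Date"), ("Dr. Big Head", "Person"),
             ("Big Head", "Organization")}
p, r, f = evaluate(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```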

Slide 9: IE Example
"The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc."
- ST (scenario template): rocket launch event with various participants
- NE (named entities): "rocket", "Tuesday", "Dr. Head", "We Build Rockets"
- CO (coreference): "it" = rocket; "Dr. Head" = "Dr. Big Head"
- TE (template elements): the rocket is "shiny red" and Head's "brainchild"
- TR (template relations): Dr. Head works for We Build Rockets Inc.
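A minimal sketch of the NE and name-coreference steps for this example, assuming hypothetical hand-written regular expressions and a naive "same surname, earliest mention" rule. It is far cruder than a real IE system such as GATE, and it does not attempt pronoun resolution ("it" = rocket) or the TE, TR and ST tasks.

```python
import re

TEXT = ("The shiny red rocket was fired on Tuesday. It is the brainchild of "
        "Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.")

# Hypothetical hand-written patterns, one per entity type.
PATTERNS = [
    ("Date", re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")),
    ("Person", re.compile(r"Dr\. (?:[A-Z][a-z]+ )*[A-Z][a-z]+")),
    ("Organization", re.compile(r"(?:[A-Z][a-z]+ )+Inc\.")),
]

def extract_entities(text):
    """Return (offset, mention, type) tuples sorted by position in the text."""
    entities = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            entities.append((m.start(), m.group(), label))
    return sorted(entities)

def name_coreference(entities):
    """Map each Person mention to the earliest Person mention sharing its surname."""
    chains = {}
    for _, mention, label in entities:
        if label == "Person":
            chains.setdefault(mention.split()[-1], []).append(mention)
    return {m: chain[0] for chain in chains.values() for m in chain}

entities = extract_entities(TEXT)
print([(m, label) for _, m, label in entities])
print(name_coreference(entities))  # {'Dr. Big Head': 'Dr. Big Head', 'Dr. Head': 'Dr. Big Head'}
```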

Slide 10: Ontology-based IE
Example text: "XYZ was established on 03 November 1978 in London. It opened a plant in Bulgaria in …"
[Diagram: Ontology & KB. Classes: Company, Location, City, Country. Properties: type, HQ, establOn, partOf. Instances: XYZ, London, UK, Bulgaria, "03/11/1978".]
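A minimal sketch of ontology-based IE on the first sentence of the example, assuming a small hypothetical knowledge base (London is a City and partOf the UK) and a single hypothetical extraction pattern. The extracted facts are normalised to the KB's date format and only linked to instances the KB already knows, so a query can then infer facts the text never states (the HQ is in the UK).

```python
import re
from datetime import datetime

# Hypothetical existing knowledge base: instance types and partOf links.
KB_TYPES = {"London": "City", "UK": "Country", "Bulgaria": "Country"}
KB_PART_OF = {("London", "UK")}

# Hypothetical extraction pattern for one sentence shape.
ESTABLISHED = re.compile(
    r"(?P<org>[A-Z][A-Za-z]*) was established on "
    r"(?P<date>\d{2} [A-Z][a-z]+ \d{4}) in (?P<city>[A-Z][a-z]+)\."
)

def populate(text):
    """Extract (subject, property, object) triples, checked against the KB."""
    triples = set()
    for m in ESTABLISHED.finditer(text):
        org, city = m.group("org"), m.group("city")
        # normalise "03 November 1978" to the KB's "03/11/1978" format
        date = datetime.strptime(m.group("date"), "%d %B %Y").strftime("%d/%m/%Y")
        triples |= {(org, "type", "Company"), (org, "establOn", date)}
        if KB_TYPES.get(city) == "City":  # only link to a known City instance
            triples.add((org, "HQ", city))
    return triples

def hq_country(triples, org):
    """Infer the country of an organisation's HQ via the KB's partOf links."""
    for s, p, o in triples:
        if s == org and p == "HQ":
            for city, country in KB_PART_OF:
                if city == o:
                    return country
    return None

facts = populate("XYZ was established on 03 November 1978 in London.")
print(sorted(facts))
print(hq_country(facts, "XYZ"))  # UK
```

Handling the second sentence ("It opened a plant in Bulgaria") would additionally need coreference to resolve "It" to XYZ before a plant-location fact could be added.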

Slide 11: A Necessary Trade-Off
Domain specificity vs. task complexity:
[Figure: task complexity (simple to complex; task labels bag-of-words, entities, relations, events) plotted against domain specificity (general to domain specific), with a region labelled "acceptable accuracy".]

Slide 12: Open information, defended communities
Trend 1: seconds out, round 5: file sharing is about to go social.
Trend 2: the living room is about to be computerised. What will happen when all your living-room devices fold into a single PC? Bill Gates hopes you'll be running Windoze, but consumer electronics firms bet on Linux and stable hardware (no viruses, no crashes, cheap, ...).
What if these two trends combine? Ubiquitous online communities centred on shared content, with a model of trust.
What if memory institutions provide means of organising, explaining and interlinking the crossover between modern popular culture and the curated memory?
This is important because DRM is the beginning of the end of civilisation as we know it:
- it controls how you consume media you buy, and has the potential to be linked with censorship and with invasive behaviour logging
- you can't make digital objects behave like physical objects unless you totally control the hardware and the operating system
- if someone has control, then we may end up finding that someone has given the contract for preserving our culture to Halliburton

Slide 13: Memory is not a luxury
C21st: all the C20th mistakes, but bigger and better?
If you don't know where you've been, how can you know where you're going?
Libraries, museums, archives: ammunition in the war on ignorance (more dangerous than "terror"?)
Ammunition is useless if you can't find it: new technology must make our history accessible to all, for all our futures.

Slide 14: Summary
- Cultural memory can benefit from semantic metadata, presentation-independence and repurposing
- Semantic web technology:
  - no: it won't make machines intelligent
  - perhaps: simple specific models can work
- Four ways to cross the AI bridge: DIY models; recommenders; IE; OSN + P2P
This talk: http://gate.ac.uk/talks/ecdl-sept-2004.ppt
More: http://gate.ac.uk/
Related projects: http://gate.ac.uk/

