DBpedia - A Crystallization Point


1 DBpedia - A Crystallization Point for the Web of Data
Junghee Han

2 Outline
- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia

3 The DBpedia Project
DBpedia is a community effort to:
- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web

4 The DBpedia Project
The DBpedia knowledge base currently describes more than 2.6 million entities, including:
- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies
The knowledge base contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.

5 Linked Data
Reference:

6 Linked Data
[Figure: Web browsers and search engines accessing Linked Data over HTTP]
Reference:

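7 Linked Data
RDF stands for Resource Description Framework:
- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: the attributes, characteristics, and relationships of resources
- Framework: the model, language, and syntax used to describe the above
RDF has a graph model.
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)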
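The graph model can be sketched in a few lines of Python. This is only an illustration: the `http://example.org/` namespace, the property names, and the `City` class are hypothetical and do not come from the slides. Each statement is a (subject, predicate, object) triple, and a set of triples forms a directed, labeled graph.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Each triple is (subject, predicate, object).
triples = {
    (EX + "HongGilDong", EX + "name", "Hong, Gil Dong"),
    (EX + "HongGilDong", EX + "residence", EX + "Seoul"),
    (EX + "Seoul", EX + "type", EX + "City"),
}

def outgoing_edges(triples):
    """Group triples by subject: each subject is a node, each (predicate, object) an edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

graph = outgoing_edges(triples)
for predicate, obj in sorted(graph[EX + "HongGilDong"]):
    print(predicate, "->", obj)
```

Because the object of one triple (`Seoul`) is the subject of another, the statements chain together into a graph rather than staying isolated rows.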

8 Linked Data
Graph model example
- Triple notation
- RDF syntax
- SPARQL (SPARQL Protocol and RDF Query Language): the RDF query language created by the W3C
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)
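To give a feel for how a SPARQL query relates to triples, here is a minimal sketch of matching a single triple pattern in plain Python. It is not a SPARQL engine, and the data (including `KimCheolSu` and `Busan`) is hypothetical. Terms that start with `?` act as variables, as they do in SPARQL.

```python
def match(pattern, triples):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere it appears.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

# Hypothetical data, for illustration only.
triples = [
    ("HongGilDong", "residence", "Seoul"),
    ("HongGilDong", "research", "SemanticWeb"),
    ("KimCheolSu", "residence", "Busan"),
]

# Roughly: SELECT ?person WHERE { ?person :residence :Seoul }
results = list(match(("?person", "residence", "Seoul"), triples))
print(results)
```

A real SPARQL query joins several such patterns on their shared variables; this sketch covers only the single-pattern case.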

9 Linked Data
The four Linked Data principles (Tim Berners-Lee):
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
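Principles 2 and 3 can be illustrated with a plain HTTP request that asks for RDF through the `Accept` header. The sketch below only builds the request and does not send it. The URI `http://dbpedia.org/resource/Seoul` and the `text/turtle` media type are assumptions chosen for illustration; whether a given server honors that media type is not stated in the slides.

```python
from urllib.request import Request

def rdf_lookup_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request that asks the server for RDF."""
    return Request(uri, headers={"Accept": media_type})

# Assumed example URI; any HTTP URI that names a thing would work the same way.
req = rdf_lookup_request("http://dbpedia.org/resource/Seoul")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# urllib.request.urlopen(req) would perform the actual lookup;
# it is left out here because it needs network access.
```

The same URI serves as both the name of the thing and the address where its description can be fetched, which is what makes the data "linked".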

10 Linked Data
1. Use URIs as names for things
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

11 Linked Data
2. Use HTTP URIs so that people can look up those names
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

12 Linked Data
3. When someone looks up a URI, provide useful RDF information
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

13 Linked Data
4. Include RDF statements that link to other URIs so that they can discover related things
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

14 Linked Data
[Figure: example RDF graph centered on HongGilDong. Node labels: Seoul, resource/Seoul, SemanticWeb, resource/Semantic_Web, photos/Semantic_Web, "Hong, Gil Dong", 35. Edge labels: residences, sameAs, nearbyFeatures, researches, age, name, hasPhotoCollection.]
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

15 Linked Data
Linked Data is data that is:
- Identified with URIs
- Linked to other data
- Represented in RDF
- Queried with SPARQL
- Distributed over HTTP
Figure labels: SQL, SPARQL
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data)

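16 Linked Data
[Figure: disconnected vs. connected national public information]
- Disconnected national public information: spatial, travel, traffic, real estate, cultural heritage, bibliographic, land, environmental, XXX, product, and job information
- Connected national public information: the same datasets (spatial, travel, traffic, real estate, cultural heritage, bibliographic, land, environmental, XXX, product, and job information), linked to one another
- Private-sector information: portals and the press, universities, others
- Overseas information: DBpedia, BBC, etc.
Reference: [KSWC2010] "Linked Data That Increases the Value of Data" (데이터의 가치를 높이는 Linked Data), TopQuadrant Korea Inc.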

17 Wikipedia Content
- Domain-specific data
- Images
- Infoboxes
- Title
- Description
- Languages
- Web links
- Categorization

18 The DBpedia Knowledge Extraction Framework (1/2)
Currently 19 extractors:
- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox
Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. That framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.

19 The DBpedia Knowledge Extraction Framework (2/2)
Two workflows:
- Dump-based extraction: The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis. The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination.
- Live extraction: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
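To show what N-Triples output looks like, here is a minimal serializer sketch in Python. It is not the Scala framework's serializer: the `nt_term` and `nt_line` helpers are hypothetical, the namespace URIs are assumptions, and only plain literals without language tags or datatypes are handled.

```python
def nt_term(term):
    """Serialize one term given as ('uri', value) or ('literal', value)."""
    kind, value = term
    if kind == "uri":
        return f"<{value}>"
    # Escape the characters that N-Triples requires to be escaped in literals.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'

def nt_line(subject, predicate, obj):
    """One triple per line, terminated by ' .'"""
    return f"{nt_term(subject)} {nt_term(predicate)} {nt_term(obj)} ."

# Assumed namespace URIs, for illustration only.
line = nt_line(
    ("uri", "http://dbpedia.org/resource/BBC"),
    ("uri", "http://dbpedia.org/property/country"),
    ("uri", "http://dbpedia.org/resource/United_Kingdom"),
)
print(line)
```

The one-triple-per-line layout is what makes the format convenient as a dump destination: output can be appended and split without parsing the whole file.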

20 Infobox Extraction
dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"
dbpedia:BBC p:country dbpedia:United_Kingdom
dbpedia:BBC p:key_people dbpedia:Michael_Lyons, dbpedia:Mark_Thompson
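A rough sketch of how infobox attributes could turn into triples like those on this slide. It is a deliberately simplified stand-in, not the extraction framework's actual algorithm: the `infobox_triples` helper and the sample wikitext are hypothetical, invented to mirror the slide's output, and real infobox markup (nested templates, units, lists) is far messier.

```python
import re

# Matches [[Target]] or [[Target|label]] and captures the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def infobox_triples(subject, wikitext):
    """Turn '| key = value' infobox lines into (subject, p:key, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = (part.strip() for part in line[1:].split("=", 1))
        if not key or not value:
            continue
        predicate = "p:" + key.replace(" ", "_")
        links = LINK.findall(value)
        if links:
            # Wiki links become references to other resources.
            for target in links:
                triples.append((subject, predicate, "dbpedia:" + target.strip().replace(" ", "_")))
        else:
            # Anything else is kept as a plain literal.
            triples.append((subject, predicate, value))
    return triples

# Hypothetical wikitext, written to reproduce the triples on the slide.
sample = """{{Infobox Network
| network_name = British Broadcasting Corporation (BBC)
| country = [[United Kingdom]]
| key_people = [[Michael Lyons]], [[Mark Thompson]]
}}"""

for triple in infobox_triples("dbpedia:BBC", sample):
    print(triple)
```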

21 The DBpedia Knowledge Base
Identifying Entities
Resources are assigned a URI according to the pattern [...] (where Name is taken from the URL of the source Wikipedia article, which has the form [...]).
Classifying Entities
DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:
- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology
Describing Entities
Every DBpedia entity is described by a set of general properties.
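The naming rule can be sketched as a small function. The transcript does not preserve the actual URI patterns, so both the `/wiki/` article path prefix and the `http://dbpedia.org/resource/` namespace below are assumptions, and `dbpedia_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit

WIKIPEDIA_PREFIX = "/wiki/"                        # assumed article path prefix
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"  # assumed resource namespace

def dbpedia_uri(wikipedia_url):
    """Derive a resource URI by reusing the Name part of a Wikipedia article URL."""
    path = urlsplit(wikipedia_url).path
    if not path.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an article URL: {wikipedia_url}")
    name = path[len(WIKIPEDIA_PREFIX):]
    if not name:
        raise ValueError(f"missing article name: {wikipedia_url}")
    return DBPEDIA_RESOURCE + name

uri = dbpedia_uri("http://en.wikipedia.org/wiki/Semantic_Web")
print(uri)
```

Reusing the article name keeps the identifiers predictable: anyone who knows the Wikipedia page can guess the corresponding resource URI.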

22 Accessing the DBpedia Knowledge Base over the Web
- Linked Data: DBpedia resource identifiers (ex: [...])
- SPARQL Endpoint
- RDF Dumps
- Lookup Index
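As an illustration of the SPARQL endpoint route, the sketch below builds a query URL using the standard SPARQL protocol convention of passing the query text in a `query` parameter. The endpoint address `http://dbpedia.org/sparql` and the resource URI are assumptions not given in the slides, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint URL

# Ask for up to ten property/value pairs of one (assumed) resource.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/BBC> ?property ?value
} LIMIT 10
""".strip()

def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as a GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Fetching `url` over HTTP would return the result set;
# it is left out here because it needs network access.
```

Of the four access routes, the endpoint suits ad hoc questions, while the RDF dumps are the better fit when the whole dataset is needed locally.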

23 Interlinked Web Content
DBpedia currently contains 4.9 million outgoing RDF links.

24 Applications facilitated by DBpedia (1/3)
Browsing and Exploration: DBpedia Mobile

25 Applications facilitated by DBpedia (2/3)
Querying and Search: DBpedia Query Builder

26 Applications facilitated by DBpedia (3/3)
Querying and Search: Relationship Finder

27 Conclusions and Future Work
The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.
Future Work
- Cross-language infobox knowledge fusion: derive an astonishingly detailed multi-domain knowledge base
- Wikipedia article augmentation: develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from these sources
- Wikipedia consistency checking: improve the overall quality of Wikipedia

