DBpedia - A Crystallization Point for the Web of Data 2011.10.05 Junghee - Han
Outline
- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia
The DBpedia Project
DBpedia is a community effort to:
- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web
The DBpedia Project: DBpedia knowledge base
The knowledge base currently describes more than 2.6 million entities, including:
- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies
It contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.
Linked Data Reference:
Linked Data [Figure: Web browsers and search engines access Linked Data over HTTP] Reference:
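Linked Data
RDF stands for Resource Description Framework:
- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: a statement of the properties, characteristics, and relationships of resources
- Framework: the model, language, and syntax used to express the above
RDF has a graph model.
Reference: [KSWC2010] Linked Data That Increases the Value of Data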
Linked Data
- Graph model example
- Triple-form representation
- RDF syntax
- SPARQL (SPARQL Protocol and RDF Query Language): the RDF query language created by the W3C
Reference: [KSWC2010] Linked Data That Increases the Value of Data
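The triple form and the idea of querying a graph by pattern can be sketched in a few lines of Python. This is a toy illustration only, not RDF tooling or SPARQL itself: the `ex:` names, the `TRIPLES` list, and the `match` helper are all hypothetical, and `None` plays the role that a variable plays in a SPARQL triple pattern.

```python
# Hypothetical toy graph: each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("ex:HongGilDong", "ex:name", "Hong, Gil Dong"),
    ("ex:HongGilDong", "ex:residence", "ex:Seoul"),
    ("ex:Seoul", "owl:sameAs", "http://dbpedia.org/resource/Seoul"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position.

    None acts as a wildcard, loosely like a variable in a triple pattern.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All statements about one subject, then all owl:sameAs links in the graph.
print(match(TRIPLES, s="ex:HongGilDong"))
print(match(TRIPLES, p="owl:sameAs"))
```

A real SPARQL engine also joins several patterns on shared variables; the sketch covers only a single pattern.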
2017-04-26 Linked Data
1. Use URIs (Uniform Resource Identifiers) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
(Tim Berners-Lee, 2007, http://www.w3.org/DesignIssues/LinkedData.html)
Linked Data
1. Use URIs as names for things
http://bibleontology.com/page/Bilhah
Reference: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
2. Use HTTP URIs so that people can look up those names
http://bibleontology.com/page/Bilhah
Reference: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
3. When someone looks up a URI, provide useful RDF information
http://bibleontology.com/page/Bilhah
Reference: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
4. Include RDF statements that link to other URIs so that they can discover related things
http://bibleontology.com/page/Bilhah
Reference: [KSWC2010] Linked Data That Increases the Value of Data
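Principles 2 and 3 amount to an ordinary HTTP lookup in which the client says it wants RDF rather than HTML. The sketch below only builds such a request with Python's standard library and does not send it. The helper name `rdf_request` is hypothetical, and whether a given server answers the `Accept` header with RDF is an assumption to verify per site.

```python
from urllib.request import Request


def rdf_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for RDF.

    The Accept header is how a client signals the representation it
    prefers; the server is free to ignore it.
    """
    return Request(uri, headers={"Accept": accept}, method="GET")


req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the lookup; that step is left out here so the example runs without network access.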
Linked Data [Figure: example RDF graph around the resource HongGilDong ([name] "Hong, Gil Dong", [age] 35), connected through the properties [residences], [researches], [sameAs], [nearbyFeatures], and [hasPhotoCollection] to Seoul, SemanticWeb, http://dbpedia.org/resource/Seoul, http://sws.geonames.org/1835848/, http://sws.geonames.org/1835848/nearby.rdf, http://dbpedia.org/resource/Semantic_Web, and http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web] Reference: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data: identify with URIs, link, represent with RDF, query with SPARQL, and distribute over HTTP. (SQL / SPARQL) Reference: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data [Figure: disconnected vs. connected national public information. Disconnected: separate silos for spatial, travel, traffic, real estate, cultural heritage, bibliographic, land, environmental, product, job, and other (XXX) information. Connected: the same national public information linked with private information (portals and media, universities, others) and overseas information (DBpedia, BBC, etc.).] Reference: [KSWC2010] Linked Data That Increases the Value of Data (TopQuadrant Korea Inc.)
Wikipedia Content: title, description, languages, web links, categorization, images, infoboxes, domain-specific data
The DBpedia Knowledge Extraction Framework (1/2)
Currently 19 extractors:
- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox
Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. This framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.
The DBpedia Knowledge Extraction Framework (2/2): Two Workflows
- Dump-based extraction: The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis. The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination.
- Live extraction: via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
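To make the output side of the dump-based workflow concrete, here is a minimal sketch of what writing one statement in N-Triples form could look like. It is not the framework's own serializer (which is written in Scala): the helpers `nt_term` and `nt_line` are hypothetical, and language tags, datatypes, and blank nodes are deliberately left out.

```python
def nt_term(term):
    """Render one term given as ("uri", value) or ("literal", value)."""
    kind, value = term
    if kind == "uri":
        return f"<{value}>"
    # Escape the characters that may not appear raw inside a quoted literal.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def nt_line(s, p, o):
    """Serialize one triple as a single N-Triples line ending in ' .'."""
    return f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ."


print(nt_line(
    ("uri", "http://dbpedia.org/resource/Berlin"),
    ("uri", "http://www.w3.org/2000/01/rdf-schema#label"),
    ("literal", "Berlin"),
))
```

One statement per line is what makes the format convenient for very large dumps: files can be split, concatenated, and streamed without a parser that understands nesting.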
Infobox Extraction
dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"
dbpedia:BBC p:country dbpedia:United_Kingdom
dbpedia:BBC p:key_people dbpedia:Michael_Lyons
dbpedia:BBC p:key_people dbpedia:Mark_Thompson
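The mapping from infobox attributes to triples shown above can be sketched roughly as follows. This is a simplified illustration, not the DBpedia extractor: the `sample` wikitext is invented for the example, the helper `infobox_triples` is hypothetical, the expansion of `p:` to http://dbpedia.org/property/ is an assumption, and real infoboxes need far more careful parsing (nested templates, units, lists, multi-line values).

```python
import re

RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"  # assumed expansion of the p: prefix

# Invented wikitext whose values mirror the BBC example on the slide.
sample = """{{Infobox broadcasting network
| network_name = British Broadcasting Corporation (BBC)
| country = [[United Kingdom]]
| key_people = [[Michael Lyons]], [[Mark Thompson]]
}}"""


def infobox_triples(subject, wikitext):
    """Turn '| key = value' infobox lines into (subject, property, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        m = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.+?)\s*$", line)
        if not m:
            continue  # not an attribute line (template header, closing braces, ...)
        key, value = m.group(1), m.group(2)
        prop = PROPERTY + key.replace(" ", "_")
        # [[Target]] or [[Target|label]] wiki links become resource URIs.
        links = re.findall(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if links:
            for target in links:
                triples.append((subject, prop, RESOURCE + target.strip().replace(" ", "_")))
        else:
            triples.append((subject, prop, value))  # plain value kept as a literal
    return triples


for triple in infobox_triples(RESOURCE + "BBC", sample):
    print(triple)
```

Each wiki link yields its own triple, which is why an attribute such as key_people with two linked names produces two statements.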
The DBpedia Knowledge Base
Identifying Entities: Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name.
Classifying Entities: DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:
- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology
Describing Entities: Every DBpedia entity is described by a set of general properties.
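The naming pattern for entities is a plain prefix swap, which a short sketch makes explicit. The function name `dbpedia_uri` and the two constants are hypothetical; only the two URL patterns come from the slide.

```python
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    if not wikipedia_url.startswith(WIKI_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    name = wikipedia_url[len(WIKI_PREFIX):]  # the Name part is reused unchanged
    return DBPEDIA_PREFIX + name


print(dbpedia_uri("http://en.wikipedia.org/wiki/Berlin"))
```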
Accessing the DBpedia Knowledge Base over the Web
- Linked Data: DBpedia resource identifiers (e.g., http://dbpedia.org/resource/Berlin)
- SPARQL Endpoint: http://dbpedia.org/sparql
- RDF Dumps: http://wiki.dbpedia.org/Downloads32
- Lookup Index: http://lookup.dbpedia.org/api/search.asmx
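As an illustration of the SPARQL access path, the sketch below builds the GET URL for a small query against the endpoint listed on this slide. It only constructs the URL and sends nothing. The query text and the helper `sparql_get_url` are illustrative assumptions; the `query` parameter name follows the SPARQL protocol, and result formats and endpoint availability are not covered.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: labels attached to the Berlin resource.
QUERY = """SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5"""


def sparql_get_url(endpoint, query):
    """Return the URL for a SPARQL query sent as an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```

Opening the printed URL in a browser or with an HTTP client would run the query; an `Accept` header can be used to ask for a specific result format.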
Interlinked Web Content: the knowledge base currently contains 4.9 million outgoing RDF links.
Applications facilitated by DBpedia (1/3) Browsing and Exploration: DBpedia Mobile
Applications facilitated by DBpedia (2/3) Querying and Search: DBpedia Query Builder (http://querybuilder.dbpedia.org)
Applications facilitated by DBpedia (3/3) Querying and Search: Relationship Finder
Conclusions and Future Work
The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.
Future Work
- Cross-language infobox knowledge fusion: derive an astonishingly detailed multi-domain knowledge base
- Wikipedia article augmentation: develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from the data sources interlinked with DBpedia
- Wikipedia consistency checking: improve the overall quality of Wikipedia