1
Using linked data to interpret tables
Varish Mulwad, Tim Finin, Zareen Syed and Anupam Joshi
University of Maryland, Baltimore County
November 8, 2010
2
Interpreting a table
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
Annotations on the table:
– Class for a column: http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
– Entity for a cell: http://dbpedia.org/resource/Allen_Iverson
– Relation between columns: dbprop:team
– Map numbers as values of properties
3
Interpreting a table
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
@prefix dbpedia:.
@prefix dbpedia-owl:.
@prefix yago:.
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer.
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams.
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan.
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer.
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls.
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams.
4
Use Cases
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
– Intelligent querying over data
– Create a ‘Semantic’ knowledge-base
5
Use Cases
Name            Team          Position        Height
Michael Jordan  Chicago       Shooting guard  1.98
Allen Iverson   Philadelphia  Point guard     1.83
Yao Ming        Houston       Center          2.29
Tim Duncan      San Antonio   Power forward   2.11
@prefix dbpedia:.
@prefix dbpedia-owl:.
@prefix yago:.
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer.
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams.
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan.
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer.
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls.
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams.
– Data integration
– Search / query over tables
– Confirm / verify existing knowledge
– Add new knowledge to the LOD cloud
– Convert legacy data into Semantic Web formats
6
Motivation and Related Work
8
We are laying a strong foundation for the Semantic Web … … but an old problem haunts us …
9
Where is structured data?
Chicken? Egg? … No chicken?
– ~14.1 billion tables, 154 million with high-quality relational data (Cafarella et al. 2008)
– 305,632 datasets available as CSV or spreadsheets on Data.gov (US)
– 7 other nations establishing open data
10
Automate the process
– We need systems that can generate data from existing sources
– Not practical for humans to encode all this into RDF manually
11
Related Work
– Database to ontology mapping: (Barrasa, Corcho, & Gómez-Pérez 2004), (Hu & Qu 2007), (Papapanagiotou et al. 2006), and (Lawrence 2004)
– Mapping relational databases to RDF [W3C working group – RDB2RDF]
12
Related Work
– Mapping spreadsheets to RDF [RDF123, XLWrap]
– Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data
– Interpreting web tables to answer complex search queries over the web tables (Limaye et al. 2010)
13
T2LD Framework
1. Predict Class for Columns
2. Linking the table cells
3. Identify and Discover relations
14
Predict Class for Columns
Steps: Predict Class for Columns (current) → Linking the table cells → Identify and Discover relations
15
Predicting Class Labels for column
[Figure: the Team column (Chicago, Philadelphia, Houston, San Antonio), with the labels "Instance", "Class" (Class 1 to Class 4), and "Class for the column"]
16
Knowledge Base
– Yago
– Wikitology [1]: a hybrid knowledge base where structured data meets unstructured data
[1] Wikitology was created as part of Zareen Syed’s Ph.D. dissertation
17
Querying the Knowledge-Base
Team column: Chicago, Philadelphia, Houston, San Antonio
Top results and their types per cell string:
– Chicago: 1. Chicago Bulls 2. Chicago 3. Judy Chicago
  Types: {dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams}
– Philadelphia: 1. Philadelphia 2. Philadelphia 76ers 3. Philadelphia (film)
  Types: {dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, yago:NationalBasketballAssociationTeams, …}
– Houston: 1. Houston Rockets 2. Houston 3. Allan Houston
  Types: {…}
18
Scoring the classes
Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film, …
Candidate [string, class] pairs: [Chicago, dbpedia-owl:City], [Philadelphia, dbpedia-owl:City], [Houston, dbpedia-owl:City], …, [Chicago, dbpedia-owl:Film], [Philadelphia, dbpedia-owl:Film], …
E.g. processing the pair “Chicago, yago:NationalBasketballAssociationTeams”
Results for the string Chicago (R = rank, PR = PageRank):
– (R = 1) Chicago Bulls {yago:NationalBasketballAssociationTeams} [PR = 6]
– (R = 2) Chicago {dbpedia-owl:PopulatedPlace, dbpedia-owl:City} [PR = 5]
– (R = 3) Judy Chicago {yago:WomenArtist, yago:LivingPeople} [PR = 4]
Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Chicago, yago:NationalBasketballAssociationTeams] = (0.25 × 1 / 1) + (0.75 × 6 / 7) ≈ 0.893
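The scoring formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names, the normalization constant of 7 (taken from the 6 / 7 in the worked example), and the choice to sum the best per-string score of each class over the column are all assumptions; the slide only gives the per-pair formula.

```python
from collections import defaultdict


def pair_score(rank, page_rank, max_page_rank=7, w=0.25):
    """Score = w * (1 / R) + (1 - w) * normalized PageRank."""
    return w * (1.0 / rank) + (1.0 - w) * (page_rank / max_page_rank)


def score_column_classes(results_per_string, max_page_rank=7, w=0.25):
    """Aggregate pair scores into one score per class for the whole column.

    results_per_string maps a cell string to its ranked results, each a
    (entity, page_rank, classes) tuple. Summing over strings is an
    assumption made for this sketch.
    """
    totals = defaultdict(float)
    for results in results_per_string.values():
        best = {}  # best score each class reaches for this cell string
        for rank, (_entity, page_rank, classes) in enumerate(results, start=1):
            s = pair_score(rank, page_rank, max_page_rank, w)
            for cls in classes:
                best[cls] = max(best.get(cls, 0.0), s)
        for cls, s in best.items():
            totals[cls] += s
    return dict(totals)


# The "Chicago" results from the slide.
chicago_results = [
    ("Chicago Bulls", 6, ["yago:NationalBasketballAssociationTeams"]),
    ("Chicago", 5, ["dbpedia-owl:PopulatedPlace", "dbpedia-owl:City"]),
    ("Judy Chicago", 4, ["yago:WomenArtist", "yago:LivingPeople"]),
]
column_scores = score_column_classes({"Chicago": chicago_results})
print(round(column_scores["yago:NationalBasketballAssociationTeams"], 4))
```

With only the Chicago cell supplied, the NBA-teams class gets the single pair score from the worked example; adding the other cells of the column would accumulate further evidence per class.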
19
T2LD Framework
Steps: Predict Class for Columns → Linking the table cells (current) → Identify and Discover relations
20
Machine Learning based Approach
1. Input: table cell + column header + row data + column type
2. Requery KB with predicted class labels as additional evidence
3. Generate a feature vector for the top N results of the query
4. Classifier ranks the entities within the set of possible results
5. Select the highest ranked entity
6. A second classifier decides whether to link or not
   – Link to the top ranked instance
   – Link to “NIL”
21
Learning to Rank
We trained an SVM rank classifier which learned to rank entities within a given set
Feature vector:
– Similarity measures: Levenshtein distance, Dice score
– Popularity measures: Wikitology score, PageRank, page length
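The two similarity measures can be illustrated with a short, dependency-free sketch. The slides do not say which units the Dice score is computed over; character bigrams of the lower-cased strings are assumed here, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def bigrams(s):
    """Set of character bigrams of the lower-cased string (an assumption)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_score(a, b):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over character bigrams."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2.0 * len(x & y) / (len(x) + len(y))


# Similarity between a table cell and a candidate entity label.
distance = levenshtein("Chicago", "Chicago Bulls")
overlap = dice_score("Chicago", "Chicago Bulls")
print(distance, round(overlap, 3))
```

In a ranking setup, values like these would sit alongside the popularity measures in each candidate's feature vector.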
22
“To Link or not to Link …”
A second SVM classifier
Feature vector included the feature vector of the top ranked entity and two additional features:
– The SVM rank score of the top ranked entity
– The difference in scores between the top two ranked entities
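As a rough sketch of how the input to this second classifier might be assembled: the function name and the data layout are hypothetical, and the handling of a single-candidate list (using the top score itself as the gap) is an assumption the slides do not cover.

```python
def link_decision_features(ranked_candidates):
    """Build the feature vector for the link / NIL decision.

    ranked_candidates is a list of (feature_vector, rank_score) pairs,
    sorted best first by the ranking classifier.
    """
    if not ranked_candidates:
        raise ValueError("need at least one ranked candidate")
    top_features, top_score = ranked_candidates[0]
    if len(ranked_candidates) > 1:
        gap = top_score - ranked_candidates[1][1]
    else:
        gap = top_score  # assumption: no runner-up to compare against
    # Top entity's own features, then the two additional features.
    return list(top_features) + [top_score, gap]


# Illustrative values only: (similarity / popularity features, rank score).
candidates = [
    ([6.0, 0.67], 2.5),
    ([0.0, 1.00], 1.0),
]
features = link_decision_features(candidates)
print(features)
```

A large gap between the top two scores suggests the ranker is confident, which is the kind of signal the link-or-NIL classifier can use.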
23
T2LD Framework
Steps: Predict Class for Columns → Linking the table cells → Identify and Discover relations (current)
24
Identify Relations
[Figure: the Name column (Michael Jordan, Allen Iverson, Yao Ming, Tim Duncan) paired row by row with the Team column (Chicago, Philadelphia, Houston, San Antonio); relations found per row: Rel ‘A’; Rel ‘A’, ‘C’; Rel ‘A’, ‘B’, ‘C’; Rel ‘A’, ‘B’]
25
Relation between columns
Entity pairs: Michael Jordan - Chicago; Allen Iverson - Philadelphia; Yao Ming - Houston
Candidate relations: dbprop:team, dbprop:draftTeam
26
Scoring the relations
Entity pairs: Michael Jordan - Chicago; Allen Iverson - Philadelphia; Yao Ming - Houston
Candidates: dbprop:team, dbprop:draftTeam
Scores: dbprop:team: 3; dbprop:draftTeam: 1
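The scores on this slide are consistent with a simple vote count: each row pair contributes one vote to every relation that holds between its two entities. The sketch below assumes that reading. The function name is hypothetical, and the assignment of dbprop:draftTeam to one particular row is an assumption chosen only to reproduce the totals shown.

```python
from collections import Counter


def score_relations(relations_per_row):
    """Count, for each candidate relation, the rows in which it holds."""
    votes = Counter()
    for relations in relations_per_row:
        votes.update(set(relations))  # one vote per relation per row
    return votes


# Relations found in the knowledge base for each (Name, Team) pair.
rows = [
    {"dbprop:team"},                      # Michael Jordan - Chicago
    {"dbprop:team", "dbprop:draftTeam"},  # Allen Iverson - Philadelphia (assumed)
    {"dbprop:team"},                      # Yao Ming - Houston
]
scores = score_relations(rows)
best_relation, best_score = scores.most_common(1)[0]
print(best_relation, best_score)
```

The highest scoring relation is then taken as the relation between the two columns.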
27
T2LD Framework
1. Predict Class for Columns
2. Linking the table cells
3. Identify and Discover relations
28
Annotating web tables for the Semantic Web
29
Table as linked RDF
@prefix rdfs:.
@prefix dbpedia:.
@prefix dbpedia-owl:.
@prefix yago:.
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer.
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams.
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan.
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer.
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls.
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams.
How to read the statements:
– "Team"@en is rdfs:label of dbpedia-owl:Team.
  “Team” is the common / human name for the class dbpedia-owl:Team
– dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams.
  dbpedia:Chicago_Bulls is of type (is an instance of) yago:NationalBasketballAssociationTeams
30
Results
31
Dataset summary
Number of tables          15
Total number of rows      199
Total number of columns   56 (52)
Total number of entities  639 (611)
* The number in brackets is the count excluding columns that contained numbers
32
Dataset summary
34
Evaluation for class label predictions
35
Evaluation # 1 (MAP)
– Compared the system’s ranked list of labels against a human ranked list of labels
– Metric: Mean Average Precision (MAP)
– Commonly used in the Information Retrieval domain to compare two ranked sets
36
Evaluation # 1 (MAP)
MAP: 80.76 %
Example for one column:
– System ranked: 1. Person 2. Politician 3. President
– Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
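For readers unfamiliar with the metric, here is a sketch of the textbook definition of average precision applied to the example ranking on this slide. It treats the evaluator's labels as the relevant set and divides by the number of relevant labels; whether the authors used exactly this variant is not stated, so the numbers are illustrative only.

```python
def average_precision(system_ranked, relevant):
    """Average of precision@k over the ranks k where a relevant label appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(ranking_pairs):
    """Mean of average precision over (system_ranked, relevant) pairs."""
    if not ranking_pairs:
        return 0.0
    return sum(average_precision(s, r) for s, r in ranking_pairs) / len(ranking_pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 4))
```

MAP is then the mean of this value over all evaluated columns.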
37
Evaluation # 2 (Recall)
Recall > 0.6 (75 %)
Example for one column:
– System ranked: 1. Person 2. Politician 3. President
– Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
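A small sketch of recall on the example ranking, taking the evaluator's labels as the relevant set (the function name is hypothetical). One plausible reading of the figure on this slide is that recall exceeded 0.6 for 75 % of the columns, but the slide does not spell this out, so treat that reading as an assumption.

```python
def recall(system_ranked, evaluator_ranked):
    """Fraction of the evaluator's labels that the system also returned."""
    relevant = set(evaluator_ranked)
    if not relevant:
        return 0.0
    return len(relevant & set(system_ranked)) / len(relevant)


system_labels = ["Person", "Politician", "President"]
evaluator_labels = ["President", "Politician", "OfficeHolder"]
example_recall = recall(system_labels, evaluator_labels)
print(round(example_recall, 3))
```

Here the system recovers two of the three evaluator labels, so this column would count as having recall above 0.6.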
38
Evaluation # 3 (Correctness)
– Evaluated whether our predicted class labels were “fair and correct”
– A class label may not be the most accurate one, but may still be correct
  – E.g. dbpedia-owl:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
– Three human judges evaluated our predicted class labels
39
Evaluation # 3 (Correctness)
A category-wise breakdown for class label correctness
Overall accuracy: 76.92 %
Example predictions:
– Column: Nationality; Prediction: MilitaryConflict
– Column: Birth Place; Prediction: PopulatedPlace
40
Evaluation for linking table cells to entities
41
Category-wise accuracy for linking table cells
Overall accuracy: 66.12 %
42
Relation between columns
– Idea: ask human evaluators to identify relations between columns in a given table
– Pilot experiment: asked three evaluators to annotate five random tables from our dataset
– Evaluators identified 20 relations
– Our accuracy: 5 out of 20 (25 %) were correct
43
Conclusion and Future Work
44
Conclusion
– We have demonstrated that it is possible to develop an automated framework for converting tables & spreadsheets to linked data
Future work:
– Extending and adapting this framework for open government data
– Discovery of new relations between entities
45
References
Cafarella, M. J., Halevy, A., Wang, D. Z., Wu, E., and Zhang, Y. 2008. WebTables: exploring the power of tables on the web. Proc. VLDB Endow. 1(1), 538–549.
Barrasa, J., Corcho, O., and Gómez-Pérez, A. 2004. R2O, an extensible and semantically based database-to-ontology mapping language. In Proceedings of the 2nd Workshop on Semantic Web and Databases (SWDB2004). Vol. 3372. pp. 1069–1070.
Hu, W., and Qu, Y. 2007. Discovering simple mappings between relational database schemas and ontologies. In Aberer, K.; Choi, K.-S.; Noy, N. F.; Allemang, D.; Lee, K.-I.; Nixon, L. J. B.; Golbeck, J.; Mika, P.; Maynard, D.; Mizoguchi, R.; Schreiber, G.; and Cudre-Mauroux, P., eds., ISWC/ASWC, volume 4825 of Lecture Notes in Computer Science, 225–238. Springer.
Papapanagiotou, P.; Katsiouli, P.; Tsetsos, V.; Anagnostopoulos, C.; and Hadjiefthymiades, S. 2006. Ronto: Relational to ontology schema matching. In AIS SIGSEMIS Bulletin.
46
References
Lawrence, E. D. R. 2004. Composing mappings between schemas using a reference ontology. In Proceedings of International Conference on Ontologies, Databases and Application of Semantics (ODBASE), 783–800. Springer.
Han, L.; Finin, T.; Parr, C.; Sachs, J.; and Joshi, A. 2008. RDF123: from Spreadsheets to RDF. In Seventh International Semantic Web Conference. Springer.
Han, L., Finin, T., and Yesha, Y. 2009. Finding semantic web ontology terms from words. In Proceedings of the Eighth International Semantic Web Conference. Springer.
Limaye, G., Sarawagi, S., and Chakrabarti, S. 2010. Annotating and searching web tables using entities, types and relationships. In Proc. of the 36th Int'l Conference on Very Large Databases (VLDB).
47
This work was supported by: