Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Evaluating Health-Care Disparity Employing Linked Data and Data -driven Discovery Amrapali Zaveri AKSW, Institut für Informatik 1.

Similar presentations


Presentation on theme: "1 Evaluating Health-Care Disparity Employing Linked Data and Data -driven Discovery Amrapali Zaveri AKSW, Institut für Informatik 1."— Presentation transcript:

1 1 Evaluating Health-Care Disparity Employing Linked Data and Data -driven Discovery Amrapali Zaveri AKSW, Institut für Informatik 1

2 2 Outline Motivation Methodology o Datasets o CSV to RDF Conversion o Interlinking using SILK o Validation by Linked Data Querying Conclusions Limitations Future Work

3 3 Motivation According to the World Health Organization (WHO), more than one billion people (i.e. one sixth of the world’s population) suffer from one or more neglected tropical diseases. This shows a significant imbalance between the research intensity invested for the investigation of certain diseases and their prevalence. Reason current absence of accurate, interlinked data and information

4 4 Methodology

5 5 Datasets DATASET LINKED DATA VERSION NUMBER OF TRIPLES ClinicalTrials.govLinkedCT9.8 million PubMedBio2RDF’s PubMed797 million WHO’s Global Health Observatory (GHO) Not yet available-

6 6 CSV to RDF Conversion WHO’s GHO dataset Published as Excel sheets Advantage Readable by humans Disadvantages Cannot be queried efficiently Difficult to integrate with other data (in different formats) Our approach Converting data into a single data model - RDF Using SCOVO (Statistical Core Vocabulary)* designed particularly to represent multidimensional statistical data using RDF. *Michael Hausenblas,et.al. Scovo: Using statistics on the web of data. In ESWC, 2009. 6

7 7 What is SCOVO?

8 8 Semi-automated approach Transforming CSV to RDF in a fully automated way is not feasible. Dimensions may often be encoded in heading or label of a sheet Our semi-automatic approach: As a plug-in in OntoWiki # a semantic collaboration platform developed by the AKSW research group. A CSV file is converted into RDF using SCOVO # Sören Auer et.al.: OntoWiki: A Tool for Social Semantic Collaboration In: Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge CKC 2007 at the 16th International WWW2007 Banff, Canada, 2007

9 9 SCOVOfied GHO Data prefix ex: prefix scv: ex:Country rdfs:subClassOf scv:Dimension; rdf:type rdfs:Class; dc:title "Country". ex:Disease rdfs:subClassOf scv:Dimension; rdf:type rdfs:Class; dc:title "Disease". ex:CountryCode rdfs:subClassOf scv:Dimension; rdf:type rdfs:Class; dc:title "CountryCode". ex: Afghanistan rdf:type ex:Country; dc:title "Afghanistan". ex:Tuberculosis rdf:type ex:Disease; dc:title "Tuberculosis". ex:3010 rdf:type ex:CountryCode; dc:title “3010”. ex:c1-r6 rdf:type scv:Item; rdf:value 127; scv:dimension ex:Afghanistan; scv:dimension ex:Tuberculosis. scv:dimension ex:3010 Result: 3 million triples

10 10 Interlinking Datasets using SILK

11 11 Interlinking Results Number of interlinks obtained between datasets Interlinks for: Publications - already present Disease - used SILK $ Country - used SILK $ $ Julius Volz, Christian Bizer, Martin Gaedke, Georgi Kobilarov: Discovering and Maintaining Links on the Web of Data. International Semantic Web Conference (ISWC2009), Westfields, USA, October 2009. http://www4.wiwiss.fu-berlin.de/bizer/silk/spec/

12 12 Validation by Linked Data Querying PREFIX who: PREFIX ct: PREFIX pubmed: SELECT DISTINCT ?disease ?incidence ?country WHERE { ?x who:country "India". ?x who:incidence ?incidence. ?x who:disease ?disease. FILTER(?incidence>70) } SELECT DISTINCT ?disease ?country ?noOfTrials WHERE { ?diseasewho:disease"Tuberculosis". ?yct:disease?disease. ?yct:noOfTrials ?noOfTrials. ?yct:country?country. } SELECT ?country COUNT(?reference) WHERE { ?disease who:disease "Tuberculosis". ?z ct:disease ?disease. ?z ct:country ?country. ?z pubmed:reference ?reference. }GROUP BY ?country

13 13 Conclusions Which disease has the highest percentage of health-care disparity with respect to the burden of disease and the clinical trials conducted in a particular country? As a research policy maker, which research area would it be most beneficial to allocate funds? Who are the key people doing most research for a particular disease? What has been the trend, overtime, for the health-care disparity for a particular region?

14 14 Limitations Information Quality Coverage Interlinking Quality Propagation of Errors

15 15 Future Work Improve Interlinking Interlinking with other relevant datasets Updating knowledge-base as new data is published Creating a user interface

16 16 Acknowledgements Research group Agile Knowledge Engineering & Semantic Web (AKSW): http://aksw.org Research on Research Group: http://researchonresearch.duhs.duke.edu/site


Download ppt "1 Evaluating Health-Care Disparity Employing Linked Data and Data -driven Discovery Amrapali Zaveri AKSW, Institut für Informatik 1."

Similar presentations


Ads by Google