Helena F. Deus and Jonas S. Almeida


1 A method to propagate permissions in biomedical data using a semantic web framework
Helena F. Deus and Jonas S. Almeida
The University of Texas M. D. Anderson Cancer Center

2 History of the web
Web 1.0: Links -> Documents
Web 2.0: Links -> Data Structures -> Web Services
Web 3.0: Links -> Web Services -> Links -> Web Services -> Links -> Web Services …

3 Evolution of data representation
Nature Biotechnology Vol 23 Nr 29

4 Data management in the life sciences
Diagram labels: Clinical/Medical data, MDAxxxx, Electronic Health Records, RDBMS. Life is good!

5 Heterogeneous data management
Diagram labels: Clinical/Medical data, MDAxxxx, RDBMS. Core facilities data: DNA Sequencing, Microarrays, Protein Arrays, Pulsed Field Gel Electrophoresis. Data everywhere!

6 Semantic web of data: a set of best practices

7 A data pyramid
Levels, top to bottom: Wisdom, Knowledge, Information, Data
Labels shown alongside the pyramid: W3C, OWL, OBO, RDF, SPARQL, XML, TEXT, Files

8 S3DB Core Model

9 Snapshots of interfaces using S3DB's API (Application Programming Interface). These applications illustrate why semantic web designs can be particularly effective at enabling generic tools to help users explore data that document very specific and very complex relationships.
Snapshot A was taken from S3DB's web interface, which is included in the downloadable package. This interface was developed to assist in managing the database model and is therefore centered on the visualization and manipulation of the domain of discourse: its Collections of Items and the Rules defining how their relations are documented.
Snapshots B-D show a document management tool, S3DBdoc, freely available as a Bioinformatics Station module (see Figure 6). Navigation starts from the Project (C), proceeds to the Collection (B), and ends with the editing of the Statements about an Item (D). Snapshot B illustrates an intermediate step in the navigation, where the list of Items (in this case samples assayed by tissue arrays, for which there is clinical information about the donor) is being trimmed according to a property of a distant entity, Age at Diagnosis, which belongs to the Clinical Information Collection associated with the sample that originated the array results. This interaction would have been difficult and computationally intensive to manage using a relational architecture.
The RDF-formatted query result produced by the API was also visualized using a commercial tool, Sentient Knowledge Explorer (IO-Informatics Inc), shown in snapshot E, and using Welkin, shown in snapshot F, developed by the digital inter-operability SIMILE project at the Massachusetts Institute of Technology. See text for discussion of graphic representations by these tools. To protect patient confidentiality, some values in snapshots B and D are scrambled, and numeric sample and patient identifiers elsewhere are altered.
PLoS ONE Aug 13;3(8):e2946
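The trimming step described for snapshot B (selecting samples by a property of the linked donor record) can be sketched over a small set of triples. This is only an illustration in Python: the predicate names `originatedFrom` and `ageAtDiagnosis`, the identifiers, and the ages are all hypothetical and are not taken from S3DB or from the slides.

```python
# Hypothetical in-memory triples (subject, predicate, object).
# Names and values are invented for illustration; this is not S3DB's schema.
triples = [
    ("sample1", "originatedFrom", "patientA"),
    ("sample2", "originatedFrom", "patientB"),
    ("patientA", "ageAtDiagnosis", 47),
    ("patientB", "ageAtDiagnosis", 63),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def samples_by_donor_age(min_age):
    """Return samples whose linked patient has ageAtDiagnosis >= min_age."""
    selected = []
    for s, p, o in triples:
        if p != "originatedFrom":
            continue
        # Follow the link from the sample to the distant entity (the patient).
        ages = objects(o, "ageAtDiagnosis")
        if any(age >= min_age for age in ages):
            selected.append(s)
    return selected


print(samples_by_donor_age(50))
```

The filter follows a link from one entity to another before testing the property, which is the kind of traversal the caption contrasts with a relational design.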

10 Example: TCGA data structure

11 S3DB Rule and S3DB Statement
S3DB Rule: Sample ?? Patient (diagram labels: blood, Tissue, Patient, Sample, tumor)
S3DB Statement: sampleX, patientY, R427
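One way to read this slide is that a Rule relates two Collections and a Statement instantiates that Rule for specific Items. The following Python sketch makes that reading concrete. It rests on assumptions: that R427 identifies the Rule, and that the verb (shown as "??" on the slide) can be stood in for by the placeholder `obtainedFrom`. The class and field names are hypothetical and do not come from the S3DB API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Domain-level relation between two Collections (hypothetical fields)."""
    rule_id: str
    subject_collection: str
    verb: str
    object_collection: str


@dataclass(frozen=True)
class Statement:
    """Instance-level assertion that applies a Rule to a specific Item."""
    rule_id: str
    subject_item: str
    value: str


def describe(statement, rules):
    """Render a Statement together with the Rule it instantiates."""
    rule = rules[statement.rule_id]
    return (
        f"{statement.subject_item} ({rule.subject_collection}) "
        f"{rule.verb} {statement.value} ({rule.object_collection})"
    )


# "obtainedFrom" is a placeholder verb; the slide leaves the verb as "??".
rules = {"R427": Rule("R427", "Sample", "obtainedFrom", "Patient")}
stmt = Statement("R427", "sampleX", "patientY")
print(describe(stmt, rules))
```

Keeping the Rule separate from the Statement mirrors the distinction between the domain of discourse and its instantiation.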

12 TCGA domain - instance PLoS ONE Dec;3(12):e4076

13 SPARQL

14 Code portability and distributed data
API API SPARQL API

15 Permission management
Markov Model

16 Permission propagation

17 Ontology layers and entry levels for computation
Upper ontologies
Intermediate ontologies
Domain-specific ontologies (MGED and others): current entry level for computation
Experimental, evolving ontologies and data models: proposed entry level for computation
Raw data

18

19 S3DB.ORG
What is S3DB?
It is a web service that manages semantic web content, distinguishing the domain of discourse from its instantiation. It was configured specifically for the needs of Biomedical Informatics projects where:
Those who submit the data keep fine-tuned control over its access and use.
The data model is deployed over a core ontology that allows its editing.
It has a distributed deployment designed to deal with heterogeneous environments.
What S3DB is not?
It is not a client application.
It is not a "work in progress": a SPARQL endpoint ensures that experimental data is not kept outside of the Linked Data Web until it matures.

20 In Conclusion
Dissolution of boundaries between data structures is a good thing…
But doing it without losing the role of each data element is even better.
Some level of explicit granularity in the data is necessary to implement a permission model.
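The point about granularity can be illustrated with a minimal Python sketch of permission propagation. It assumes a simple inheritance scheme in which an entity without an explicit permission takes the permission of its nearest ancestor. This scheme, the entity identifiers, the user names, and the permission labels are all hypothetical; the slides do not specify the actual S3DB propagation method.

```python
# Hypothetical containment hierarchy: child entity -> parent entity.
parent = {
    "collection:Sample": "project:TCGA",
    "item:sampleX": "collection:Sample",
    "statement:42": "item:sampleX",
}

# Explicit permissions assigned to (user, entity) pairs; labels are invented.
explicit = {
    ("alice", "project:TCGA"): "edit",
    ("alice", "item:sampleX"): "none",
}


def effective_permission(user, entity):
    """Walk up the hierarchy and return the nearest explicit permission."""
    node = entity
    while node is not None:
        if (user, node) in explicit:
            return explicit[(user, node)]
        node = parent.get(node)  # None once the top of the hierarchy is passed
    return "none"  # assumed default when nothing is assigned anywhere


print(effective_permission("alice", "collection:Sample"))
print(effective_permission("alice", "statement:42"))
```

Because each Item and Statement is an addressable entity, a restriction set on one Item overrides what would otherwise propagate from the Project, which is only possible when the data keep that explicit granularity.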

21 Acknowledgements
http://s3db.org
Jonas S. Almeida, Kadir Akdemir, Miriã Coelho, Cintia Palú, Pablo Freire
The Integrative Bioinformatics Lab at the University of Texas MD Anderson Cancer Center (Houston, TX)
Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa (Lisbon, Portugal)

