KAROLINSKA INSTITUTET International Biobank and Cohort Studies: Developing a Harmonious Approach, February 7-8, 2005, Atlanta, GA. Standards: The P3G Knowledge Database

International Biobank and Cohort Studies: Developing a Harmonious Approach, February 7-8, 2005, Atlanta, GA
Standards: The P3G Knowledge Database
Jan-Eric Litton, Karolinska Institutet, Stockholm, Sweden

Sharing data
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE       BINDS PEP (BY SIMILARITY).
FT   CONFLICT       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
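
The entry above uses the SWISS-PROT flat-file layout. In that layout, a two-letter code at the start of each line states what the line holds (ID, DE, OS, OC, KW, SQ and so on), and this shared layout is what lets one group read another group's records. The sketch below reads such an entry. It assumes only the columns shown here: the code sits in columns 1-2, the data starts at column 6, residue lines carry a blank code, and `//` ends the entry. It is not a complete parser; for real files, Biopython's `Bio.SwissProt` module is the usual choice. The excerpt is shortened from the slide.

```python
from collections import defaultdict

# Shortened excerpt of the entry on the slide (only one sequence line kept).
ENTRY = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
//
"""


def parse_entry(text: str) -> dict:
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    sequence = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry marker
        code, value = line[:2], line[5:].strip()
        if code == "SQ":
            in_sequence = True  # residue lines follow, with a blank code
            fields[code].append(value)
        elif in_sequence and code == "  ":
            sequence.append(value.replace(" ", ""))  # drop the 10-residue spacing
        else:
            fields[code].append(value)  # continuation lines share a code
    record = {code: " ".join(values) for code, values in fields.items()}
    record["sequence"] = "".join(sequence)
    return record


record = parse_entry(ENTRY)
```

Joining continuation lines with a space is what turns the two OC lines back into one taxonomy string.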

A historical essay: The Machine Screw
– Principle discovered around 400 BC
– Limited use until machine tools made mass production possible (18th century)
– Every machine shop and foundry made its own unique sizes and thread dimensions
– 1841: Joseph Whitworth presented “The Uniform System of Screw-Threads” to Britain’s Institution of Civil Engineers
– 1864: William Sellers proposed “On a Uniform System of Screw Threads” to the Franklin Institute, Philadelphia
– Standard threads enabled interchangeable parts and tooling for mechanization and mass production
– 1945: British and American standards merged

Point-to-point integration of data
– The application includes a subprogram for each different data source
– Operations on data must be processed by the application
  – Lots of coding effort
  – Fully dependent on the data resources
– The application merges the results
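
Point-to-point integration can be sketched as follows. The two sources, their contents and the function names are hypothetical stand-ins. The point is that the application carries one hand-written subprogram per source and does the merge itself, so every new or changed source means new application code.

```python
import csv
import io

# Hypothetical sources: in practice each would be a different database or file
# format reached through its own driver.
CLINIC_ROWS = [("D001", "female", 54), ("D002", "male", 61)]
LAB_CSV = "donor,ldl\nD001,3.1\nD002,4.2\n"  # LDL values, units assumed mmol/L


def fetch_clinic() -> dict:
    """Subprogram written specifically for the clinic source (tuples)."""
    return {donor: {"sex": sex, "age": age} for donor, sex, age in CLINIC_ROWS}


def fetch_lab() -> dict:
    """Subprogram written specifically for the lab source (CSV text)."""
    reader = csv.DictReader(io.StringIO(LAB_CSV))
    return {row["donor"]: {"ldl": float(row["ldl"])} for row in reader}


def merge(*parts: dict) -> dict:
    """The application itself merges the per-source results by donor ID."""
    merged: dict = {}
    for part in parts:
        for donor, values in part.items():
            merged.setdefault(donor, {}).update(values)
    return merged


combined = merge(fetch_clinic(), fetch_lab())
```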

Data Warehouse
– Data are loaded into the database
– Data need filtering, cleaning and transformation
– Data must be refreshed
  – Scripts must be written
  – Refreshing data is time-consuming
  – Up-to-date data cannot be guaranteed
– ODBC, JDBC
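
A warehouse refresh cycle is sketched below, using Python's built-in `sqlite3` module as a stand-in warehouse. The raw rows, table names and cleaning rules are illustrative assumptions. Queries against `phenotype` see only what the last `refresh` loaded, which is the up-to-date problem the slide lists.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw export from a source system: inconsistent IDs, a missing value.
RAW = [
    {"donor": " d001 ", "bmi": "24.5"},
    {"donor": "D002", "bmi": ""},  # missing measurement
    {"donor": "d003", "bmi": "31.0"},
]


def transform(rows):
    """Filter and clean rows before loading (the part that needs scripting)."""
    for row in rows:
        if not row["bmi"]:
            continue  # filter: drop incomplete records
        yield row["donor"].strip().upper(), float(row["bmi"])


def refresh(conn: sqlite3.Connection, rows) -> None:
    """Reload the warehouse table; data are only as current as the last refresh."""
    with conn:  # one transaction: replace contents and record the load time
        conn.execute("DELETE FROM phenotype")
        conn.executemany("INSERT INTO phenotype VALUES (?, ?)", transform(rows))
        conn.execute(
            "INSERT INTO load_log VALUES (?)",
            (datetime.now(timezone.utc).isoformat(),),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phenotype (donor TEXT PRIMARY KEY, bmi REAL)")
conn.execute("CREATE TABLE load_log (loaded_at TEXT)")
refresh(conn, RAW)
```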

Federated data
– Data stay untouched
  – Heterogeneous local or remote data sources are integrated through wrappers
– You only need to know what data should be available to whom, and how to access them
– All data look like one virtual database, hiding the complexity of the data layer
– ODBC, JDBC and more
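
The wrapper idea can be sketched as two parts: a common interface that each source implements, and a federation layer that routes queries and checks who may read what. All class and method names are hypothetical. The in-memory `DictSource` stands in for a wrapper around a remote database reached through ODBC, JDBC or similar. A production system would also push filters down to the sources instead of fetching whole tables.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Adapter that exposes one untouched source through a common interface."""

    @abstractmethod
    def rows(self, table: str) -> list[dict]: ...


class DictSource(Wrapper):
    """Hypothetical wrapper around an in-memory source (stands in for a remote DB)."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables

    def rows(self, table: str) -> list[dict]:
        return list(self._tables.get(table, []))


class Federation:
    """One virtual database: routes queries to wrappers and enforces who may see what."""

    def __init__(self):
        self._sources: dict[str, Wrapper] = {}
        self._grants: dict[str, set[str]] = {}  # user -> tables they may read

    def register(self, name: str, wrapper: Wrapper) -> None:
        self._sources[name] = wrapper

    def grant(self, user: str, table: str) -> None:
        self._grants.setdefault(user, set()).add(table)

    def query(self, user: str, table: str) -> list[dict]:
        if table not in self._grants.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        # Union of the same logical table across all sources; data stay in place.
        return [
            dict(row, source=name)
            for name, src in self._sources.items()
            for row in src.rows(table)
        ]


fed = Federation()
fed.register("stockholm", DictSource({"samples": [{"id": "S1"}]}))
fed.register("montreal", DictSource({"samples": [{"id": "S2"}]}))
fed.grant("researcher_a", "samples")
result = fed.query("researcher_a", "samples")
```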

Ontologies
– Controlled vocabulary: only one controlled term is used for a given concept
– Data model: a data-structuring mechanism in which an ontology is expressed
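
A controlled vocabulary can be as simple as a lookup that maps every accepted synonym to the single term used for that concept. The entries below are illustrative assumptions, not a P3G vocabulary. Real projects would draw on an established terminology rather than a hand-written table.

```python
# Hypothetical mapping: every accepted synonym points to one controlled term.
CONTROLLED = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "type 2 diabetes": "type 2 diabetes mellitus",
    "t2d": "type 2 diabetes mellitus",
}


def to_controlled(term: str) -> str:
    """Return the controlled term for a free-text term, or raise if unknown."""
    key = " ".join(term.lower().split())  # normalise case and whitespace
    try:
        return CONTROLLED[key]
    except KeyError:
        raise ValueError(f"{term!r} is not in the controlled vocabulary") from None
```

Rejecting unknown terms, rather than passing them through, is what keeps one concept from acquiring a second name in the shared data.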

Data model

World Wide Biobanking: country-level namespaces .se, .us and .ca; example identifier id=1 (The National Board of Health and Welfare); country codes from ISO 3166 (Sweden =)

World Wide Biobanking: communication with other biobanks via XML
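
The sketch below exchanges a sample record as XML using Python's standard `xml.etree.ElementTree`. The element names, the `version` attribute and the identifier values are hypothetical. An actual exchange format would be defined by a schema agreed between the biobanks.

```python
import xml.etree.ElementTree as ET


def sample_to_xml(biobank_id: str, sample_id: str, material: str) -> str:
    """Serialise one sample record; element names are hypothetical, not a P3G schema."""
    root = ET.Element("biobankMessage", version="0.1")
    sample = ET.SubElement(root, "sample", biobank=biobank_id)
    ET.SubElement(sample, "sampleId").text = sample_id
    ET.SubElement(sample, "material").text = material
    return ET.tostring(root, encoding="unicode")


def xml_to_sample(text: str) -> dict:
    """Read the same message back on the receiving biobank's side."""
    sample = ET.fromstring(text).find("sample")
    return {
        "biobank": sample.get("biobank"),
        "sample_id": sample.findtext("sampleId"),
        "material": sample.findtext("material"),
    }


message = sample_to_xml("SE-1", "000123", "DNA")
```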

Sample identification
– 2D Matrix code for DNA storage at normalized concentration
– Identifier layout: SE | KI | Biobank # | Sample ID
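
The identifier layout on the slide (country, institution, biobank number, sample ID) can be encoded into the text that the tube's barcode carries and parsed back. The separator, field widths and example values below are assumptions made for illustration. Generating the barcode image itself would need a separate library and is not shown.

```python
import re
from typing import NamedTuple


class SampleId(NamedTuple):
    country: str      # ISO 3166-1 alpha-2 code, e.g. "SE"
    institution: str  # e.g. "KI"
    biobank: int      # biobank number
    sample: int       # running sample number within the biobank


# Hypothetical layout: SE-KI-0001-00000042 (separator and widths are assumptions).
PATTERN = re.compile(r"([A-Z]{2})-([A-Z0-9]+)-(\d{4})-(\d{8})")


def encode(sid: SampleId) -> str:
    """Build the fixed-width text to be printed in the tube's code."""
    return f"{sid.country}-{sid.institution}-{sid.biobank:04d}-{sid.sample:08d}"


def decode(text: str) -> SampleId:
    """Parse scanned text back into its parts, rejecting malformed identifiers."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed sample identifier: {text!r}")
    country, institution, biobank, sample = match.groups()
    return SampleId(country, institution, int(biobank), int(sample))


tube_text = encode(SampleId("SE", "KI", 1, 42))
```

Fixed widths keep every identifier the same length, which simplifies validation at the scanner.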

P3G Knowledge Database: Knowledge Curation and Information Technology
International Working Group on Knowledge Curation and Information Technology
– P3Gdb: Knowledgebase on Phenotypes, Genetic Analysis Methods, and Policies related to Biobanks and Population Genetics Research
– IT core
– Data Entry core

P3G Knowledge Database
The advantages of integrating databases covering different aspects of biobanks as public resources:
1. The first requirement for enabling communication between biobanks is a unique identity for each biobank.
2. Second, a common nomenclature is needed for biobanks to communicate with one another.
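
The first requirement, a unique identity for each biobank, can be met by namespacing identifiers by country. The registry below is a hypothetical sketch. It uses ISO 3166-1 alpha-2 country codes as the namespace (only a few are listed) and numbers biobanks sequentially within each country. If the same biobank registers twice, it returns the existing identity. The `SE-1` style of identifier is an assumption, not a published scheme.

```python
import itertools


class BiobankRegistry:
    """Hypothetical registry issuing one unique ID per biobank, namespaced by country."""

    # Small subset of ISO 3166-1 alpha-2 codes, for illustration only.
    COUNTRIES = {"SE", "US", "CA", "GB"}

    def __init__(self):
        self._counters: dict[str, itertools.count] = {}
        self._by_name: dict[tuple[str, str], str] = {}

    def register(self, country: str, name: str) -> str:
        country = country.upper()
        if country not in self.COUNTRIES:
            raise ValueError(f"unknown country code: {country}")
        key = (country, name)
        if key in self._by_name:
            return self._by_name[key]  # same biobank keeps the same identity
        counter = self._counters.setdefault(country, itertools.count(1))
        biobank_id = f"{country}-{next(counter)}"
        self._by_name[key] = biobank_id
        return biobank_id


registry = BiobankRegistry()
ki = registry.register("se", "KI Biobank")
```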

P3G Knowledge Database
The potential impact of integration will be to:
– Promote communication within and between major biobanking initiatives, thereby helping to overcome the existing fragmentation of population genomic research.
– Enhance the effective sharing and synthesis of information, thereby addressing the need for very large sample sizes and helping to promote collaborative international genetic epidemiological and clinical research.
– Avoid the expensive mistakes and inefficiencies that can arise when individual initiatives repeatedly “re-invent the wheel”, thereby saving funders and researchers considerable time and money.

P3G Knowledge Database: The Road Map
– WG 1: Nomenclature
– WG 2: Sample handling
– WG 3: Biobank information
– WG 4: Phenotype data
– WG 5: Genotype data
– WG 6: Data modeling
– WG 7: Database integration
– WG 8: Security
– WG 9: Output and analysis
– WG 10: Documentation

P3G Knowledge Database, the road map: Phenotype
– Describe data format naming conventions
– P3G data format standard (start with the GenomEUtwin documents)
– Describe relations between the entities
– Describe entities and their attributes
– Sync genotype data
– Questionnaires (validation)
– Clinical measures
– Laboratory phenotypes
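
Describing entities and their attributes is what makes questionnaire validation possible. The sketch below assumes a hypothetical attribute description (name, type, permitted range or values) and three illustrative variables. The variable names and limits are not taken from any P3G or GenomEUtwin document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    """Hypothetical description of one phenotype variable."""
    name: str
    kind: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    allowed: Optional[frozenset] = None


# Illustrative schema only; limits are assumptions.
SCHEMA = [
    Attribute("smoking_status", str, allowed=frozenset({"never", "former", "current"})),
    Attribute("height_cm", float, minimum=100, maximum=250),
    Attribute("age_years", int, minimum=0, maximum=120),
]


def validate(response: dict) -> list[str]:
    """Return the problems found in one questionnaire response (empty if valid)."""
    problems = []
    for attr in SCHEMA:
        if attr.name not in response:
            problems.append(f"{attr.name}: missing")
            continue
        value = response[attr.name]
        if not isinstance(value, attr.kind):
            problems.append(f"{attr.name}: expected {attr.kind.__name__}")
        elif attr.allowed is not None and value not in attr.allowed:
            problems.append(f"{attr.name}: {value!r} not an allowed value")
        elif attr.minimum is not None and value < attr.minimum:
            problems.append(f"{attr.name}: below {attr.minimum}")
        elif attr.maximum is not None and value > attr.maximum:
            problems.append(f"{attr.name}: above {attr.maximum}")
    return problems
```

Returning every problem instead of stopping at the first one lets a data-entry team fix a questionnaire in a single pass.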

P3G Knowledge Database, the road map: Data modeling
– Conceptual data modeling using UML (Unified Modeling Language)
– Build a conceptual harmonized data model for genotype and phenotype data
– Sequence variation standardization
– Provide a standardized data transfer format
– Tracking of samples
– XML and OWL for future use
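
A conceptual model of this kind can be mirrored in code and serialised to a single transfer document. The sketch below assumes three hypothetical entities (donor, sample, tracking event) with a handful of attributes. It uses JSON only because JSON support is in Python's standard library; the slide itself names XML and OWL, and the harmonized P3G model would define the real entities.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrackingEvent:
    timestamp: str  # ISO 8601 text
    location: str
    action: str     # e.g. "collected", "aliquoted", "shipped"


@dataclass
class Sample:
    sample_id: str
    material: str   # e.g. "DNA", "serum"
    history: list[TrackingEvent] = field(default_factory=list)  # sample tracking


@dataclass
class Donor:
    donor_id: str
    birth_year: int
    samples: list[Sample] = field(default_factory=list)


def to_transfer_format(donor: Donor) -> str:
    """Serialise the whole object graph to one JSON document for exchange."""
    return json.dumps(asdict(donor), sort_keys=True)


def from_transfer_format(text: str) -> Donor:
    """Rebuild the typed objects on the receiving side."""
    data = json.loads(text)
    samples = [
        Sample(s["sample_id"], s["material"], [TrackingEvent(**e) for e in s["history"]])
        for s in data["samples"]
    ]
    return Donor(data["donor_id"], data["birth_year"], samples)


donor = Donor("D001", 1951, [
    Sample("S1", "DNA", [TrackingEvent("2005-02-07T09:00:00Z", "KI", "collected")]),
])
text = to_transfer_format(donor)
```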

P3G Knowledge Database, the road map: Sample handling
– Sample collection
– Sample identification
– Data collection
– Structure and standardization of data
– Quality control procedures
– Ethical and legal aspects

P3G Knowledge Database: The road map

P3G Knowledge Database: Physical entities

P3G Knowledge Database: Donor entities

P3G Knowledge Database: Sampling entities

P3G Knowledge Database, the road map: Use models that remain stable as the technological landscape changes around them (Model Driven Architecture)
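
In Model Driven Architecture, the stable artefact is a platform-independent model, and platform-specific code is generated from it. The toy illustration below assumes a hypothetical two-entity model and uses SQLite as the target platform. Moving to another database would mean writing a new type mapping and generator, not changing the model. Real MDA tooling works from UML models rather than Python dictionaries.

```python
import sqlite3

# Platform-independent model (PIM): entities, attributes and abstract types only.
MODEL = {
    "Donor": {"donor_id": "identifier", "birth_year": "integer"},
    "Sample": {"sample_id": "identifier", "donor_id": "reference:Donor", "material": "text"},
}

# Platform-specific mapping for one target (SQLite); identifiers become TEXT keys.
SQLITE_TYPES = {"identifier": "TEXT PRIMARY KEY", "integer": "INTEGER", "text": "TEXT"}


def to_sqlite_ddl(model: dict) -> list[str]:
    """Transform the PIM into SQLite CREATE TABLE statements."""
    statements = []
    for entity, attributes in model.items():
        columns = []
        for name, abstract_type in attributes.items():
            if abstract_type.startswith("reference:"):
                target = abstract_type.split(":", 1)[1]
                # Reference the target entity's identifier column.
                key = next(a for a, t in model[target].items() if t == "identifier")
                columns.append(f"{name} TEXT REFERENCES {target}({key})")
            else:
                columns.append(f"{name} {SQLITE_TYPES[abstract_type]}")
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)})")
    return statements


conn = sqlite3.connect(":memory:")
for statement in to_sqlite_ddl(MODEL):
    conn.execute(statement)
```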

P3G Knowledge Database, The Road Map: Starting point
1: Nomenclature
2: Sample handling; Biobank information
3: Phenotype data; Genotype data; Data modeling
4: Database integration; Security
5: Ethics, governance, policy, socio-demographic

P3G Knowledge Database, The Road Map: Starting point
– Name IWG leaders
– Name cores
– Now: open a KDB members area under to start the knowledge database
– IWG-KDB meeting in late spring 2005
– Coordinate with other activities