Published by Estella Joseph. Modified over 9 years ago.
1
KAROLINSKA INSTITUTET
International Biobank and Cohort Studies: Developing a Harmonious Approach
February 7-8, 2005, Atlanta, GA
Standards: The P3G Knowledge Database
Jan-Eric Litton, Karolinska Institutet, Stockholm, Sweden
2
Sharing data
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
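The entry above is written in the Swiss-Prot flat-file layout, where every line begins with a two-letter line code (ID, DE, OS, SQ, ...) and the sequence lines after SQ have a blank code. A minimal Python sketch of reading such an entry into a dictionary keyed by line code is shown below. It assumes the code occupies the first two characters, joins repeated lines with the same code, and ignores the finer column structure of FT lines; it is an illustration of why a shared format makes data exchange easy, not a complete parser.

```python
from collections import defaultdict


def parse_flat_record(text: str) -> dict:
    """Group a Swiss-Prot-style flat-file entry by its two-letter line codes.

    Lines sharing a code (e.g. several DE lines) are joined with single spaces.
    Sequence lines after SQ, whose code columns are blank, are collected under
    the key "SEQ" with the spaces between residue blocks removed.
    """
    parts = defaultdict(list)
    in_sequence = False
    for line in text.splitlines():
        if not line.strip() or line.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        code, rest = line[:2], line[2:].strip()
        if code == "SQ":
            in_sequence = True
            parts["SQ"].append(rest)
        elif in_sequence and not code.strip():
            parts["SEQ"].append(rest.replace(" ", ""))
        else:
            parts[code].append(rest)
    return {code: ("" if code == "SEQ" else " ").join(chunks)
            for code, chunks in parts.items()}


# Abbreviated copy of the entry above (one DE line, two sequence blocks).
record = """ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG
//"""

fields = parse_flat_record(record)
print(fields["OS"])   # BACILLUS SUBTILIS.
print(fields["SEQ"])  # MEKLNIAGGDSLNGTVHISG
```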
3
A historical essay: The Machine Screw
- Principle discovered around 400 BC
- Limited use until machine tools made mass production possible (18th century)
- Every machine shop and foundry made unique sizes and thread dimensions
- 1841: Joseph Whitworth presented "The Uniform System of Screw-Threads" to Britain's Institution of Civil Engineers
- 1864: William Sellers proposed "On a Uniform System of Screw Threads" to the Franklin Institute, Philadelphia
- Standard threads enabled interchangeable parts and tooling for mechanization and mass production
- 1945: British and American standards merged
4
Point-to-point integration of data
- The application includes a subprogram for each data source
- Operations on the data must be processed by the application
  - Lots of coding effort
  - Fully dependent on the data resources
- The application merges the results
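To make the cost of point-to-point integration concrete, here is a minimal Python sketch in which the application carries one hand-written routine per data source and merges the results itself. The source names, record layouts and values are hypothetical; every new source needs another routine, and any change in a source's layout breaks the matching one.

```python
# Hypothetical sources with different layouts, standing in for real databases.
CLINIC_ROWS = [("D001", 54.2), ("D002", 61.0)]       # (donor id, weight in kg)
LAB_RECORDS = [{"donor": "D001", "hdl_mmol_l": 1.3}]  # list of dicts


def fetch_from_clinic():
    """Routine written specifically for the clinic source's tuple layout."""
    return {donor: {"weight_kg": weight} for donor, weight in CLINIC_ROWS}


def fetch_from_lab():
    """Separate routine for the lab source's dict layout."""
    return {rec["donor"]: {"hdl_mmol_l": rec["hdl_mmol_l"]} for rec in LAB_RECORDS}


def merge_results(*partials):
    """The application itself merges the per-source results by donor id."""
    merged = {}
    for partial in partials:
        for donor, values in partial.items():
            merged.setdefault(donor, {}).update(values)
    return merged


merged = merge_results(fetch_from_clinic(), fetch_from_lab())
# merged["D001"] == {"weight_kg": 54.2, "hdl_mmol_l": 1.3}
```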
5
Data Warehouse
- Data are loaded into the database
- Data need filtering, cleaning and transformation
- Data must be refreshed
  - Scripts must be written
  - Refreshing data is time-consuming
  - Up-to-date data cannot be guaranteed
- ODBC, JDBC
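The warehouse steps (load, filter, clean, transform, refresh) can be sketched in Python as a small extract-transform-load routine. This assumes a hypothetical raw extract and uses the standard-library `sqlite3` module as a stand-in for the ODBC/JDBC-connected database on the slide; the cleaning rules are illustrative only.

```python
import sqlite3

# Hypothetical raw extract from a source system: inconsistent case, stray
# whitespace and a missing value.
RAW_EXTRACT = [
    ("d001", " Female ", "54.2"),
    ("D002", "male", ""),
]


def transform(row):
    """Clean and transform one raw row; return None if it fails filtering."""
    donor_id, sex, weight = row
    if not weight:
        return None  # filter out incomplete rows
    return donor_id.upper(), sex.strip().lower(), float(weight)


def refresh(conn, extract):
    """Full reload: the warehouse copy is only as current as the last refresh."""
    conn.execute("DELETE FROM donor")
    cleaned = [r for r in map(transform, extract) if r is not None]
    conn.executemany("INSERT INTO donor VALUES (?, ?, ?)", cleaned)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donor (donor_id TEXT PRIMARY KEY, sex TEXT, weight_kg REAL)")
refresh(conn, RAW_EXTRACT)
rows = conn.execute("SELECT * FROM donor").fetchall()
# rows == [("D001", "female", 54.2)]
```

Any change in the source after `refresh` runs is invisible until the next refresh, which is the staleness problem listed on the slide.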
6
Federated data
- Data stay untouched
  - Heterogeneous local or remote data sources are integrated through wrappers
- You only need to know what data should be available to whom, and how to access them
- All data look like one virtual database, hiding the complexity of the data layer
- ODBC, JDBC and more
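The federated approach can be sketched in Python with the wrapper/mediator pattern: each hypothetical wrapper answers queries over a source that stays in its own format, and a mediator presents them as one virtual database while enforcing who may see which source. Class names, the access table and the data are all assumptions for illustration; real federated systems work through ODBC/JDBC and similar interfaces.

```python
from typing import Iterable, Protocol


class Wrapper(Protocol):
    """Hypothetical wrapper interface: each source answers in a common shape."""
    def query(self, attribute: str) -> Iterable[tuple]: ...


class DelimitedTextWrapper:
    """Wraps a source whose data stay in their own semicolon-delimited format."""
    def __init__(self, text: str):
        self.text = text

    def query(self, attribute):
        header, *lines = self.text.strip().splitlines()
        columns = header.split(";")
        for line in lines:
            rec = dict(zip(columns, line.split(";")))
            if attribute in rec:
                yield rec["donor"], rec[attribute]


class DictWrapper:
    """Wraps an in-memory source keyed by donor id."""
    def __init__(self, data: dict):
        self.data = data

    def query(self, attribute):
        for donor, rec in self.data.items():
            if attribute in rec:
                yield donor, rec[attribute]


class Mediator:
    """Presents wrapped sources as one virtual database; nothing is copied."""
    def __init__(self, sources: dict, access: dict):
        self.sources = sources  # source name -> wrapper
        self.access = access    # user -> set of source names they may query

    def query(self, user, attribute):
        results = []
        for name, source in self.sources.items():
            if name in self.access.get(user, set()):
                results.extend(source.query(attribute))
        return results


mediator = Mediator(
    sources={"clinic": DelimitedTextWrapper("donor;bmi\nD001;22.5"),
             "registry": DictWrapper({"D002": {"bmi": 27.1}})},
    access={"alice": {"clinic", "registry"}, "bob": {"registry"}},
)
alice_rows = mediator.query("alice", "bmi")  # [("D001", "22.5"), ("D002", 27.1)]
bob_rows = mediator.query("bob", "bmi")      # [("D002", 27.1)]
```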
7
Ontologies
- Controlled vocabulary: only one controlled term is used for a given concept
- Data model:
  - The data-structuring mechanism in which an ontology is expressed
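A controlled vocabulary can be sketched in Python as a lookup from every accepted local variant to the single preferred term for the concept. The terms and synonyms below are hypothetical examples, not a P3G vocabulary.

```python
# Hypothetical controlled vocabulary: one preferred term per concept, with the
# free-text variants that different biobanks might use for it.
CONTROLLED_TERMS = {
    "body mass index": {"bmi", "body-mass index", "quetelet index"},
    "systolic blood pressure": {"sbp", "systolic bp", "sys bp"},
}

# Invert to a lookup table from any accepted variant to its preferred term.
_LOOKUP = {}
for preferred, variants in CONTROLLED_TERMS.items():
    _LOOKUP[preferred] = preferred
    for variant in variants:
        _LOOKUP[variant] = preferred


def to_controlled_term(label: str) -> str:
    """Map a local label to the single controlled term; raise if unknown."""
    key = " ".join(label.lower().split())  # normalize case and whitespace
    try:
        return _LOOKUP[key]
    except KeyError:
        raise ValueError(f"no controlled term for {label!r}") from None


to_controlled_term("BMI")           # "body mass index"
to_controlled_term("Systolic  BP")  # "systolic blood pressure"
```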
8
Data model
9
World Wide Biobanking
[Diagram: country domains .se, .us and .ca; a biobank with id=1; The National Board of Health and Welfare]
ISO 3166 numeric country codes: Sweden = 752, Canada = 124, United States = 840
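One way to read this slide is that a globally unique biobank identity can be composed from the ISO 3166-1 numeric country code plus a number assigned nationally. The Python sketch below assumes that reading; the "country-number" layout and the two-digit padding are hypothetical, chosen to match the shape of the sample identifier on the next slides.

```python
# ISO 3166-1 numeric codes for the three countries named on the slide.
ISO3166_NUMERIC = {"SE": 752, "CA": 124, "US": 840}


def biobank_id(country_alpha2: str, national_id: int) -> str:
    """Compose a hypothetical global biobank id: the ISO 3166-1 numeric code
    plus a nationally assigned number, zero-padded to two digits."""
    if not 0 < national_id < 100:
        raise ValueError("national id must fit in two digits in this sketch")
    return f"{ISO3166_NUMERIC[country_alpha2]:03d}-{national_id:02d}"


biobank_id("SE", 1)  # "752-01"
```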
10
World Wide Biobanking
Communication with other biobanks: XML
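Exchanging biobank information as XML can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`biobank`, `id`, `name`, `sampleCount`) form a hypothetical schema, not a P3G standard; the point is that sender and receiver only need to agree on the structure.

```python
import xml.etree.ElementTree as ET


def biobank_message(biobank_id: str, name: str, sample_count: int) -> str:
    """Build a small XML document describing a biobank (hypothetical schema)."""
    root = ET.Element("biobank", attrib={"id": biobank_id})
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "sampleCount").text = str(sample_count)
    return ET.tostring(root, encoding="unicode")


def read_message(xml_text: str) -> dict:
    """Parse the message on the receiving biobank's side."""
    root = ET.fromstring(xml_text)
    return {"id": root.get("id"),
            "name": root.findtext("name"),
            "sample_count": int(root.findtext("sampleCount"))}


msg = biobank_message("752-01", "Example Biobank", 1000)
# msg == '<biobank id="752-01"><name>Example Biobank</name><sampleCount>1000</sampleCount></biobank>'
received = read_message(msg)
```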
11
Sample identification
752-08-123456789-4 (fields labeled: SE, KI Biobank #, Sample ID)
2D Matrix code for DNA storage at normalized concentration
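The example identifier can be split into its fields with a regular expression. This Python sketch assumes the field widths of the single example shown (3-2-9-1 digits) and treats the first field as the ISO 3166-1 numeric country code (752 = Sweden); the field names are hypothetical, and the meaning of the trailing digit is not stated on the slide, so it is kept as an opaque suffix.

```python
import re
from typing import NamedTuple


class SampleId(NamedTuple):
    country: str   # ISO 3166-1 numeric code (752 = Sweden)
    biobank: str   # biobank number
    sample: str    # sample number within the biobank
    suffix: str    # trailing digit; its meaning is not stated on the slide


# Field widths assumed from the single example 752-08-123456789-4.
SAMPLE_ID = re.compile(r"^(\d{3})-(\d{2})-(\d{9})-(\d)$")


def parse_sample_id(text: str) -> SampleId:
    """Split a sample identifier into its fields, rejecting malformed input."""
    match = SAMPLE_ID.match(text)
    if match is None:
        raise ValueError(f"not a sample id: {text!r}")
    return SampleId(*match.groups())


parse_sample_id("752-08-123456789-4")
# SampleId(country='752', biobank='08', sample='123456789', suffix='4')
```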
12
International Working Group on Knowledge Curation and Information Technology
P3Gdb: Knowledgebase on Phenotypes, Genetic Analysis Methods, and Policies related to Biobanks and Population Genetics Research
- IT core
- Data Entry core
P3G Knowledge Database: Knowledge Curation and Information Technology
13
The advantages of integrating databases in different aspects of biobanks as public resources:
1. The first requirement for enabling communication between biobanks is a unique identity for each biobank.
2. Second, a common nomenclature is needed for biobanks to communicate with each other.
14
The potential impact of integration will be to:
- Promote communication within and between major biobanking initiatives, thereby helping to overcome the existing fragmentation of population genomic research.
- Enhance the effective sharing and synthesis of information, thereby addressing the need for very large sample sizes and helping to promote collaborative international genetic epidemiological and clinical research.
- Avoid the expensive mistakes and inefficiencies that arise when individual initiatives repeatedly "re-invent the wheel", thereby saving funders and researchers considerable time and money.
15
The Road Map:
- WG 1: Nomenclature
- WG 2: Sample handling
- WG 3: Biobank information
- WG 4: Phenotype data
- WG 5: Genotype data
- WG 6: Data modeling
- WG 7: Database integration
- WG 8: Security
- WG 9: Output and analysis
- WG 10: Documentation
16
The road map: Phenotype
- Describe data format naming conventions
- P3G data format standard (start with the GenomEUtwin documents)
- Describe relations between the entities
- Describe entities and their attributes
- Sync genotype data
- Questionnaires (validation)
- Clinical measures
- Laboratory phenotypes
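Questionnaire validation can be sketched in Python as checking each response against a codebook that lists, per item, the allowed answers. The item names, categories and ranges below are hypothetical; a harmonized P3G codebook would supply the real ones.

```python
# Hypothetical codebook: for each questionnaire item, the allowed answers.
CODEBOOK = {
    "smoking_status": {"never", "former", "current"},
    "age_years": range(18, 111),           # 18 to 110 inclusive
    "alcohol_units_week": range(0, 101),   # 0 to 100 inclusive
}


def validate_response(response: dict) -> list:
    """Return a list of problems; an empty list means the response passed."""
    problems = []
    for item, allowed in CODEBOOK.items():
        if item not in response:
            problems.append(f"{item}: missing")
        elif response[item] not in allowed:
            problems.append(f"{item}: {response[item]!r} not allowed")
    for item in sorted(response.keys() - CODEBOOK.keys()):
        problems.append(f"{item}: not in codebook")
    return problems


validate_response({"smoking_status": "never", "age_years": 42, "alcohol_units_week": 3})
# []
validate_response({"smoking_status": "sometimes", "age_years": 42})
# ["smoking_status: 'sometimes' not allowed", "alcohol_units_week: missing"]
```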
17
The road map: Data modeling
- Conceptual data modeling using UML (Unified Modeling Language)
- Build a conceptual harmonized data model for genotype and phenotype data
- Sequence variation standardization
- Provide a standardized data transfer format
- Tracking of samples
- XML and OWL for future use
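Part of sequence variation standardization is making sure two sources that describe the same variant produce the same record. The Python sketch below harmonizes only naming and case under assumed conventions (no "chr" prefix, mitochondrial chromosome written "MT", upper-case alleles) and builds a canonical key; it does not perform reference-based left-alignment, which full variant normalization also requires.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def normalize(chrom, pos, ref, alt) -> Variant:
    """Harmonize how different sources write the same single-site variant.

    Assumed conventions: chromosome names without a "chr" prefix, the
    mitochondrial chromosome written as "MT", and alleles in upper case.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    name = name.upper()
    if name == "M":
        name = "MT"
    return Variant(name, int(pos), ref.upper(), alt.upper())


def variant_key(v: Variant) -> str:
    """Canonical text key, usable when exchanging or joining variant data."""
    return f"{v.chrom}:{v.pos}:{v.ref}>{v.alt}"


a = normalize("chr7", "12345", "a", "g")
b = normalize("7", 12345, "A", "G")
# a == b and variant_key(a) == "7:12345:A>G"
```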
18
The road map: Sample handling
- Sample collection
- Sample identification
- Data collection
- Structure and standardization of data
- Quality control procedures
- Ethical and legal aspects
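Quality control procedures for sample handling can be sketched in Python as a set of checks run on each sample record. The record fields and all three rules (collection date not in the future, consent recorded, DNA concentration within a tolerance of the normalized target) are hypothetical examples; the target and tolerance are parameters because the slides give no values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SampleRecord:
    sample_id: str
    collected_on: date
    consent_recorded: bool
    dna_ng_per_ul: float


def qc_failures(rec: SampleRecord, today: date, target: float, tolerance: float) -> list:
    """Return the names of failed checks (hypothetical QC rules)."""
    failures = []
    if rec.collected_on > today:
        failures.append("collection date in the future")
    if not rec.consent_recorded:
        failures.append("no consent recorded")
    if abs(rec.dna_ng_per_ul - target) > tolerance:
        failures.append("DNA concentration outside normalized range")
    return failures


good = SampleRecord("752-08-123456789-4", date(2005, 1, 10), True, 49.0)
bad = SampleRecord("752-08-123456789-4", date(2005, 1, 10), False, 60.0)
qc_failures(good, date(2005, 2, 7), target=50.0, tolerance=2.0)  # []
```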
19
The road map:
20-24
Physical entities
25
Donor entities
26
Sampling entities
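The entity diagrams on these slides are not part of the extracted text, so the Python sketch below is only a hypothetical minimal model of how donor, sampling and physical entities might relate: a sampling event links one donor to the samples taken on one occasion, and aliquots point back to the sample they were derived from. All class and field names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Donor:
    donor_id: str
    birth_year: int


@dataclass
class Sample:
    """Physical entity: a tube or aliquot stored in the biobank."""
    sample_id: str
    material: str                    # e.g. "DNA", "serum"
    parent_id: Optional[str] = None  # aliquots point back to their source sample


@dataclass
class SamplingEvent:
    """Links a donor to the samples taken on one occasion."""
    donor: Donor
    taken_on: date
    samples: list = field(default_factory=list)

    def aliquot(self, parent: Sample, new_id: str) -> Sample:
        """Derive a new physical sample from an existing one and record it."""
        child = Sample(new_id, parent.material, parent_id=parent.sample_id)
        self.samples.append(child)
        return child


event = SamplingEvent(Donor("D001", 1960), date(2005, 2, 7))
dna = Sample("S1", "DNA")
event.samples.append(dna)
child = event.aliquot(dna, "S1-1")  # child.parent_id == "S1"
```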
27
The road map: Using models that remain stable as the technological landscape changes around them (Model Driven Architecture)
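The idea behind Model Driven Architecture is that a platform-independent model is kept stable and transformed into platform-specific artifacts. The Python sketch below illustrates that split with a hypothetical two-entity model and a type mapping that generates SQL `CREATE TABLE` statements; moving to another platform would mean swapping the mapping, not editing the model. Real MDA tooling works from UML models, which this dictionary only stands in for.

```python
# Platform-independent model (PIM): entities and abstract attribute types only.
PIM = {
    "Donor": {"donor_id": "identifier", "birth_year": "integer"},
    "Sample": {"sample_id": "identifier", "donor_id": "identifier", "material": "text"},
}

# Platform-specific mapping: how each abstract type is realized in SQL.
SQL_TYPES = {"identifier": "VARCHAR(32)", "integer": "INTEGER", "text": "TEXT"}


def to_sql_ddl(pim: dict, type_map: dict) -> list:
    """Transform the PIM into CREATE TABLE statements for one target platform."""
    statements = []
    for entity, attributes in pim.items():
        columns = ", ".join(f"{name} {type_map[kind]}" for name, kind in attributes.items())
        statements.append(f"CREATE TABLE {entity.lower()} ({columns});")
    return statements


ddl = to_sql_ddl(PIM, SQL_TYPES)
# ddl[0] == "CREATE TABLE donor (donor_id VARCHAR(32), birth_year INTEGER);"
```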
28
The Road Map: Starting point
1: Nomenclature
2: Sample handling; Biobank information
3: Phenotype data; Genotype data; Data modeling
4: Database integration; Security
5: Ethics, governance, policy, socio-demographic
29
The Road Map: Starting point
- Name IWG leaders
- Name cores
- Now: open a KDB members area under www.p3gconsortium.org to start the knowledge database
- IWG-KDB meeting in late spring 2005
- Coordinate with other activities
30
jan-eric.litton@meb.ki.se
Isabel.fortier@Mail.mcgill.ca
Mdeschenes@p3gconsortium.org