BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data Pavan Kumar A Big Data Analytics Team C-DAC KP, Bengaluru
Outline: Pharmacogenomics; Biological Data Repositories; Graph Databases (What and Why); Big Data Platform for Pharmacogenomics Databases; Neo4j & Pharmacogenomics Graph Database; MapReduce : BLAST; Web Application for querying and visualization
Pharmacogenomics: Pharmacogenomics = Pharma + Gene + Omics. Drug therapy consists of three major processes: the pharmacokinetic process, the pharmacodynamic process and the therapeutic process.
Pharmacogenomics: Pharmacogenomics leads us to Personalized Medicine. What is Personalized Medicine, and why do we need it?
Pharmacogenomics : ADR. Errors in health care: some facts about ADRs (adverse drug reactions). 1.6–41.4% of patients undergoing therapy are prone to ADRs. $17–29 billion is spent annually on preventable ADRs. In the US, ADRs are responsible for ~100,000 deaths annually.
Pharmacogenomics : ADR. Factors behind ADRs: genetic factors (pharmacokinetics, pharmacodynamics, SNPs (Single Nucleotide Polymorphisms)); environmental factors (tobacco, alcohol, pollution, dietary habits and so on); physiological factors (age, gender, disease state, pregnancy, starvation, microbial composition and so on).
Pharmacogenomics : Pharmacokinetics. What the body does to the drug. This is captured by the movement of the drug into, through and out of the body, referred to as ABSORPTION, BIOAVAILABILITY, DISTRIBUTION, METABOLISM and EXCRETION.
Pharmacogenomics : Pharmacodynamics. What the drug does to the body. This is captured by actions like receptor binding, post-receptor effects and chemical interactions.
Pharmacogenomics : SNPs. Single Nucleotide Polymorphisms: the most common type of genetic variation among people.
Pharmacogenomics : SNPs. Diseases caused by SNPs: Autoimmune Diseases; Genetic Diseases; Cancers; Neurodegenerative Disorders; Cardiovascular Diseases; Neuro-psychological; Digestive Disorders; Addiction Dependence; Female-Specific Diseases
Pharmacogenomics : Microbial Composition. Microbes in our body add up to 100 trillion cells (10 times the number of human cells). Image source: http://www.freegrab.net/Immune Digestive System Connection.htm
Pharmacogenomics Finally… Protein structural variations; SNPs; Metabolomics; Gene Expression; Environmental factors: chemicals, diet, tobacco, alcohol etc.; Physiological factors: age, gender, disease state, pregnancy, circadian rhythm, starvation; Microbial Composition
How Do We Study? Data from different domains is stored in open data repositories. Download the data (data format: XML, CSV or Excel) or query a database via its web application.
Biological Data Repositories. The following are some pharmacogenomics databases: PharmGKB (Pharmacogenomics Knowledge Base); DrugBank (chemical, pharmaceutical and pharmacological data); IGVdb (Indian Genome Variation Database); CTD (Comparative Toxicogenomics Database); STITCH (Search Tool for Interactions of Chemicals; chemical-protein interaction networks); TTD (Therapeutic Target Database); KEGG (Kyoto Encyclopedia of Genes and Genomes)
Integration of Biological Data Repositories. Data is spread across many repositories, so the user has to navigate many pages or many websites. There is therefore a need to integrate all the data to get consolidated information in one place.
Interlinked Biological Data: Databases; Consortiums; Tools; Information from Articles, Literature. (Pasha and Scaria et al. 2013, Omics for personalized medicine)
Integrating Databases. Many databases are integrated on the basis of Internationalized Resource Identifiers (IRIs). Sample for SCN5A (sodium channel protein type 5 subunit alpha), first per source database:

| Database | Gene | Organism | Len | Interacting Chemical | Disease/Disorder | Pathway/s | Chrom_Start | Chrom_End |
| Uniprot | SCN5A | Human | 2016 | | | | | |
| CTD | SCN5A | | | sodium arsenite | Atrial Fibrillation | Developmental Biology | | |
| PharmGKB | SCN5A | | | | | | 38564558 | 38666167 |

and then after integration:

| Database | Gene | Organism | Len | Interacting Chemical | Disease/Disorder | Pathway/s | Chrom_Start | Chrom_End |
| My_DB | SCN5A | Human | 2016 | sodium arsenite | Atrial Fibrillation | Developmental Biology | 38564558 | 38666167 |
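The SCN5A sample can be read as a merge of per-source records that share one identifier. The following is a minimal Python sketch of that idea, not the BDPGx implementation: the dictionary layout, the `integrate` function and the use of the gene symbol as the shared key (standing in for an IRI) are assumptions made for illustration; only the field names and values come from the sample.

```python
from collections import defaultdict

# Per-source records for SCN5A, taken from the sample on this slide.
# The dict layout is an illustrative assumption, not the BDPGx schema.
SOURCES = {
    "Uniprot": [{"Gene": "SCN5A", "Organism": "Human", "Len": 2016}],
    "CTD": [{
        "Gene": "SCN5A",
        "Interacting Chemical": "sodium arsenite",
        "Disease/Disorder": "Atrial Fibrillation",
        "Pathway/s": "Developmental Biology",
    }],
    "PharmGKB": [{"Gene": "SCN5A", "Chrom_Start": 38564558, "Chrom_End": 38666167}],
}


def integrate(sources, key="Gene"):
    """Merge records that share the same key into one consolidated record.

    `key` stands in for the shared identifier (an IRI in the slide's
    wording); the gene symbol is used here only to keep the sketch small.
    """
    merged = defaultdict(dict)
    for records in sources.values():
        for record in records:
            target = merged[record[key]]
            for field, value in record.items():
                # Keep the first value seen for a field. A real pipeline
                # would need an explicit conflict-resolution policy here.
                target.setdefault(field, value)
    return dict(merged)


if __name__ == "__main__":
    print(integrate(SOURCES)["SCN5A"])
```

The merged record carries every column of the My_DB row; fields that no source provides would simply be absent.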
NoSQL database family
Graph Databases. Graph databases belong to the NoSQL database family. They represent data pictorially as nodes and edges (with or without properties). Image Source : https://www.3pillarglobal.com/insights/exploring-the-different-types-of-nosql-databases
Why Graph Databases? Graph databases are well suited for interconnected data. Some use cases of graph databases: Fraud Detection; Graph-Based Search; Network and IT Operations; Real-Time Recommendation Engines; Social Networks; Identity and Access Management. Ref : https://neo4j.com/use-cases/
Graph Databases : Properties. Two important properties of graph database technologies: native graph storage (some graph databases instead serialize to an RDBMS) and native graph processing (a.k.a. “index-free adjacency”), where connected nodes physically “point” to each other.
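To make “index-free adjacency” concrete, here is a small Python sketch that contrasts it with a lookup-based alternative. It is an in-memory analogy only and does not describe Neo4j's storage engine; the `Node` class and both helper functions are hypothetical names introduced for this illustration.

```python
class Node:
    """A graph node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct object references, no lookup table

    def connect(self, other):
        """Link two nodes so that each one points at the other."""
        self.neighbours.append(other)
        other.neighbours.append(self)


def neighbours_native(node):
    """Index-free adjacency: follow the references stored on the node."""
    return sorted(n.name for n in node.neighbours)


def neighbours_via_index(edge_table, name):
    """Non-native alternative: scan a global edge table for every hop."""
    return sorted(b if a == name else a for a, b in edge_table if name in (a, b))


if __name__ == "__main__":
    gene = Node("SCN5A")
    gene.connect(Node("sodium arsenite"))
    gene.connect(Node("Atrial Fibrillation"))
    edges = [("SCN5A", "sodium arsenite"), ("SCN5A", "Atrial Fibrillation")]
    print(neighbours_native(gene))
    print(neighbours_via_index(edges, "SCN5A"))
```

Both functions return the same neighbours. The difference is cost: the native version touches only the node's own references, while the table scan grows with the total number of edges in the graph.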
Graph Database : Neo4j. Most biological data is interconnected, so graph databases are well suited to it. Neo4j, the world’s leading graph database: open source with a welcoming UI; native graph storage with a native GPE (Graph Processing Engine); easy representation of connected data; faster retrieval, traversal and navigation of highly connected data; represents semi-structured data.
Graph Database : Neo4j. In Neo4j, the Cypher Query Language (CQL) is used to create nodes, labels, edges (relationships) and properties. Example:
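A minimal sketch of what such Cypher statements might look like, held as Python strings so they can be paired with parameters. The labels `Gene` and `Chemical`, the relationship type `INTERACTS_WITH`, the `source` property and the `build_statements` helper are hypothetical names chosen for illustration; they are not the BDPGx schema.

```python
# Cypher statements as plain strings; $-prefixed names are Cypher parameters.
# Labels, relationship type and property names are illustrative assumptions.
CREATE_GENE = "CREATE (g:Gene {name: $gene, organism: $organism})"
CREATE_CHEMICAL = "CREATE (c:Chemical {name: $chemical})"
CREATE_INTERACTION = (
    "MATCH (g:Gene {name: $gene}), (c:Chemical {name: $chemical}) "
    "CREATE (g)-[:INTERACTS_WITH {source: $source}]->(c)"
)
FIND_CHEMICALS = (
    "MATCH (g:Gene {name: $gene})-[:INTERACTS_WITH]->(c:Chemical) "
    "RETURN c.name AS chemical"
)


def build_statements(gene, organism, chemical, source):
    """Return (cypher, parameters) pairs in the order they should run."""
    return [
        (CREATE_GENE, {"gene": gene, "organism": organism}),
        (CREATE_CHEMICAL, {"chemical": chemical}),
        (CREATE_INTERACTION, {"gene": gene, "chemical": chemical, "source": source}),
    ]


if __name__ == "__main__":
    for cypher, params in build_statements("SCN5A", "Human", "sodium arsenite", "CTD"):
        print(cypher, params)
    print(FIND_CHEMICALS, {"gene": "SCN5A"})
```

Each pair could be handed to a Neo4j client, for instance through `session.run(cypher, params)` in the official Python driver. Passing values as parameters rather than formatting them into the query text keeps the statements reusable and avoids quoting problems.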
Pharmacogenomics Graph Database
BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data Tools and Technologies:
BDPGx - A Big Data Platform for Graph-based Pharmacogenomics Data. BDPGx has 4 107 474 nodes, 3 994 226 properties, 46 840 614 relationships and 15 relationship types.
Conclusion. Biological data is generated from various sources and is available in different formats. Finding correlations among the available data can give better insights. BDPGx gives the researcher user-friendly access to the most appropriate information.
THANK YOU