Personalized genomics


1 Personalized genomics

2 Goal
Input
  Genomic sequence (WGS) from family
  Pedigree & affectedness
  Disease (standard ontology needed)
Output
  Genes/mutations relevant to the disease.

3 Disease Gene/Mutation
[Pipeline overview diagram. Labels: Pedigree & Affectedness; Genomic Sequence; BAM; VCF; FASTQ; GRCh37; Read Mapping; known sites; BAM prep; SNV Calling; SV Calling & Validation; Merge VCFs; dbSNP; SeattleSeq; Variant annotation; HGMD; Variant filtering; Disease Gene/Mutation.]

4 Steps
Sequencing
Mapping
BAM Prep
Variant calling (SNV)
Variant calling (SV)
VCF manipulation/merging
Variant annotation
Variant filtering
Disease gene association

5 1. Sequencing
Platform
  HiSeq/MiSeq
  PacBio
  Ion Proton (Life)
  CGI
  …
Mode
  Whole Genome
  Exome
  RNA-Seq

6 2. Mapping
Map short reads (FASTQ format) to a reference
Output a BAM file
Mapping tools
  BWA
  Bowtie
  Custom
Compute/disk intensive part of the pipeline.
WGS file size: ~200Gb per sample.
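
The mapping step can be sketched as a small command builder. This is a minimal sketch, assuming BWA's `mem` subcommand piped into `samtools sort`; the file names, sample name, thread count, and the `PL:ILLUMINA` platform tag are hypothetical placeholders, and nothing is executed here.

```python
import shlex


def build_mapping_pipeline(reference, fastq1, fastq2, sample, out_bam, threads=8):
    """Return a shell pipeline string: bwa mem | samtools sort.

    All paths and the sample name are placeholders supplied by the caller.
    """
    # The read group carries the sample name used by downstream callers.
    # bwa expects the two-character sequence "\t" here, not a real tab.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group,
           reference, fastq1, fastq2]
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    # Quote every argument so the string is safe to hand to a shell.
    return " | ".join(" ".join(shlex.quote(a) for a in cmd) for cmd in (bwa, sort))
```

Building the command as a string keeps the sketch testable on a machine without the tools installed; a real pipeline would run it through a workflow manager or `subprocess`.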

7 3. BAM Prep
Input: BAM file
Output: BAM file
Sorting BAM
  Picard Tools
Marking (PCR) Duplicates
INDEL Re-alignment
  GATK
Base-Q Covariates & Recalibration
Compute intensive part of the pipeline
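
Duplicate marking can be illustrated without any toolkit. This is a minimal sketch, assuming each read is reduced to a name, chromosome, 5' position, strand, and summed base quality; it only mirrors the idea behind tools such as Picard and is not their implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int        # 5' mapping position
    strand: str     # "+" or "-"
    qual_sum: int   # sum of base qualities


def mark_duplicates(reads):
    """Return the set of read names flagged as duplicates.

    Reads sharing chromosome, position and strand are treated as copies of
    one fragment; the copy with the highest quality sum is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```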

8 4. Variant Calling
Input: multiple BAMs
Output: VCF (loci that differ from the reference)
SNVs
  Broad’s GATK Caller
SVs
  Custom pipelines needed
Browsing variant calls
  Genome Savant
Confirming variants via resequencing
Compute intensive part of the pipeline.
Integrating SVs and SNVs.

9 5. SV calling & validation
Extract FASTQ
Bowtie
BreakDancer
CNVer
Reprever
Zygosity calling
GQL + Genome Savant
VCF merging and validation

10 Push-button pipeline or VM
[Pipeline diagram. Labels: BAM; Extract FASTQ; Bowtie; unpaired BAM; GATK; GQL; BreakDancer; CNVer; SNPs (VCF); SVs; CNVs; genome browser; VCF to CSV script; Known SNPs; de novo SNPs; ISCA; Reprever; Recombination Blocks; Insertions; True positive SNPs; Zygosity calls; Validation; Filters; Our Code; Deleteriousness (Nitin); Interesting SVs; Interesting SNPs; VCF. Legend: Red arrows indicate functional.]

11 6. Merging VCFs
Given multiple VCF files, merge them (each column corresponds to an individual sample).
This can mostly be done with VCFtools.
Our goal would be to visualize problematic regions for manual validation, and to design primers for confirmation automatically.
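
The merge, with one column per sample, can be sketched on calls that have already been parsed. This is a minimal sketch, assuming each sample is a dict from (chrom, pos, ref, alt) to a genotype string; sites absent from a sample are written as "./." (missing), which is an assumption, since an absent record could also mean homozygous reference. In practice a tool such as VCFtools would operate on the files themselves.

```python
def merge_calls(samples):
    """Merge per-sample genotype dicts into one table.

    samples: {sample_name: {(chrom, pos, ref, alt): genotype}}
    Returns (header, rows); each row is a site followed by one genotype
    per sample, in the same order as the header.
    """
    names = sorted(samples)
    # Union of all sites seen in any sample, in a stable order.
    sites = sorted({site for calls in samples.values() for site in calls})
    header = ["CHROM", "POS", "REF", "ALT"] + names
    rows = [
        list(site) + [samples[name].get(site, "./.") for name in names]
        for site in sites
    ]
    return header, rows
```

Rows where some samples show "./." are natural candidates for the manual validation mentioned above.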

12 7. Variant annotation
Input: variant calls (raw VCF)
Output: annotation of variants (annotated VCF)
  Coding
  Synonymous
  Splice-variant
  Regulatory
  ncRNA
Annotating coding variation for deleteriousness
  SIFT
  PolyPhen
  GERP
  SeattleSeq
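
The coding categories can be sketched for a single-base change inside one codon. This is a minimal sketch using the standard genetic code; the function name and arguments are hypothetical, and splice, regulatory, and ncRNA annotation need transcript models that are out of scope here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(codon, offset, alt_base):
    """Classify a substitution at `offset` (0-2) of a reference codon."""
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:offset] + alt_base + codon[offset + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"
```

Predictors such as SIFT or PolyPhen would then be applied only to the non-synonymous calls.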

13 GERP score

14 8. Variant Filtering
Input: VCF (annotated)
Output: set of relevant variants/genes
Filters based on variant annotation
  deleterious: missense/nonsense/splice
Filters based on inheritance patterns
  Disease model (recessive/dominant/compound het)
Filtering tools: Gemini, FamAnn
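
The inheritance-pattern filters can be sketched for a parent-child trio. This is a minimal sketch, assuming genotypes are coded as alternate-allele counts (0, 1, 2), the child is affected, both parents are unaffected, and phase is inferred only from which parent carries each variant; the function names are hypothetical, and tools such as Gemini implement far more complete models.

```python
def fits_recessive(child, mother, father):
    """Homozygous alt in the affected child, both unaffected parents carriers."""
    return child == 2 and mother == 1 and father == 1


def fits_de_novo_dominant(child, mother, father):
    """Heterozygous in the affected child, absent from both unaffected parents."""
    return child == 1 and mother == 0 and father == 0


def compound_het_genes(variants):
    """Return genes with one maternal-only and one paternal-only het variant.

    variants: iterable of (gene, child, mother, father) genotype tuples.
    """
    maternal, paternal = set(), set()
    for gene, child, mother, father in variants:
        if child != 1:
            continue
        if mother == 1 and father == 0:
            maternal.add(gene)
        elif father == 1 and mother == 0:
            paternal.add(gene)
    # A gene qualifies only if the child inherited one hit from each parent.
    return maternal & paternal
```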

15 9. Annotating genes
Input: collection of genes with mutations.
Output: relevant diseases, functional information
Basic information
  GeneCards
Adding pathway
  Ingenuity
Databases of disease-gene links
  HGMD
  OMIM
  ClinVar
We are currently using an outdated version of HGMD, but can possibly do better, or just replace it with Step 9.

16 9. Identifying Disease genes
Automated machine learning approach to correlating genes with diseases
Standard ontologies for diseases
  MeSH
  Disease Ontology
Standard vocabulary for gene names
ML approach (parse abstracts to make these connections)

17 Computational Resource Consumption
Step                   Disk/sample   CPU/sample
Read Mapping           800 Gb        320 h*
BAM prep               150 Gb        140 h
SNV & INDEL calling    20 Gb         540 h*
SV & CNV calling       200 Gb        30 h + 30 h
Merging VCF            1.5 Gb        1 h
Variant Annotation                   1 h
Variant Filtering      -
Disease/Gene Assoc.    ?
*amenable to multithread parallelization (up to the point where memory becomes the bottleneck)

18 Gene Prioritization
Variant annotation
Variant filtering
Gene Disease connection

19 The HPO aims to act as a central resource to connect several genomics datasets with the diseasome. Thus, the HPO can act as a scaffold for enabling the interoperability between molecular biology and human disease. For example, phenotypic abnormalities in genetically modified model organisms can be mapped to human disease phenotypes (2).
Sebastian Köhler et al. Nucl. Acids Res. 2014;42:D966-D974. © The Author(s). Published by Oxford University Press.

20 Human Phenotype Ontology
10,000 terms describing human phenotypic abnormalities (7300 human hereditary syndromes).
2741 genes used to create DAG (Disease Associated Genes)
3 independent sub-ontologies
  mode of inheritance
  onset and clinical course
  phenotypic abnormalities
The phenotypic terms are cross-linked

21 Applications of HPO

22 Differential diagnosis using Phenomizer

23 Sequencing
Whole genome sequencing
Exome sequencing
Disease associated genome sequencing

24 Depth of coverage (exome or disease oriented sequencing)
At 20X coverage, what fraction of het variants will be called?
15% will be missed
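
The coverage question can be explored with a simple model. This is a minimal sketch, assuming read depth at a site is Poisson-distributed around the mean coverage, each read samples either allele with probability 0.5, and a het call needs at least `min_alt` alternate reads. The threshold is a hypothetical parameter, and this model is not claimed to reproduce the 15% figure on the slide, which depends on the caller and its filters.

```python
from math import comb, exp


def het_detection_probability(mean_coverage, min_alt=3, max_depth=200):
    """P(at least `min_alt` alt reads) at a het site under the model above."""
    total = 0.0
    p_depth = exp(-mean_coverage)  # Poisson probability of depth 0
    for depth in range(max_depth + 1):
        if depth > 0:
            # Poisson recurrence: P(d) = P(d - 1) * mean / d
            p_depth *= mean_coverage / depth
        # Probability that fewer than min_alt of `depth` reads carry the alt allele.
        too_few = sum(comb(depth, k) for k in range(min(min_alt, depth + 1)))
        total += p_depth * (1.0 - too_few / 2**depth)
    return total
```

Raising `min_alt`, or adding an allele-balance requirement, lowers the detection rate, which is one way stricter calling criteria lead to more missed het sites at the same coverage.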

25 Phenotypic interpretation of eXomes: PhenIX
Remove off-target and synonymous variants
Test population frequency of other variants
  frequency score: max(0, 1 - 0.13533*exp(100*f))
  These are known SNPs
Scores from SIFT/PolyPhen
  Most pathogenic score was taken
Final variant score: pathogenic score × frequency score
Clinical relevance score: semantic similarity between phenotypic abnormalities and 2741 genes.
Average (clinical, variant)
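
The scoring steps on this slide can be sketched in a few lines. This is a minimal sketch, assuming the frequency score has the form max(0, 1 - 0.13533*exp(100*f)) reported for PhenIX (the constant 0.13533 does not appear on the slide), that f is an allele frequency given as a fraction, and that predictor scores have been rescaled so that higher means more pathogenic. The function names are hypothetical.

```python
from math import exp


def frequency_score(f):
    """Score in [0, 1]; reaches 0 once the frequency passes about 2%."""
    return max(0.0, 1.0 - 0.13533 * exp(100.0 * f))


def variant_score(pathogenicity_scores, f):
    """Most pathogenic predictor score times the frequency score."""
    return max(pathogenicity_scores) * frequency_score(f)


def gene_score(clinical_relevance, pathogenicity_scores, f):
    """Average of the clinical relevance score and the variant score."""
    return (clinical_relevance + variant_score(pathogenicity_scores, f)) / 2.0
```

Averaging the two components means a gene with a strong phenotype match can still rank well when its variant is only moderately scored, and the reverse.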

26 Phenotypic interpretation of eXomes: PhenIX
Simulated mutation data from HGMD

