1
Personalized genomics
2
Goal
Input: genomic sequence (WGS) from a family; pedigree and affectedness; disease (standard ontology needed).
Output: genes/mutations relevant to the disease.
3
[Pipeline diagram. Inputs: pedigree & affectedness; genomic sequence (FASTQ, BAM, or VCF). Stages: read mapping, BAM prep, SNV calling, SV calling & validation, merge VCFs, variant annotation, variant filtering. Resources shown: GRCh37, known sites, dbSNP, SeattleSeq, HGMD. Output: disease gene/mutation.]
4
Steps
1. Sequencing
2. Mapping
3. BAM prep
4. Variant calling (SNV)
5. Variant calling (SV)
6. VCF manipulation/merging
7. Variant annotation
8. Variant filtering
9. Disease gene association
5
1. Sequencing
Platform: HiSeq/MiSeq, PacBio, Ion Proton (Life), CGI, …
Mode: whole genome, exome, RNA-Seq.
6
2. Mapping
Map short reads (FASTQ format) to a reference.
Output: a BAM file.
Mapping tools: BWA, Bowtie, custom.
This is a compute- and disk-intensive part of the pipeline. WGS file size: ~200 Gb per sample.
7
3. BAM Prep
Input: BAM file. Output: BAM file.
Steps: sorting the BAM, marking (PCR) duplicates, INDEL realignment, base-quality covariates and recalibration.
Tools: Picard Tools, GATK.
This is a compute-intensive part of the pipeline.
8
4. Variant Calling
Input: multiple BAMs.
Output: VCF (loci that differ from the reference).
SNVs: Broad’s GATK caller.
SVs: custom pipelines needed.
Browsing variant calls: Genome Savant.
Confirming variants via resequencing.
This is a compute-intensive part of the pipeline.
Integrating SVs and SNVs.
9
5. SV calling & validation
Components: extract FASTQ, Bowtie, BreakDancer, CNVer, Reprever, zygosity calling, GQL + Genome Savant, VCF merging and validation.
10
Push-button pipeline or VM
[Pipeline diagram. Labels shown: BAM; extract FASTQ; Bowtie; unpaired BAM; GATK; GQL; BreakDancer; CNVer; SNPs (VCF); SVs; CNVs; genome browser; VCF to CSV script; known SNPs; de novo SNPs; ISCA; Reprever; recombination blocks; insertions; true positive SNPs; zygosity calls; validation filters; our code; deleteriousness (Nitin); interesting SVs; interesting SNPs; VCF. Legend: red arrows indicate functional]
11
6. Merging VCFs
Given multiple VCF files, merge them so that each column corresponds to an individual sample. This can mostly be done with VCFtools. Our goal is to visualize problematic regions for manual validation and to design primers for confirmation automatically.
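To illustrate the data layout that merging produces (one row per site, one genotype column per sample), here is a minimal Python sketch. It is not VCFtools and does not reproduce its behaviour; the function names `parse_vcf` and `merge_vcfs` are hypothetical, only the GT field is kept, and a sample with no record at a site is filled with `./.`, which is an assumption (a missing record could equally mean homozygous reference).

```python
def parse_vcf(text):
    """Return (sample_names, {(chrom, pos, ref, alt): [GT per sample]})."""
    samples, calls = [], {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        calls[(chrom, int(pos), ref, alt)] = [
            sample_field.split(":")[gt_index] for sample_field in fields[9:]
        ]
    return samples, calls


def merge_vcfs(vcf_texts):
    """Merge several VCFs into one table with one genotype column per sample."""
    all_samples, parsed = [], []
    for text in vcf_texts:
        samples, calls = parse_vcf(text)
        parsed.append((len(all_samples), len(samples), calls))
        all_samples.extend(samples)
    merged = {}
    for offset, count, calls in parsed:
        for site, genotypes in calls.items():
            # Assumption: samples without a record at this site get "./."
            row = merged.setdefault(site, ["./."] * len(all_samples))
            row[offset:offset + count] = genotypes
    return all_samples, dict(sorted(merged.items()))


HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
mother_vcf = HEADER + "mother\n1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
child_vcf = (HEADER + "child\n1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30\n"
             "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:20\n")
samples, table = merge_vcfs([mother_vcf, child_vcf])
```

Sites where one sample ends up with `./.` are natural candidates for the manual validation mentioned above, since the absence of a call is ambiguous.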
12
7. Variant annotation
Input: variant calls (raw VCF).
Output: annotation of variants (annotated VCF).
Annotation categories: coding, synonymous, splice variant, regulatory, ncRNA.
Annotating coding variation for deleteriousness: SIFT, PolyPhen, GERP, SeattleSeq.
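As a rough illustration of the coding part of annotation, the sketch below classifies a single-base substitution in a coding sequence as synonymous, missense, nonsense, or stop-loss using the standard genetic code. It is not how SeattleSeq, SIFT, or PolyPhen work internally; the function name `classify_coding_snv` and its 0-based coordinate convention are assumptions made for the example, and strand, splicing, and transcript choice are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of coding sequence `cds`."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


cds = "ATGGAATGCTAA"  # toy coding sequence: Met-Glu-Cys-stop
effects = [
    classify_coding_snv(cds, 5, "G"),  # GAA -> GAG, Glu -> Glu
    classify_coding_snv(cds, 4, "T"),  # GAA -> GTA, Glu -> Val
    classify_coding_snv(cds, 3, "T"),  # GAA -> TAA, Glu -> stop
]
```

Deleteriousness scores such as SIFT or PolyPhen would then be attached only to the non-synonymous categories.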
13
GERP score
14
8. Variant Filtering
Input: VCF (annotated).
Output: set of relevant variants/genes.
Filters based on variant annotation: deleterious (missense/nonsense/splice).
Filters based on inheritance patterns: disease model (recessive/dominant/compound het).
Filtering tools: Gemini, FamAnn.
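The inheritance-pattern filters can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the logic of Gemini or FamAnn: genotypes are encoded as alternate-allele counts (0, 1, 2), penetrance is assumed complete, and the compound-het check assumes a trio with both parents genotyped. All function names are hypothetical.

```python
from collections import defaultdict


def fits_dominant(genotypes, affected):
    """Every affected member carries the variant; no unaffected member does."""
    return all((genotypes[s] >= 1) == affected[s] for s in affected)


def fits_recessive(genotypes, affected):
    """Every affected member is homozygous alternate; no unaffected member is."""
    return all((genotypes[s] == 2) == affected[s] for s in affected)


def compound_het_genes(variants, child, mother, father):
    """Genes where the child has one het variant from each parent."""
    origins = defaultdict(set)
    for variant in variants:
        gt = variant["genotypes"]
        if gt[child] != 1:
            continue
        if gt[mother] == 1 and gt[father] == 0:
            origins[variant["gene"]].add("maternal")
        elif gt[father] == 1 and gt[mother] == 0:
            origins[variant["gene"]].add("paternal")
    return sorted(g for g, seen in origins.items() if seen == {"maternal", "paternal"})


affected = {"child": True, "mother": False, "father": False}
variants = [
    {"gene": "GENE_A", "genotypes": {"child": 1, "mother": 1, "father": 0}},
    {"gene": "GENE_A", "genotypes": {"child": 1, "mother": 0, "father": 1}},
    {"gene": "GENE_B", "genotypes": {"child": 2, "mother": 1, "father": 1}},
    {"gene": "GENE_C", "genotypes": {"child": 1, "mother": 0, "father": 0}},
]
recessive_hits = [v["gene"] for v in variants if fits_recessive(v["genotypes"], affected)]
dominant_hits = [v["gene"] for v in variants if fits_dominant(v["genotypes"], affected)]
compound_hits = compound_het_genes(variants, "child", "mother", "father")
```

With unaffected parents, the dominant model reduces to de novo variants in the child, which is why only the hypothetical `GENE_C` passes it here.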
15
9. Annotating genes
Input: collection of genes with mutations.
Output: relevant diseases, functional information.
Basic information: GeneCards.
Adding pathways: Ingenuity.
Databases of disease-gene links: HGMD, OMIM, ClinVar.
We are currently using an outdated version of HGMD, but can possibly do better, or just replace it with Step 9.
16
9. Identifying Disease genes
Automated machine learning approach to correlating genes with diseases.
Standard ontologies for diseases: MeSH, Disease Ontology.
Standard vocabulary for gene names.
ML approach: parse abstracts to make these connections.
17
Computational Resource Consumption

Step                   Disk/sample   CPU/sample
Read Mapping           800 Gb        320 h*
BAM prep               150 Gb        140 h
SNV & INDEL calling    20 Gb         540 h*
SV & CNV calling       200 Gb        30 h + 30 h
Merging VCF            1.5 Gb        1 h
Variant Annotation                   1 h
Variant Filtering      -
Disease/Gene Assoc.    ?

* amenable to multithread parallelization (up to the point where memory becomes the bottleneck)
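Adding up the per-sample figures gives a feel for the overall budget. The sketch below sums the listed values and estimates wall-clock time for a given thread count. Two assumptions are made: the lone "1 h" for variant annotation is read as CPU time, and the starred steps are treated as scaling linearly with threads, which the footnote says only holds until memory becomes the bottleneck.

```python
# (disk in Gb, CPU hours, starred as parallelizable) per sample, from the figures above.
# Assumption: variant annotation's "1 h" is CPU time and its disk use is not given.
STEPS = {
    "Read mapping": (800.0, 320.0, True),
    "BAM prep": (150.0, 140.0, False),
    "SNV & INDEL calling": (20.0, 540.0, True),
    "SV & CNV calling": (200.0, 30.0 + 30.0, False),
    "Merging VCF": (1.5, 1.0, False),
    "Variant annotation": (None, 1.0, False),
}


def totals(steps):
    """Return (total disk in Gb, total CPU hours), skipping unknown disk values."""
    disk = sum(d for d, _, _ in steps.values() if d is not None)
    cpu = sum(c for _, c, _ in steps.values())
    return disk, cpu


def wall_clock_hours(steps, threads):
    """Idealised estimate: starred steps divide by `threads`, others run serially."""
    return sum(c / threads if parallel else c for _, c, parallel in steps.values())


total_disk, total_cpu = totals(STEPS)       # 1171.5 Gb, 1062.0 CPU hours
eight_threads = wall_clock_hours(STEPS, 8)  # 309.5 hours under ideal scaling
```

Under these assumptions the two starred steps account for 860 of the 1062 CPU hours, so they dominate any speed-up from threading.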
18
Gene Prioritization
Variant annotation; variant filtering; gene-disease connection.
19
The HPO aims to act as a central resource to connect several genomics datasets with the diseasome. Thus, the HPO can act as a scaffold for enabling the interoperability between molecular biology and human disease. For example, phenotypic abnormalities in genetically modified model organisms can be mapped to human disease phenotypes (2). Sebastian Köhler et al. Nucl. Acids Res. 2014;42:D966-D974 © The Author(s) Published by Oxford University Press.
20
Human Phenotype Ontology
10,000 terms describing human phenotypic abnormalities (7300 human hereditary syndromes).
2741 genes used to create DAG (Disease Associated Genes).
3 independent sub-ontologies: mode of inheritance, onset and clinical course, phenotypic abnormalities.
The phenotypic terms are cross-linked.
21
Applications of HPO
22
Differential diagnosis using Phenomizer
23
Sequencing
Whole genome sequencing; exome sequencing; disease-associated genome sequencing.
24
Depth of coverage (exome or disease oriented sequencing)
At 20X coverage, what fraction of het variants will be called? 15% will be missed
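One way to reason about this question is a simple sampling model, sketched below. It is an illustration under assumptions, not the calculation behind the 15% figure: reads at a heterozygous site are assumed to carry the alternate allele with probability 0.5, depth is either fixed or Poisson-distributed around the mean coverage, and a variant counts as called when at least `min_alt_reads` reads support it (a hypothetical threshold; real callers use genotype likelihoods and base qualities).

```python
from math import comb, exp


def p_called_at_depth(depth, min_alt_reads, alt_fraction=0.5):
    """P(at least `min_alt_reads` alternate reads) with Binomial(depth, alt_fraction)."""
    return sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


def p_called_poisson(mean_depth, min_alt_reads, max_depth=200):
    """Average the binomial model over Poisson-distributed depth (truncated at max_depth)."""
    total = 0.0
    pmf = exp(-mean_depth)  # P(depth = 0)
    for depth in range(max_depth + 1):
        if depth > 0:
            pmf *= mean_depth / depth  # Poisson recurrence: P(d) = P(d-1) * mean / d
        total += pmf * p_called_at_depth(depth, min_alt_reads)
    return total


# Fraction of het sites missed at 20X mean coverage for a few hypothetical thresholds.
missed = {k: 1.0 - p_called_poisson(20, k) for k in (3, 5, 8)}
```

The missed fraction rises quickly with the required number of supporting reads, which is why the answer depends as much on the caller's stringency as on the nominal coverage.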
25
Phenotypic interpretation of eXomes: PhenIX
Remove off-target and synonymous variants.
Test population frequency of other variants. Frequency score: max(0, exp(100*f)). These are known SNPs.
Scores from SIFT/PolyPhen: the most pathogenic score was taken.
Final variant score: pathogenic score × frequency score.
Clinical relevance score: semantic similarity between phenotypic abnormalities and 2741 genes.
Average (clinical, variant).
26
Phenotypic interpretation of eXomes: PhenIX
Simulated mutation data from HGMD