Figure 2. VIPUR training ROC and PR performance

Slides:



Advertisements
Similar presentations
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
Advertisements

Figure 1. Schematic of the corA leader mRNA
Figure 1. Single-molecule analysis through microfluidic enrichment of rolling circle amplification products. (A) Sample preparation and target detection.
Fig. 1. Genomic structure of the csd gene in A
Figure 1. Unified models predicting gene regulation based on landscapes of gene-regulating factors. For each gene, position specific combinatorial patterns.
Figure 2. Number of cases required in a Mendelian randomization analysis with a binary outcome and a single instrumental variable for 80% power with a.
J Exp Bot. 2017;68(17): doi: /jxb/erx352
Figure 1. The scheme of RCAS
Figure 1. Overview of the Ritornello approach
Figure 1. AsCpf1 and LbCpf1-mediated gene editing in human cells
Figure 1. Annotation and characterization of genomic target of p63 in mouse keratinocytes (MK) based on ChIP-Seq. (A) Scatterplot representing high degree.
Figure 3. Definition of the outside distance, Dout, as the sum of the distances from each of the two protein binding sites to their corresponding.
Figure 2. The graphic integration of CNAs with altered expression genes in lung AD and SCC. The red lines represent the amplification regions for CNA and.
Figure 1. A3B NTD mediates enzyme activity and oligomerization
From: DynaMIT: the dynamic motif integration toolkit
Figure 1. drb7.2 mutant plants display altered accumulation of endoIR-siRNA. Wild-type (Col-0) and drb7.2 mutant plants were subjected to high throughput.
Scheme 1. Rotation around the glycosidic bond leads to the syn-conformation, that in turn, stabilizes the 8-oxoG:A basepair mismatch. Biophysical properties,
Figure 2. Number of SNPs detected from empirical ddRAD-Seq analysis
From: Introducing the PRIDE Archive RESTful web services
Figure 1. PARP1 binds to abasic sites and DNA ends as a monomer
Figure 1: Diagnostic performance of mortality prediction indices MPI4 (dash-dotted line), MPI5 (dashed line), and MPI6 (continuous line) for identification.
Figure 2. Ritornello captures the FLD from single-end sequencing data
Figure 2. Sodium dodecylsulphate-polyacrylamide gel electrophoresis (SDS-PAGE) analysis of recombinant wild-type (WT) and mutant VSV L proteins. The WT.
Figure 1. The flow chart illustrates the construction process of anti-CRISPRdb, and the information that users can obtain from anti-CRISPRdb. From: Anti-CRISPRdb:
Figure 1. Complete work-flow of the Scasat
Figure 2. Workflow of MethMotif Batch Query
From: Dynamic protein–RNA interactions in mediating splicing catalysis
Figure 1. The number of unique PDB, UniProt and Pfam accessions represented in the MemProtMD database over time. A selection of landmark structures are.
Figure 3. Schematic representation of co-expression networks for BipA, SidD, and Erg11 encoding genes. Genes are represented by circles, with positive.
Figure 4. (A) Scatterplot of RPC4 T statistic (between TP0 and TP36) for the indicated groups of isolated tRNA genes (RPC4 peak only, n = 35; RPC4 + H3K4me3.
Figure 1. An example of a thioviridamide-like molecule, thioalbamide, and inset, a proposed biochemical route to ... Figure 1. An example of a thioviridamide-like.
Figure 1. Biased distribution of different SV classes
Figure 1. Overview of the workflow of NetworkAnalyst 3.0.
Figure 1. Effect of random T/A→dU/A substitutions on transcription by T7 RNAP using a 321 bp DNA transcription template ... Figure 1. Effect of random.
Figure 1. Domain organization of IMP1
Figure 2. Ectopic expression of PLZF modulates H3K27ac at enhancer regions in myeloid cells. (A) Box plots showing ... Figure 2. Ectopic expression of.
Figure 1. Schematic illustration of CSN and NDM construction and our statistic model. (A) CSN and NDM construction. (i) ... Figure 1. Schematic illustration.
Figure 1. Ratios of observed to expected numbers of exon boundaries aligning to boundaries of domain and disorder ... Figure 1. Ratios of observed to expected.
Figure 1. autoMLST workflow depicting placement and de novo mode
Figure 1. The 12 species in this study and details of the improved G4-seq method. (A) Phylogenetic representation of ... Figure 1. The 12 species in this.
Figure 1. Nanopore methylation calls are consistent with expected results and established technologies. (A) Metaplot of ... Figure 1. Nanopore methylation.
Figure 4. The mean of spermatocyte of various treatment groups
Figure 1. Chemical structures of DNA and tc-DNA
Figure 1. Analysis of human TRIM5α protein with Blast-Search and PhyML+SMS ‘One click’ workflow. (A) NGPhylogeny.fr ... Figure 1. Analysis of human TRIM5α.
Figure 1. Illustration of DGR systems and their prediction using myDGR
Figure 1. The pipeline of Aggrescan3D 2.0 server.
Figure 1. Summary of experimental conditions and data normalization
Figure 1. The workflow of Cistrome-GO
Figure 1. Prediction result for birch pollen allergen Bet v 1 (PDB: 1bv1), as obtained by comparison to the cherry ... Figure 1. Prediction result for.
Figure 1. Using Voronoi tessellation to define contacts
Figure 1. Designed cotranscriptional RNA structures
Figure 4. RLS spectra of (A) TMPipEOPP and (B) OMHEPzEOPP in the presence of different concentrations of KRAS. The RLS ... Figure 4. RLS spectra of (A)
Figure 1. PaintOmics 3 workflow diagram
Figure 1. The configuration menu allows the user to choose among the main visualization/processing options of the ... Figure 1. The configuration menu.
Figure 1. (A) Architecture of Doc2Hpo. (B) Interactive user interface
Figure 1. MERMAID web server interface (Start page, Parameter page): MERMAID provides two ways to submit a protein ... Figure 1. MERMAID web server interface.
Figure 1. The framework of NetGO with seven steps
Figure 1. Workflow of the HawkDock server that is divided into three major steps: (i) input of unbound or bound protein ... Figure 1. Workflow of the HawkDock.
Figure 1. Overview of features that can be assessed in a single RegulationSpotter VCF analysis run. Depending upon a ... Figure 1. Overview of features.
Fig. 1. —Synteny analysis of melon chromosome 1 (brown) and cucumber chromosome 7 (green) based on melon-cucumber ... Fig. 1. —Synteny analysis of melon.
Figure 1. SQL schema used by RetroRules
Figure 1 Genetic results. No case had more than one diagnostic result
Figure 1. The overlap between Ensembl/GENCODE, RefSeq and UniProtKB genes. The number of genes classified as coding in ... Figure 1. The overlap between.
Figure 1. DNA-guided RNA cleavage activity
Figure 1. GWAS Catalog associations for coronary artery disease plotted across all chromosomes. Associations added ... Figure 1. GWAS Catalog associations.
Figure 1. Crystal structures of Rim1
Fig. 1. —GO categories enriched in gene families showing high or low omega (dN/dS) values for Pneumocystis jirovecii. ... Fig. 1. —GO categories enriched.
Figure 1. Removal of the 2B subdomain activates Rep monomer unwinding
Figure 4. In vitro mo5U forming activity of TrmR
Figure 1. Unrooted phylogenetic trees for UL73 and UL74 based on amino acid sequences derived from 243 genome ... Figure 1. Unrooted phylogenetic trees.
Presentation transcript:

Figure 2. VIPUR training ROC and PR performance Figure 2. VIPUR training ROC and PR performance. ROC and PR curves for VIPUR and other popular methods. Curves (A,B) are averaged from 100 random splits (80% training, 20% testing) evaluated only on the leave-out testing sets. (A) Our combined classifier (black) has increased specificity compared to PROVEAN (green) with comparable sensitivity and higher AUROC than all other methods tested. PROVEAN and our sequence-only classifier have very similar AUROC but appear to emphasize sensitivity and specificity respectively. (B) VIPUR has notably increased AUPR over all other classifiers tested. Inclusion of the structure-based features improves classification (+2.5% accuracy) and dramatically improves ranking ability (.020 ΔAUPR). (C,D) We cannot directly compare performance of VIPUR to classifiers trained on the same variants (HumDiv) or restricted to predictions of human proteins. A VIPUR-like classifier was trained using 7935 variants from HumDiv and non-human proteins to compare performance with PolyPhen2 and PROVEAN on a set of 1542 human variants. The VIPUR-like classifier achieves higher AUROC (C) and AUPR (D) than both PolyPhen2 and PROVEAN. From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1. VIPUR analysis pipeline Figure 1. VIPUR analysis pipeline. Starting from a structural model of the native protein and a list of variants to be tested, VIPUR generates features using PSIBLAST and ROSETTA. Structure-based features are extracted from ROSETTA simulations comparing the native and variant protein structures. Variant structures are refined using the ddg_monomer protocol and the FastRelax protocol to consider a distribution of protein conformations. Features are combined in a logistic regression classifier that is trained on 9477 variants from over 360 species. VIPUR outputs the predicted label (deleterious or neutral), a confidence score, the top scoring 3D models of the variant protein structure and an automated interpretation of the variant effect derived from the weighted contributions of each feature to produce a physical description of protein disruption. From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 3. VIPUR scores clearly identify pathogenic variants Figure 3. VIPUR scores clearly identify pathogenic variants. VIPUR predictions on ClinVar variants match expectations from their phenotype annotations. Left: Pathogenic variants have a skewed distribution of VIPUR deleterious scores (>.5) and have similar trends for PolyPhen2 and PROVEAN. Center: Benign variants have a broad distribution of VIPUR neutral scores (<.5), while PolyPhen2 pushes variants to high and low scores. We expect most genetic variations to be benign and unlikely to disrupt protein function, however, datasets like ClinVar have a high pathogenic label bias. Right: Predictions on ClinVar variants annotated with uncertain effect highlight VIPUR's ability to identify a small set of likely deleterious variants, while PolyPhen2's high false positive rate leads to an overwhelming number of high-confidence ‘probably damaging’ predictions. VIPUR's score distribution resembles the benign variants with a small set of confident deleterious predictions while PROVEAN scores are uniformally distributed. From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 4. S204P disrupts a critical helix interface in IL6 Figure 4. S204P disrupts a critical helix interface in IL6. VIPUR predicts S204P is deleterious (.835), matching the UniProt annotation ‘87% loss of function’ and infers the deleterious label due to destabilized disulfide bond, while PROVEAN predicts S204P is neutral (−1.20 score). Every residue in IL6 is coloured by the difference in Rosetta energy between the native and variant protein structures, highlighting the destabilization introduced by S204P (top left). The PSSM generated by PSIBLAST does not indicate strong conservation for serine at position 204 (top right, PSSM columns shown for surrounding residues). The native S204 structure has a stable interface (bottom left, residues coloured by Rosetta energy of a representative model) but becomes destabilized in the P204 variant model (bottom right). Perturbing this helix could accommodate the proline destabilization, however, this strains the nearby C101–C111 disulfide bond (bottom right), leading to an accurate deleterious prediction. From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 5. T168P destabilizes an active site loop in GCK Figure 5. T168P destabilizes an active site loop in GCK. T168P is predicted deleterious (.987) due to disrupted backbone interaction (hydrogen-bonding) and is also predicted deleterious confidently by PROVEAN (−5.82). Every residue in GCK is coloured by the difference in Rosetta energy between the native and variant protein structures, highlighting the high energy of P168 (top left). The PSSM generated by PSIBLAST indicates both the native and variant amino acids are not favoured at position 168 (top right, PSSM columns shown for surrounding residues). The native T168 forms a hydrogen bond to the substrate, D-glucose (bottom left, ligand position from PDB 3F9M, residues coloured by Rosetta energy of a representative model), which is absent in the P168 variant. Although interaction with D-glucose is not simulated during classification, VIPUR predicts proline is destabilizing due to disrupted backbone hydrogen bonding, suggesting other active sites can be accurately classified even without bound ligands. From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 6. VIPUR deleterious predictions are enriched for autism-associated mutations. Predictions for various methods on the SSC, containing 1,335 de novo mutations found in children with autism-spectrum disorders (probands) and 891 de novo mutations found in unaffected siblings. For each prediction method, the distribution of confidence scores is shown for proband (red) and sibling (blue) mutations. The ratio of these counts for each score bin is shown along with the background expectation (dashed line, 1.50, 1,335/891). We expect high deleterious/damaging scores to be enriched for proband mutations and low scores to be enriched for mutations found in siblings. (A) VIPUR predicts most mutations in both sets have neutral effects while properly enriching high deleterious scores for proband mutations and de-enriching low scores for probands mutations. (B) PolyPhen2 effectively splits mutations into a high-confidence bin versus everything else, however, this top scoring bin is not strongly enriched for proband mutations. (C) SIFT scores are distributed similarly to PolyPhen2 with similar overall correlation, however, its fluctuations around the background expectation are different. (D) Using the simple BLOSUM62 score (negative scores are deleterious) yields an excellent enrichment for proband mutations, however, the scores are not truly continuous leading to fewer scores (smaller P-value). From: Robust classification of protein variation using structural modelling and large-scale data integration Nucleic Acids Res. 2016;44(6):2501-2513. doi:10.1093/nar/gkw120 Nucleic Acids Res | © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.