Proteogenomic Novelty in 105 TCGA Breast Tumors

Proteogenomic Novelty in 105 TCGA Breast Tumors
Karl Clauser, CPTAC Breast Cancer Analysis Group
Broad Institute of MIT and Harvard; Fred Hutchinson Cancer Research Center; Washington University; New York University
CPTAC Data Jamboree, November 12, 2013, National Institutes of Health, Bethesda, Maryland

Tumor-specific protein databases for MS/MS spectra searches
Kelly Ruggles, David Fenyo, NYU

Preliminary novel findings (work in progress)
Proteogenomic mapping: genetic alterations can be observed at the protein level (81 tumors).
Low confidence thresholds applied to genome calls:
  Variants: QUAL > 2 (phred-scaled quality score for the assertion made in ALT)
  Alternative splices: > 1 read
The document at http://www.1000genomes.org/node/101 defines the quality value as: “QUAL phred-scaled quality score for the assertion made in ALT. i.e. give -10log_10 prob(call in ALT is wrong). If ALT is "." (no variant) then this is -10log_10 p(variant), and if ALT is not "." this is -10log_10 p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. (Numeric)”
High confidence thresholds applied to proteome calls (<1% FDR).
0.7-2.5% of alternative splice junctions and single AA variants are observable by proteomics:
  mRNA may not be translated or may be at low abundance
  Proteome coverage is incomplete
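To make the QUAL threshold concrete, the quoted definition (QUAL = -10 log_10 of the probability that the call is wrong) can be inverted to get an error probability. The following is a minimal sketch; the function names are hypothetical and are not part of the presenters' pipeline.

```python
import math


def qual_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(p_wrong) to recover p_wrong."""
    return 10 ** (-qual / 10)


def error_prob_to_qual(p_wrong: float) -> float:
    """Phred-scale an error probability (0 < p_wrong <= 1)."""
    return -10 * math.log10(p_wrong)


def passes_variant_threshold(qual: float, min_qual: float = 2.0) -> bool:
    """Apply the slide's 'QUAL > 2' rule; min_qual is the only tunable."""
    return qual > min_qual


if __name__ == "__main__":
    for q in (2, 10, 20, 30):
        print(f"QUAL {q:>2}: P(call wrong) = {qual_to_error_prob(q):.4f}")
```

Under this definition a QUAL of 2 corresponds to an error probability of about 0.63, which is why the slide describes the genome-side threshold as low confidence; the stringency comes from the <1% FDR applied on the proteome side.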

Global proteome and phosphoproteome discovery workflow for TCGA breast tumors
1 mg total protein per tumor
Internal reference: equal representation of basal, HER2 and Luminal A/B subtypes

Serial Search Strategy with Personalized Databases
First search, reference database:
  19,673,636 spectra (81 patients) (27 iTRAQ experiments) (25 LC-MS/MS runs / experiment)
  RefSeq-Human-37: 32,800
  8,037,319 spectra matched (41% of total) (1% FDR)
  11,636,317 leftover spectra
Second search, personalized databases:
  Concatenated FASTA files, 102 patients; altered proteins only; redundant entries removed
  Variants: 132,181
  Alternate spliceforms: 67,035
  Low confidence thresholds applied to genome calls: variants > 2 QUAL score (phred-scaled); alternative splices > 1 read
  3028 variants matched (N spectra) (2294 proteins)
  279 splice junctions matched (y spectra)
Open questions:
  Can combined FDR be calculated?
  Can search engine retain speed by skipping unchanged peptides?
Example database entries:
>Refseq Protein
SIGNALINGPATHWAYREGULATOR
>Refseq Protein – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
>Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR
>Refseq Protein – Alternate splice Patient 1
SIGNALINGREGULATOR
>Canonical Protein – Alternate splice Patient 2
SIGNALINGPATHREGULATOR
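The database-building step on this slide (concatenate per-patient FASTA files, keep altered proteins only, remove redundant entries) can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the slide does not say how redundancy was defined, so the sketch assumes two entries are redundant when their sequences are identical, and all function names are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_nonredundant(patient_fastas, reference_seqs=()):
    """Concatenate per-patient entries, keeping the first copy of each sequence.

    Assumption: 'redundant' means an identical sequence. Sequences already in
    the reference are skipped too, so only altered proteins remain.
    """
    seen = set(reference_seqs)
    merged = []
    for text in patient_fastas:
        for header, seq in parse_fasta(text):
            if seq in seen:
                continue
            seen.add(seq)
            merged.append((header, seq))
    return merged


if __name__ == "__main__":
    reference = ["SIGNALINGPATHWAYREGULATOR"]
    patient1 = ">Variant Patient 1\nSIGNALINGPATHWAHREGULATOR\n>Alternate splice Patient 1\nSIGNALINGREGULATOR\n"
    # Patient 2 carries the same splice form as patient 1 (made up for the demo).
    patient2 = ">Variant Patient 2\nSIKNALINGPATHWAYREGULATOR\n>Alternate splice Patient 2\nSIGNALINGREGULATOR\n"
    for header, seq in merge_nonredundant([patient1, patient2], reference):
        print(f">{header}\n{seq}")
```

Keeping the first header for each distinct sequence loses the link to later patients who share it; a real pipeline would likely record every contributing patient alongside the sequence.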

Single AA Variants may be Somatic in Some Patients, Germline in Others
Genomic: Highly interesting, should correlate with prognosis and/or subtype. May correlate with prognosis? Might as well be canonical isoforms? Detectable, but too rare to indicate biology.
Proteomic: Germline & somatic (G&S) mix genomic variants have the highest observation rate by proteomics. Genomic variants present in only a single patient are observable by proteomics.

Not all Germline & Somatic mix Single AA Variants are “Essentially” Germline
[Figure panels: Genomic, Proteomic]
Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
Is there some cancer biology involved for high S/G ratio variants?
Are patients with the germline form more cancer prone?
Does the somatic form correlate with prognosis or development of drug resistance?

155/279 Alternative Splice Junctions were observed in >1 Proteomics Experiment
279 alternative splice junctions observed in 27 proteomics experiments (iTRAQ 4-plex)
1 experiment: 3 individual patients + 1 common control (40 patients)

Wide Range of Somatic Single AA Variants/Patient
Low confidence thresholds applied to calls:
  Variants: > 2 QUAL score (phred-scaled)
  Alternative splices: > 1 read

Frequency of Single AA Variants and Alternative Splices Across Patients
Somatic variants are less frequent than germline variants.
Some germline variants are very common.
Rare germline variants present in the reference sequence (RefSeq).
Some alternative splice forms are very common and should be in RefSeq.
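The per-patient frequencies summarized on this slide amount to counting, for each variant or splice form, how many patients carry it. A minimal sketch follows, assuming calls are available as a mapping from patient ID to a list of variant identifiers; the identifiers, the function names, and the 50% cutoff for "very common" are all made up for illustration.

```python
from collections import Counter


def patients_per_variant(calls):
    """Count how many patients carry each variant.

    `calls` maps a patient ID to an iterable of variant identifiers; a
    variant listed twice for one patient is counted once.
    """
    counts = Counter()
    for variants in calls.values():
        counts.update(set(variants))
    return counts


def very_common(counts, n_patients, min_fraction=0.5):
    """Variants seen in at least `min_fraction` of patients (arbitrary cutoff)."""
    return sorted(v for v, n in counts.items() if n / n_patients >= min_fraction)


if __name__ == "__main__":
    # Hypothetical calls; identifiers are placeholders, not real findings.
    calls = {
        "patient_1": ["GENE_A:p.X10Y", "GENE_B:p.X20Y"],
        "patient_2": ["GENE_A:p.X10Y"],
        "patient_3": ["GENE_A:p.X10Y", "GENE_C:p.X30Y"],
    }
    counts = patients_per_variant(calls)
    print(counts.most_common())
    print(very_common(counts, n_patients=len(calls)))
```

A variant carried by nearly every patient is a candidate for the case the slide flags, where the reference sequence holds the rarer form.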

Next steps:
Analyze data from all tumors (81/105 so far)
Examine “other” category: fusion genes (junction-spanning), novel exon, novel gene, frame shift, novel splicing (junction-spanning)
Analyze phosphoproteomics data
Use updated output of genomic analysis pipeline
Employ more thorough FDR calculation for PSMs: single-pass search of all spectra against a concatenated database (reference proteome, variants, alternate splice forms, “other”)

Acknowledgments
Broad Institute/FHCRC: Steve Carr, Karl Clauser, Michael Gillette, Jana Qiao, Philipp Mertins, DR Mani, Eric Kuhn, Sue Abbatiello, Amanda Paulovich, Pei Wang, Sean Wang, Ping Yan
Washington U./MD Anderson/NYU: Sherri Davies, Matthew Ellis, David Fenyo, Kelly Ruggles, Reid Townsend, Li Ding
NCI Staff: Emily Boja, Mehdi Mesri, Rob Rivers, Chris Kinsinger, Henry Rodriguez
Funding: National Cancer Institute