Identification of Cancer-Specific Motifs in

Slides:

Advertisements

Similar presentations

Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"

Advertisements

RNA-Seq based discovery and reconstruction of unannotated transcripts

Statistics in Bioinformatics May 2, 2002 Quiz-15 min Learning objectives-Understand equally likely outcomes, Counting techniques (Example, genetic code,

Computational discovery of gene modules and regulatory networks Ziv Bar-Joseph et al (2003) Presented By: Dan Baluta.

HLAMatchmaker-Based Analysis of MICA Antibody Reactivity Patterns Rene J. Duquesnoy 1, Justin W. Mostecki 2, Jayasree Hariharan 2, Kristi Colacioppo 2,

Estimating the penetrances of breast and ovarian cancer in the carriers of BRCA1/2 mutations Silvano Presciuttini University of Pisa, Italy.

Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol

Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.

Phage Display and its Applications Matt Brown Human Genetics Dr. Nancy Bachman.

Profile-profile alignment using hidden Markov models Wing Wong.

1 Nicholas Mancuso Department of Computer Science Georgia State University Joint work with Bassam Tork, GSU Pavel Skums, CDC Ion M ӑ ndoiu, UConn Alex.

Microarrays: Tools for Proteomics

Active Learning Strategies for Drug Screening 1. Introduction At the intersection of drug discovery and experimental design, active learning algorithms.

Sense-Antisense Proteins Vision Lab Presentation Ruchir Shah April 16, 2003.

Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:

1 Masterseminar „A statistical framework for the diagnostic of meningioma cancer“ Chair for Bioinformatics, Saarland University Andreas Keller Supervised.

Supplementary material Figure S1. Cumulative histogram of the fitness of the pairwise alignments of random generated ESSs. In order to assess the statistical.

Using Random Peptide Phage Display Libraries for early Breast cancer detection Ekaterina Nenastyeva.

An in vitro selection technique using a peptide or protein genetically fused to the coat protein of a bacteriophage.

Multiple testing correction

AMOST Experimental Comparison of Code-Based and Model-Based Test Prioritization Bogdan Korel Computer Science Department Illinois Institute of Technology.

The virochip (UCSF) is a spotted microarray. Hybridization of a clinical RNA (cDNA) sample can identify specific viral expression.

False Biomarker Discovery due to Reactivity of a Commercial ELISA for CUZD1 with Cancer Antigen CA125 I. Prassas, D. Brinc, S. Farkona, F. Leung, A. Dimitromanolakis,

A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA

Amandine Bemmo 1,2, David Benovoy 2, Jacek Majewski 2 1 Universite de Montreal, 2 McGill university and Genome Quebec innovation centre Analyses of Affymetrix.

Hierarchical Distributed Genetic Algorithm for Image Segmentation Hanchuan Peng, Fuhui Long*, Zheru Chi, and Wanshi Siu {fhlong, phc,

Case(Control)-Free Multi-SNP Combinations in Case-Control Studies Dumitru Brinza and Alexander Zelikovsky Combinatorial Search (CS) for Disease-Association:

Adrian Caciula Department of Computer Science Georgia State University Joint work with Serghei Mangul (UCLA) Ion Mandoiu (UCONN) Alex Zelikovsky (GSU)

Autoantibody Reactivity to Tumor Associated Antigens as a Biomarker for Early Lung Cancer J Hung 1, M Jagen 1, AK Greenberg 1, E Tan 2, D Naidich 1, H.

R. Ray and K. Chen, department of Computer Science engineering  Abstract The proposed approach is a distortion-specific blind image quality assessment.

My Research Work and Clustering Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.

Scenario 6 Distinguishing different types of leukemia to target treatment.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS BiC BioCentrum-DTU Technical University of Denmark 1/31 Prediction of significant positions in biological sequences.

Adaptive randomization

Serghei Mangul Department of Computer Science Georgia State University Joint work with Irina Astrovskaya, Marius Nicolae, Bassam Tork, Ion Mandoiu and.

Ranjit Ganta, Raj Acharya, Shruthi Prabhakara Department of Computer Science and Engineering, Penn State University DATA WAREHOUSE FOR BIO-GEO HEALTH CARE.

Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.

Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:

Copyright © Cengage Learning. All rights reserved. 12 Analysis of Variance.

Human Genomics. Writing in RED indicates the SQA outcomes. Writing in BLACK explains these outcomes in depth.

Cluster validation Integration ICES Bioinformatics.

Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.

Motif Search and RNA Structure Prediction Lesson 9.

Scalable Algorithms for Next-Generation Sequencing Data Analysis Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science.

Final Report (30% final score) Bin Liu, PhD, Associate Professor.

A Robust and Accurate Binning Algorithm for Metagenomic Sequences with Arbitrary Species Abundance Ratio Zainab Haydari Dr. Zelikovsky Summer 2011.

Dengue fever caused by dengue virus (DENV), a member of Flaviviridae leads to large global disease burden. Detection of immunoglobulin M (IgM) and nucleic.

What is phage display? An in vitro selection technique using a peptide or protein genetically fused to the coat protein of a bacteriophage.

1 A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting Reporter ： Zhao-Wei Luo Che-Jung Chang,Der-Chiang.

prostate cancer protein and mRNA biomarkers

* Potential use of HP and AMBP in urine for the screening of prostate cancer Sanja Kiprijanovska1, Selim Komina2, Gordana Petrusevska2,

Volume 63, Issue 2, Pages (February 2003)

Performance Comparison of CA125 and the Combination of the Other Serum Biomarkers for the Early Detection of the Ovarian Cancer Hye-Jeong Song1,3,

Membrane protein preparation Equipped with talent expert teams and rich experiences in this area, Creative Biolabs now provides state-of-the-art anti-membrane.

Antibody affinity maturation services Affinity maturation is the process to improve antibody affinity for an antigen. In vivo, natural affinity maturation.

Membrane protein preparation Equipped with talent expert teams and rich experiences in this area, Creative Biolabs now provides state-of-the-art anti-membrane.

Phage panning Creative Biolabs is one of the well-recognized experts who is professional in applying advanced phage display technologies for a broad range.

Volume 18, Issue 1, Pages (January 2017)

A novel counting algorithm to detect common fetal trisomies

Somatic promoters correlate with immunoediting signatures.

Oligonucleotide aptamer-targeted immune modulation.

Volume 18, Issue 1, Pages (January 2017)

Rachel L Winston, Joel M Gottesfeld Chemistry & Biology

Pan-Cancer Analysis of Mutation Hotspots in Protein Domains

Characterization of the Anti-BP180 Autoantibody Reactivity Profile and Epitope Mapping in Bullous Pemphigoid Patients1 Giovanni Di Zenzo, Fabiana Grosso,

Autoantibodies to Bullous Pemphigoid Antigen 180 Induce Dermal–Epidermal Separation in Cryosections of Human Skin Cassian Sitaru, Enno Schmidt, Steffen.

Analysis of GFP expression in gfp loss-of-function mutants.

Autoantigens in Vitiligo Identified by the Serological Selection of a Phage-Displayed Melanocyte cDNA Expression Library Elizabeth A. Waterman, David.

Fig. 1 General VirScan analysis of the human virome.

Presentation transcript:

Identification of Cancer-Specific Motifs in Mimotope Profiles of Serum Antibody Repertoire Ekaterina Nenastyeva¹, Yurij Ionov², Ion Mandoiu³, and Alex Zelikovsky¹ ¹ Department of Computer Science, Georgia State University, Atlanta, GA 30303; ² Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263; ³Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269 ABSTRACT For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient’s own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. The whole proteome can be represented by a random peptide phage display library (RPPDL). This allows to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients’ and healthy donors’ global peptide profiles of antibody specificities. A global peptide profile generated by next generation sequencing contains the enormously large number of sequences. To decrease this number we used the phage library enriched by panning on the pool of cancer sera. To further decrease the complexity of profiles we used computational methods for transforming a list of peptides constituting the mimotope profiles to the list motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs since they contain significantly more peptides then motifs among peptides where amino-acids are randomly permuted. Also the single sample motifs significantly differ from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified. INTRODUCTION The panels of antibody reactivities can be used for detecting cancer with high sensitivity and specificity. For any antibody the peptide motif representing the best binder can be selected from the RPPDL (Figure 1). The profiles generated by next-gen sequencing following several iterative round of affinity selection and amplification in bacteria can consist of millions of peptide sequences. A significant fraction of these sequences are “parasitic” produced by nonspecific binding and preferential amplification in bacteria. The affinity selected sequences can be clustered into the groups of similar sequences with shared consensus motifs, while the “parasitic” sequences are usually represented by single representatives. To get rid of “parasitic” sequences we propose a novel motif identification method (CMIM) based on CAST clustering (Amir Ben-Dor et al.). METHODS 1. The experiment for generating mimotope profiles of serum antibody repertoire is outlined in the flowchart in Figure 2. The first step of the experiment was library enrichment, the second step was directly generating of mimotipe profiles and next-gen sequencing. 2. For motif finding we proposed the CAST-based motif identification method (CMIM) (Figure 3). A motif was defined as a group of peptides having common sequence pattern and their consensus sequence. RESULTS Data set 15 serum samples of the stage 0 and 1 breast cancer patients; 15 serum samples of the healthy donors. experiment was performed separately on each serum sample using the same enriched library. In average, for the experimental condition selected: total number of distinct peptides in one sample=18450 (σ=6205); count value (expression) of a sample = 407335 (σ = 252393). After CMIM in average: motifs per a control sample = 3000(1073), case sample = 3490(1315) size of a case motif = 7.1(1.8), size of a control motif = 6.8(1.3) peptides number of motifs consisting of 20 and more peptides = 154(71) and 131(53) for cases and controls respectively. Motif validation Pseudo mimotope profiles using two strategies: Random permutation of amino acides in a sample peptides - motifs in a random sample = 6639(1967); size = 4.2(0.7) peptides, largest motif =17, >95% motifs with size <=4. Random selection of peptides from existing samples In average we obtained - motifs in a random sample = 3890(34); size = 5.71(0.04) peptides; - Kruskal–Wallis test (William H Kruskal, W Allen Wallis) - the single sample motifs are significantly different from random sample motifs Cancer-specific motifs - motifs significantly prevalent in cases. If two samples share any consensus sequence, they share the corresponding motif. A motif was associated with cancer if probability of its appearance in cases against controls by chance was less than 0.05. case/control combinations for 15 cases and 15 controls: 5-0, 6-0,...,15-0, 6-1,...,15-1,8-2,...15-2,9-3,...15-3,10-4,...,15-4,11-5,...15-5,12-6,...,15-6,13-7,...,15-7,14-8,...,15-8,...,15-11; also combinations with probability less than 0.04,0.03,0.02 and 0.01; number of expected and observed motifs as well as False Discovery Rate adjustment are shown in Table 1: CONCLUSION AND FUTURE WORK We have shown that the amino-acid order is meaningful in mimotope motifs found by CMI. Also the single sample motifs are shown to be significantly different from motifs in peptides drawn from multiple samples. CMIM was applied to case-control data and identified numerous cancer specific motifs. Although no motif is statistically significant after adjusting to multiple testing, we have shown that the number of found motifs is much larger than expected and may therefore contain useful cancer markers. Identified peptides allow screening a large number of serum samples specifically for the reactivity against these peptides. In future we plan to modify our experimental part using not original RPPDL, but make it more specific using library enriched by cancer specific peptides. And we want to increase the number of samples to have significant statistical power. Next-gen sequencing Generating peptide mimotope profile of serum antibodies Initial phage library Pooled cancer serum 3X Incubation Amplification in E.coli Elution of antibody bound phage Incubation with individual sera Making DNA library from phage DNA F R D K c E P A Q V N Y L C W Phage envelop DNA Peptide coding sequence Figure 2. A scheme for generating mimotope profiles of serum antibody repertoire. Probability Observed Expected FDR <0.05 67 51.9 0.77 <0.04 27 20.5 0.76 <0.03 24 16.6 0.69 <0.02 10 8.1 0.81 <0.01 4 4.2 1.06 Figure 1. Any antigen can be substituted by a library of random peptides. ... Cluster A Cluster B >50% intersection CAST Distance b/w every 2 peptides <= threshold 7 positions with min entropy consensus Motif M Figure 3. A scheme of motif finding algorithm – CMIM.