Understanding genes using mathematical tools Adam Sartiel COMPUGEN.

Presentation transcript:

Understanding genes using mathematical tools Adam Sartiel COMPUGEN

2 Short History Of Compugen
1993: Founded
1994: First Bioccelerator sold (Merck)
1997: LEADS project initiated
1998: Pfizer collaboration
1999: USPTO agreement; LabOnWeb launched
2000: Launch of Z3; IPO
2001: Gencarta and OligoLibraries launched; Novartis collaboration

3 Unique R&D Team
Substantial
– 120 professionals
– 32 PhD/MD, 37 M.Sc.
Multidisciplinary
– Algorithm development, Molecular biology, Software engineering, Statistics, Physics, Chemistry
Integrated
– Synergy between disciplines and feedback

4 Gene analysis using mathematics
Drug discovery and Bioinformatics
Principles of sequence alignment
The EST opportunity and the Transcriptome
Applications (Gencarta and DNA chips)

5 Cellular pathways are highly complex - identified targets

6 $500M The Drug Development Process

7 Some definitions
‘Drug’ – A protein, lipid, antibody, or small organic molecule with a proven effect and an approved safety level.
‘Lead’ – A molecule in development which may one day become a drug.
‘Target’ – A protein (in most cases) whose activity a drug lead would affect in order to create a desirable effect on the body.
‘Validated target’ – A target which has a proven, demonstrated effect on a disease or condition.

8 30,000 GENES?
Fewer genes than initially thought?
Some complexity is due to alternative splicing
Gene prediction is problematic
Complex genes (interleaved, nested, ...) are especially difficult to identify
Both HGP and Celera tried to minimize false positives
Conclusion: more genes may be found
Wright et al., Genome Biology (7): There are 65,000 – 75,000 genes

9 ONE GENE, ONE PROTEIN???
[Diagram labels]
Old dogma: Gene, mRNA, Protein
Current understanding: Gene, mRNA, Edited mRNA, Protein, Modified protein

10 Gene identification using sequence comparison

11 Similar sequences imply a common ancestor
A common ancestor implies similar function
Understand genes = know your targets

12 The genetic code is redundant

13 Proteins ‘see’ deeper
Unrelated DNA sequences? Highly related proteins!
TTACTCCGTCATGATGGGGTG  LeuLeuArgHisAspGlyVal
CTGATAAGGAAAGAAGGCTAT  LeuIleArgLysGluGlyTyr
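The comparison on this slide can be checked with a few lines of Python. This is a minimal sketch: it uses the standard genetic code, reads any U as T, and the function names `translate` and `identity` are illustrative rather than taken from the presentation.

```python
# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string (U is read as T) into one-letter amino acids."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def identity(a, b):
    """Count positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))

seq1 = "TTACTCCGTCATGATGGGGTG"
seq2 = "CTGATAAGGAAAGAAGGCTAT"
prot1, prot2 = translate(seq1), translate(seq2)
dna_matches = identity(seq1, seq2)
protein_matches = identity(prot1, prot2)
print(prot1, prot2, dna_matches, protein_matches)
```

Only a minority of nucleotide positions agree, while the mismatched protein positions are mostly similar residues (Leu/Ile, Asp/Glu, His/Lys). Plain identity counting does not capture that similarity, which is why protein comparison usually relies on a substitution matrix.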

14 How to align proteins?
MARQGEFPSILK
M-RHGEFP-LLKWC
‘Good’ ‘Bad’
A good algorithm, run against whole databases, requires super-computers
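To make the cost of alignment concrete, here is a minimal sketch of global alignment scoring by dynamic programming (Needleman-Wunsch style). The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration; real protein alignment would use a substitution matrix, and nothing here is claimed to be the algorithm used in the presentation.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b under a simple scoring scheme."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

score = global_alignment_score("MARQGEFPSILK", "MRHGEFPLLKWC")
print(score)
```

The table has len(a) × len(b) cells, so comparing one query against every entry of a large database multiplies that cost by the database size. That is the reason exhaustive, sensitive alignment is so demanding computationally.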

15 Another direction: find genes by sequence
ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG
TGCTAGCACGTACGTAGTAGTCGTAGATCGCTAGTCGTCCGTAGCTCGTCGATCGTACGTCAC
- Gene regions have a different nucleotide composition than non-coding regions.
- Introns and exons are distinct in sequence
- Splice junctions are clearly detectable
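One simple composition signal is GC content measured in a sliding window. The sketch below is only an illustration of that idea: the window and step sizes are arbitrary assumptions, and practical gene finders combine many such signals in statistical models.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def sliding_gc(seq, window=20, step=10):
    """Return (start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

dna = "ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG"
for start, gc in sliding_gc(dna):
    print(start, round(gc, 2))
```

Windows whose composition departs from the local background are candidates for closer inspection, not proof of a gene.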

16 One step ahead: the story of the ESTs
[Diagram labels: Genomic DNA (exon 1, exon 2, exon 3), mRNA, cDNA, cDNA clone, EST]
Public domain ESTs (Expressed Sequence Tags): > 5,000,000
Craig Venter

17 The ESTs: Rough Diamonds?
Short, inaccurate, badly annotated
Abundant with repeats, alternative splicing
Too many…
The shredder effect

18 USING ESTS TO GET THE TRANSCRIPTOME
Input: GenBank, a pool of ESTs and mRNAs
Process 1: Clustering
Process 2: Assembly
Output: The transcriptome
[Diagram labels: Cluster 1, Cluster 2, Cluster 3, Cluster 4]
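The clustering step can be pictured with a toy example. The sketch below groups sequences that share at least one exact k-mer, using union-find. It is a deliberately naive stand-in and not the LEADS method: it ignores sequencing errors, repeats, and reverse-complement orientation, and the function name and default k are assumptions.

```python
def cluster_by_shared_kmers(seqs, k=8):
    """Group sequence indices into clusters linked by any shared exact k-mer."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # k-mer -> index of the first sequence containing it
    for idx, seq in enumerate(seqs):
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            if kmer in first_owner:
                root_a, root_b = find(first_owner[kmer]), find(idx)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two clusters
            else:
                first_owner[kmer] = idx

    clusters = {}
    for idx in range(len(seqs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())

reads = ["ACGTACGTAA", "CGTACGTAAT", "TTTTGGGGCC", "GGGGCCAAAA"]
print(cluster_by_shared_kmers(reads, k=6))
```

A single shared repeat would wrongly merge unrelated genes in this scheme, which hints at why repeats and alternative splicing make real EST clustering hard.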

19 The Transcriptome - Definition “The mRNA collection content, present at any given moment in a cell or a tissue, and its behavior over time and cell states”

20 Introducing the Transcriptome
The Genome:
– Index to the range of possible proteins
– Useful as a map and for inter-organism analysis
The Proteome:
– Describes what actually happens in the cell
– Complex tools, partial results
The Transcriptome:
– “Golden path”: Proteome information in DNA technology.

21 Transcriptome applications
Discovery of new proteins
– Which are present in specific tissues
– Which have specific cell locations
– Which respond to specific cell states
Discovery of new variants
– Of important genes
– Which work to increase/decrease the activity of the ‘native’ protein.

22 Example: Alternative Splicing
One Gene - Multiple mRNAs
[Diagram labels: Pre-mRNA, Alternative Splicing, Various Mature mRNA Transcripts (tissue A), (tissue B), (other tissues)]

23 Alternative Splicing vs. “Contiging”
[Diagram labels: “Contiging”, “Assembling”, Contig impossible]

24 Extreme example of alternative splicing
[Diagram labels: Genomic, PSA RNA, PSA precursor, Mature PSA, Modified mRNA, LM precursor, Mature LM protein, Signal peptide, Stop codon, Alternative splicing]
Though coded by the same gene, the mature proteins PSA and LM have not one residue in common!

25 Is This The Only Example?
[Diagram labels: PSA genomic, KLK-2 genomic, exon 1 to exon 5, LM, KLM, * = Stop codon]

26 Validation: Northern Blot
Like PSA, LM expression is restricted to prostate tissue
Multiple bands may reflect conserved regions or alternative splicing

27 Example: receptor with DN (Dominant Negative)

28 Natural Antisense – a regulation mechanism?

29 LEADS Antisense Prediction
When analyzing EST data for Antisense:
– Use original EST orientation annotation
– Check splicing signals on both strands
– Examine library description for enzymes used
– Mark PolyA signals and PolyA tails (compare to genomic PolyA)
– Take into account NotI sites
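The "check splicing signals on both strands" step can be illustrated with the canonical GT...AG intron boundary. This is a simplified sketch, not the LEADS implementation: it considers only the canonical dinucleotides, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def intron_strand(intron):
    """Infer the strand from canonical GT...AG boundaries.

    Returns "+" if the sequence as given starts with GT and ends with AG,
    "-" if only its reverse complement does, "ambiguous" if both do,
    and None if neither does.
    """
    forward = intron.upper()
    reverse = reverse_complement(forward)
    plus = forward.startswith("GT") and forward.endswith("AG")
    minus = reverse.startswith("GT") and reverse.endswith("AG")
    if plus and minus:
        return "ambiguous"
    if plus:
        return "+"
    if minus:
        return "-"
    return None

print(intron_strand("GTAAGTTTTCAG"), intron_strand("CTGAAAACTTAC"))
```

When the splice signals of overlapping transcripts point to opposite strands at the same locus, that is one piece of evidence for a natural antisense pair, to be weighed against the orientation annotation, library details, and PolyA evidence listed above.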

30 Example: A Putative SNP Cluster T07189 Position 347

31 Cluster T07189 Position 347 SNP Verification

32 Using Compugen’s Transcriptome Technology
Large-scale collaborations: Pfizer, Novartis
Co-development of molecules: TNF, Chemokine receptors, kinases, GPCRs
Academic research: UCSF, NYU, TAU
Database products
DNA chip design
Mass-spec analysis
Gene Ontology

33 Chip Design on Alternative Splicing
Variant-specific or common probes can be designed
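The distinction between common and variant-specific probes comes down to set operations over the exons each variant contains. The sketch below assumes a hypothetical input format (a dict mapping variant name to a list of exon labels). Real probe design would also consider exon-exon junctions, melting temperature, and cross-hybridization.

```python
def classify_probe_targets(variants):
    """Split exons into those common to all variants and those unique to one.

    variants: dict mapping variant name -> iterable of exon labels.
    Returns (common_exons, specific_exons_by_variant).
    """
    exon_sets = {name: set(exons) for name, exons in variants.items()}
    if not exon_sets:
        return set(), {}
    common = set.intersection(*exon_sets.values())
    specific = {}
    for name, exons in exon_sets.items():
        # Exons found in any other variant cannot distinguish this one.
        others = set().union(*(s for n, s in exon_sets.items() if n != name))
        specific[name] = exons - others
    return common, specific

example = {
    "variant_A": ["exon1", "exon2", "exon3"],
    "variant_B": ["exon1", "exon3", "exon4"],
}
common, specific = classify_probe_targets(example)
print(sorted(common), {k: sorted(v) for k, v in specific.items()})
```

A probe placed in a common exon reports the total expression of the gene, while a probe in a variant-specific exon reports that variant alone.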

34 How many ‘genes’ are there really?
Raw data:
– 3,770,969 human sequences
– 2,061,357 mouse sequences
– 297,568 rat sequences
Non-singleton ‘clusters’: 120,372 (H), 63,043 (M), 33,396 (R)
% with splice variants: 26% (H), 32% (M), 23% (R)
Homology (to SwissProt+Trembl, InterPro, other GC proteins): 20% (H+M), 27% (R)
Total unique proteins: 236,797 (H), 106,119 (M), 32,352 (R)

35 The Novartis Agreement
Signed August 2001
Novartis non-exclusively licensed the LEADS platform and related software, and plans to use it for:
– In-silico drug target identification and prioritization
– Genome-wide chip design
Agreement was signed after a detailed pilot study run in November 2000
– Discovered novel genes and splice variants using Incyte and Celera data
Genes were subsequently verified in Novartis laboratory.

36 GENCARTA
Result of LEADS applied to:
– Public genome information
– Published mRNA
– ESTs
In-house designed interface, Oracle-based infrastructure.
Installed: Kyowa-Hakko, Avalon Pharma, Weizmann Institute, YU
Version 2.2 out in October 2001.

37 Let’s go for the real thing…
Gencarta Demonstration
OligoLibrary Demonstration

38 Conclusion: Advantages of the Transcriptome
Identify new drug targets
Understand splice variant behavior
Isolate “natural” drugs
Annotate Proteomics experiments
Design better DNA chips
Solve the real bottlenecks in drug discovery and development
