Understanding genes using mathematical tools Adam Sartiel COMPUGEN.


1 Understanding genes using mathematical tools Adam Sartiel COMPUGEN

2 Short History Of Compugen
- 1993: Founded
- 1994: First Bioccelerator sold (Merck)
- 1997: LEADS project initiated
- 1998: Pfizer collaboration
- 1999: USPTO agreement; LabOnWeb launched
- 2000: Launch of Z3; IPO
- 2001: Gencarta and OligoLibraries launched; Novartis collaboration

3 Unique R&D Team
- Substantial
  – 120 professionals
  – 32 PhD/MD, 37 M.Sc.
- Multidisciplinary
  – Algorithm development, Molecular biology, Software engineering, Statistics, Physics, Chemistry
- Integrated
  – Synergy between disciplines and feedback

4 Gene analysis using mathematics
- Drug discovery and Bioinformatics
- Principles of sequence alignment
- The EST opportunity and the Transcriptome
- Applications (Gencarta and DNA chips)

5 Cellular pathways are highly complex - identified targets

6 The Drug Development Process ($500M)

7 Some definitions
- ‘Drug’: a protein, lipid, antibody, or small organic molecule with a proven effect and an approved safety level.
- ‘Lead’: a molecule in development which may one day become a drug.
- ‘Target’: a protein (in most cases) whose activity a drug lead would affect in order to create a desirable effect on the body.
- ‘Validated target’: a target which has a proven, demonstrated effect on a disease or condition.

8 30,000 GENES?
- Fewer genes than initially thought?
- Some complexity due to alternative splicing
- Gene prediction is problematic
- Complex genes (interleaved, nested,...) are especially difficult to identify
- Both HGP and Celera tried to minimize false positives
- Conclusion: more genes may be found
- Wright et al., Genome Biology 2001 2(7): There are 65,000 – 75,000 genes

9 ONE GENE → ONE PROTEIN???
- Old dogma: Gene → mRNA → Protein
- Current understanding (diagram labels): Gene; mRNA → Protein; mRNA → Protein; Edited mRNA; Modified protein; Protein

10 Gene identification using sequence comparison

11 Similar sequences, common ancestor...
... common ancestor, similar function
Understand genes = know your targets

12 The genetic code is redundant

13 Proteins ‘see’ deeper
Unrelated DNA sequences?
TTACTCCGTCATGATGGGGTG
CTGATAAGGAAAGAAGGCTAT
Highly related proteins!
LeuLeuArgHisAspGlyVal
LeuIleArgLysGluGlyTyr
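The point of this slide can be checked with a few lines of code. The following is a minimal sketch, not the presenter's tooling: it translates both DNA strings with the standard genetic code and compares the results. The residue groupings in `SIMILAR_GROUPS` are a coarse assumption made only for illustration (real comparisons use a substitution matrix), and the final `U` in the first sequence as it appears in the transcript is read as `T`.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption for illustration only: coarse chemical classes of residues.
SIMILAR_GROUPS = [set("LIVM"), set("RKH"), set("DE")]


def translate(dna):
    """Translate a DNA string (reading frame 0) into one-letter amino acids."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def similar(x, y):
    """True if two residues are identical or fall in the same coarse class."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def count_identical(a, b):
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


dna1 = "TTACTCCGTCATGATGGGGTG"
dna2 = "CTGATAAGGAAAGAAGGCTAT"
prot1, prot2 = translate(dna1), translate(dna2)

print(prot1, prot2)
print("identical bases:", count_identical(dna1, dna2), "of", len(dna1))
print("similar residues:", sum(similar(x, y) for x, y in zip(prot1, prot2)),
      "of", len(prot1))
```

Under these assumed groupings most residue pairs count as similar even though fewer than half of the bases are identical, which is the sense in which proteins "see" deeper.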

14 How to align proteins?
MARQGEFPSILK
M-RHGEFP-LLKWC
(Figure labels: ‘Good’ and ‘Bad’ alignments)
Running a good alignment algorithm against 2001-scale databases requires supercomputers
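The slide does not say which algorithm is meant. As one hedged illustration of what aligning two proteins involves, here is a sketch of textbook Needleman-Wunsch global alignment with unit scores. The scoring values (`match`, `mismatch`, `gap`) are assumptions; practical protein alignment uses a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). Scores are illustrative
    assumptions, not values taken from the presentation.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# The two sequences from the slide, with the gap characters removed.
total, top, bottom = needleman_wunsch("MARQGEFPSILK", "MRHGEFPLLKWC")
print(total)
print(top)
print(bottom)
```

The table has one cell per pair of positions, so the cost of one comparison grows with the product of the two lengths. Repeating that against every entry in a large database is what makes the hardware demand on the slide plausible.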

15 Another direction: find genes by sequence
ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG
TGCTAGCACGTACGTAGTAGTCGTAGATCGCTAGTCGTCCGTAGCTCGTCGATCGTACGTCAC
- Gene regions have a different nucleotide composition from non-coding regions.
- Introns and exons have distinct sequence characteristics.
- Splice junctions are clearly detectable.
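The two signals named on this slide can be illustrated with a short sketch. It assumes GC fraction as the composition measure and the canonical GT...AG dinucleotides as the splice-junction signal. Neither choice is stated in the presentation, and real gene finders combine many more features. The window, step, and `min_len` values are arbitrary illustrative defaults.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def sliding_gc(seq, window=20, step=10):
    """GC fraction in overlapping windows, as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def candidate_introns(seq, min_len=10):
    """Half-open (start, end) spans that begin with GT and end with AG.

    This only lists candidates; most such spans in real DNA are not introns.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


# First strand shown on the slide.
strand = "ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG"
for start, frac in sliding_gc(strand):
    print(start, round(frac, 2))
print(len(candidate_introns(strand)), "candidate GT...AG spans")
```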

16 One step ahead: the story of the ESTs
(Diagram labels: Genomic DNA with exon 1, exon 2, exon 3; mRNA; cDNA; cDNA clone; EST)
Public domain ESTs (Expressed Sequence Tags): > 5,000,000
Craig Venter

17 The ESTs: Rough Diamonds?
- Short, inaccurate, badly annotated
- Abundant with repeats, alternative splicing
- Too many…
- The shredder effect

18 USING ESTS TO GET THE TRANSCRIPTOME
- Input: GenBank, a pool of ESTs and mRNAs
- Process 1: Clustering
- Process 2: Assembly
- Output: The transcriptome
(Diagram labels: Cluster 1, Cluster 2, Cluster 3, Cluster 4)
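To make the clustering step concrete, here is a toy sketch that groups reads sharing at least one exact k-mer, using a union-find structure. It is not the LEADS method, which the slides do not describe. The names `kmers` and `cluster_reads`, the value of `k`, and the `min_shared` threshold are all assumptions. The sketch also ignores reverse-complement matches and sequencing errors.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=8, min_shared=1):
    """Single-linkage clustering: reads sharing >= min_shared k-mers are joined.

    Returns a sorted list of clusters, each a sorted list of read indices.
    """
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmer_sets = [kmers(read.upper(), k) for read in reads]
    for i, j in combinations(range(len(reads)), 2):
        if len(kmer_sets[i] & kmer_sets[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for idx in range(len(reads)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Hypothetical reads: the first two overlap, the third is unrelated.
toy_reads = ["ACGTACGTGGA", "CGTGGATTTCA", "TTTTTTTTTTT"]
print(cluster_reads(toy_reads, k=6))
```

Comparing all pairs is quadratic in the number of reads, so this form only suits toy inputs. At the scale of millions of ESTs an index from k-mer to reads would be needed instead.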

19 The Transcriptome - Definition
“The collection of mRNAs present at any given moment in a cell or a tissue, and its behavior over time and cell states”

20 Introducing the Transcriptome
- The Genome:
  – Index to the range of possible proteins
  – Useful as map and for inter-organisms analysis
- The Proteome:
  – Describes what actually happens in the cell
  – Complex tools, partial results
- The Transcriptome:
  – “Golden path”: Proteome information in DNA technology.

21 Transcriptome applications
- Discovery of new proteins
  – Which are present in specific tissues
  – Which have specific cell locations
  – Which respond to specific cell states
- Discovery of new variants
  – Of important genes
  – Which work to increase/decrease the activity of the ‘native’ protein.

22 Example: Alternative Splicing
One Gene - Multiple mRNAs
(Diagram: pre-mRNA with numbered exons undergoes alternative splicing into various mature mRNA transcripts. Labels as extracted: 64521, 63521, 643521, 643521; markers 3 × and 4 ×; tissue A, tissue B, other tissues)

23 Alternative Splicing vs. “Contiging”
(Diagram labels: “Contiging”; “Assembling”; Contig impossible)

24 Extreme example of alternative splicing
(Diagram labels: Genomic; PSA RNA; PSA precursor; Mature PSA; Modified mRNA; LM precursor; Mature LM protein; Stop codon; Signal peptide; Alternative splicing)
Though coded by the same gene, mature proteins PSA and LM have not one residue in common!

25 Is This The Only Example?
(Diagram labels: PSA genomic, exon 1 to exon 4, LM; KLK-2 genomic, exon 1 to exon 5, KLM; * = Stop codon)

26 Validation: Northern Blot
- Like PSA, LM expression is restricted to prostate tissue
- Multiple bands may reflect conserved regions or alternative splicing

27 Example: receptor with DN (Dominant Negative)

28 Natural Antisense – a regulation mechanism?

29 LEADS Antisense Prediction
When analyzing EST data for Antisense:
– Use original EST orientation annotation
– Check splicing signals on both strands
– Examine library description for enzymes used
– Mark PolyA signals and PolyA tails (compare to genomic PolyA)
– Take into account NotI sites
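The "check splicing signals on both strands" item can be sketched as follows. This is an illustration under the assumption that only the canonical GT...AG intron boundaries are considered; it is not a description of how LEADS works. An intron transcribed from the opposite strand appears as CT...AC when read on the reference strand, so the boundary dinucleotides hint at the strand of the transcript. `intron_strand` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def intron_strand(intron):
    """Guess the transcript strand from canonical intron boundaries.

    Returns "+" for GT...AG, "-" for CT...AC (GT...AG on the opposite
    strand), or None when neither canonical pattern is present.
    """
    intron = intron.upper()
    if intron.startswith("GT") and intron.endswith("AG"):
        return "+"
    if intron.startswith("CT") and intron.endswith("AC"):
        return "-"
    return None


# Hypothetical intron sequence and its appearance from the other strand.
forward = "GTAAAAAG"
print(intron_strand(forward))
print(intron_strand(reverse_complement(forward)))
```

Two spliced ESTs that overlap on the genome but give opposite answers here would be candidates for a sense/antisense pair, subject to the other checks listed on the slide.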

30 Example: A Putative SNP (Cluster T07189, Position 347)

31 SNP Verification (Cluster T07189, Position 347)

32 Using Compugen’s Transcriptome Technology
- Large-scale collaborations: Pfizer, Novartis
- Co-development of molecules: TNF, Chemokine receptors, kinases, GPCRs
- Academic research: UCSF, NYU, TAU.
- Database products
- DNA chip design
- Mass-spec analysis
- Gene Ontology

33 Chip Design on Alternative Splicing
Variant-specific or common probes can be designed
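One way to picture the distinction between common and variant-specific probes is as set operations on the exons of each splice variant. The sketch below is illustrative only: the isoform names and exon numbers are hypothetical, and an actual probe design would also have to consider exon-exon junctions and probe sequence properties.

```python
def classify_exons(isoforms):
    """Split exons into those shared by all isoforms and those unique to one.

    `isoforms` maps an isoform name to its list of exon identifiers.
    Returns (common_exons, {isoform_name: exons_found_only_in_that_isoform}).
    """
    exon_sets = {name: set(exons) for name, exons in isoforms.items()}
    common = set.intersection(*exon_sets.values())
    specific = {}
    for name, exons in exon_sets.items():
        others = set().union(*(s for n, s in exon_sets.items() if n != name))
        specific[name] = exons - others
    return common, specific


# Hypothetical splice variants of one gene.
variants = {
    "tissue_A": [1, 2, 3, 5, 6],
    "tissue_B": [1, 2, 4, 5, 6],
    "other": [1, 2, 5, 6],
}
common, specific = classify_exons(variants)
print("common-probe exons:", sorted(common))
for name, exons in specific.items():
    print(name, "specific exons:", sorted(exons))
```

In this toy case the third variant has no exon of its own, so distinguishing it would require a probe spanning the junction between exons 2 and 5 rather than a probe inside a single exon.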

34 How many ‘genes’ are there really?
- Raw data:
  – 3,770,969 human sequences
  – 2,061,357 mouse sequences
  – 297,568 rat sequences
- Non-singleton ‘clusters’: 120,372 (H), 63,043 (M), 33,396 (R)
- % with splice variants: 26% (H), 32% (M), 23% (R)
- Homology (to SwissProt+Trembl, InterPro, other GC proteins): 20% (H+M), 27% (R).
- Total unique proteins: 236,797 (H), 106,119 (M), 32,352 (R)

35 The Novartis Agreement
- Signed August 2001
- Novartis non-exclusively licensed the LEADS platform and related software, and plans to use it for:
  – In-silico drug target identification and prioritization
  – Genome wide chip design
- Agreement was signed after a detailed pilot study run in November 2000
  – Discovered novel genes and splice variants using Incyte and Celera data
- Genes were subsequently verified in Novartis laboratory.

36 GENCARTA
- Result of LEADS applied to:
  – Public genome information
  – Published mRNA
  – ESTs
- In-house designed interface, Oracle-based infrastructure.
- Installed: Kyowa-Hakko, Avalon Pharma, Weizmann Institute, YU
- Version 2.2 out in October 2001.

37 Let’s go for the real thing…
- Gencarta Demonstration
- OligoLibrary Demonstration

38 Conclusion: Advantages of the Transcriptome
- Identify new drug targets
- Understand splice variant behavior
- Isolate “natural” drugs
- Annotate Proteomics experiments
- Design better DNA chips
- Solve the real bottlenecks in drug discovery and development

39 Understanding genes using mathematical tools Adam Sartiel COMPUGEN

