Genome Function Project We thank for support: Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise Corporate collaborators.


1 Genome Function Project
We thank the following for support.
Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Lipper, Armenise
Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont
UCSC, George Church, 24 Aug 2001

2 Post-Structural Genomics Data
gcggatttagctcagt tgggag agcgc cagact gaaga tttgga ggtcc tgtgtt cgatc cacagaattcgcacca

3 Post-300 Genome Sequences
0.5 to 7 Mbp; 10 Mbp to 1000 Gbp
[figure]

4 Function Genomics Measures & Models
[figure labels: DNA, RNA, Protein, Metabolites, Growth rate, Expression, Interactions, Environment]

5 Exponential technologies
1993 first browser
1994 commercial www

6 Agenda
1. mapping human variation (haplotype map)
2. obtaining a complete and validated set of human genes including
- multiple alleles, transcripts, protein or structural RNA products
- regulatory elements
3. understanding the diversity of life through genomic analysis of many organisms, and understanding how one organism works by comparative genomics with others
- how genomes evolved
4. creating a new quantitative systems biology, beyond drawing circles and arrows on paper and labeling them with names nobody can remember
- mapping the key interactions
- mathematical/computational models of pathways and systems
- dealing with multiple levels from atoms to cells

7 In vitro minigenome
Steve Blackwell, HMS: pure IF, EF
Tony Forster, BWH: tRNAs & modified bases
Manz Ehrenberg, Dieter Soll: tRNA-synthetases
Josh LaBaer, HMS-HIP: Expression constructs
Jingdong Tian, HMS: Protein synthesis
Rob Mitra & Xiaohua Huang, HMS: Polymerases, RCA
Gloria Culver, Iowa State: ribosomal proteins & rRNA
Harry Noller, UCSC: ribosomes

8 In vitro minigenome
A) From atoms to evolving minigenomes and cells. This could improve in vitro transcription/translation/replication systems and conceptually link atomic (mutational) changes via molecular and systems modeling to population evolution. The synthesis of pure systems of proteins with natural or novel modifications would be of great significance. This could give an incredible focus to structural genomics.
B) From cells to tissues. Modeling the effects of combinations of membrane signals and genome-programming on RNA and protein expression profiles would allow, among other things, manipulating stem-cell fate and stability. Stability would be key to both cell culture and to long-term avoidance of cancerous stem-cell proliferation. The ability of "programmed" cells to replace or augment small molecule drugs could be rigorously assessed.
C) From tissues to systems. Computational programming of cell and tissue morphology can develop quantitative concepts in complexity, chaos, robustness, and evolvability to engineer useful models such as sensor-effector neural feedback systems where macro aspects of the system determine the past (Darwinian) or future (prosthetic) function of the altered genomes.

9 Grand Challenges: goals (& details)
The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)
The Apollo Project ’62-69: Send a person to the moon (& back)
The Smallpox Eradication ’66-77: from the whole globe (including freezers)
The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy & searchable)

10 Grand Challenges: goals (& details)
The Manhattan Project ’43-45: Nuclear chain reaction (without igniting the atmosphere)
The Apollo Project ’62-69: Send a person to the moon (& back)
The Smallpox Eradication ’66-77: from the whole globe (including military freezers?)
The Human Genome Project ’90-05: 3 billion bases (at 99.99% accuracy with comparisons)
The BioSystems Project ’02- ??

11 Potential BioSystems Project Challenges
Programming smart biomaterials:
1. 0.1 nanometer positioning at 1 kHz in a 50 nm cube (Foresight Feynman Challenge)
2. I/O to sub-nano memory in DNA
Programming cells & populations:
3. 10 sec. mini-cell cycle, 85 kbp genome
4. Bioremediation microbial populations
Programming ourselves:
5. Drug structure-activity prioritization
6. Universal, non-aging human stem cells


13 Why the genome project worked
Polymer synthesis & sequencing: Hood’75-00, Hunkapiller’77-00, Carruthers’79...
Shotgun & mapping: Sanger’77, Brenner’72-02, Sulston’90, Olson’80-00...
Sequence searching: Ulam ’61-74, Staden’79, Lipman’87, Myers’87, Green’93...
Chemistry: Tabor’93, Karger’94, Mathies’96, Mullis’84...
Infrastructure: Wada’82, DeLisi’84, Gilbert’87, Watson’88, Venter’91...

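14 Metrics for structural & functional data
X-ray diffraction: automate 1960; data quality: resolution < 0.2 nm; model quality: R = Σ|o-c|/Σo < 0.2; similarity search: DALI, etc.
Sequence (bp): automate 1988; data quality: discrepancy < 0.01%; model quality: conserved proteins; similarity search: BLAST
Expression: automate 1999; data quality: cc, t-test; model quality: shared motifs, shared function; similarity search: Biclustering
Interact/growth: data quality: outliers; model quality: optimality; similarity search: as above?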
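
The model-quality metric listed for X-ray diffraction, R = Σ|o-c|/Σo, compares observed (o) and calculated (c) values. The following is a minimal sketch, assuming `observed` and `calculated` are plain lists of paired amplitudes; the function name `r_factor` and the sample numbers are illustrative and do not come from the talk.

```python
def r_factor(observed, calculated):
    """Return sum(|o - c|) / sum(o) for paired observed and calculated values."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated must have the same length")
    total = sum(observed)
    if total == 0:
        raise ValueError("sum of observed values must be non-zero")
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / total


# Hypothetical amplitudes, for illustration only.
obs = [100.0, 80.0, 60.0, 40.0]
calc = [90.0, 85.0, 66.0, 37.0]
r = r_factor(obs, calc)
print(f"R = {r:.3f}; below the slide's 0.2 threshold: {r < 0.2}")
```

A lower R means the model reproduces the data more closely, which is why the slide pairs it with a fixed threshold.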

15 Types of Systems Interaction Models
Quantum Electrodynamics: subatomic
Quantum mechanics: electron clouds
Molecular mechanics: spherical atoms (nm-fs)
Master equations: stochastic single molecules
Fokker-Planck approx.: stochastic
Macroscopic rates ODE: Concentration & time (C,t)
Flux Balance Optima: dC_ik/dt optimal steady state
Thermodynamic models: dC_ik/dt = 0, k reversible reactions
Steady State: Σ dC_ik/dt = 0 (sum k reactions)
Metabolic Control Analysis: d(dC_ik/dt)/dC_j (i = chem. species)
Spatially inhomogeneous: dC_i/dx
Population dynamics: as above (km-yr)
Increasing scope, decreasing resolution
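
The steady-state condition on this slide, Σ dC_ik/dt = 0 summed over the k reactions, can be checked numerically. The sketch below assumes a stoichiometric matrix `S` (rows are chemical species i, columns are reactions k) and a flux vector `v`, so that the net rate for species i is the sum over k of S[i][k] * v[k]. The three-reaction pathway and all names are hypothetical.

```python
def net_rates(stoich, fluxes):
    """Net dC_i/dt for each species i: sum over reactions k of S[i][k] * v[k]."""
    return [sum(s_ik * v_k for s_ik, v_k in zip(row, fluxes)) for row in stoich]


def is_steady_state(stoich, fluxes, tol=1e-9):
    """True when every species has (numerically) zero net rate of change."""
    return all(abs(rate) <= tol for rate in net_rates(stoich, fluxes))


# Hypothetical linear pathway: uptake -> A, A -> B, B -> export.
# Rows are species (A, B); columns are the three reactions.
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by A -> B
    [0, 1, -1],   # B: produced by A -> B, consumed by export
]

print(net_rates(S, [2.0, 2.0, 2.0]))        # balanced fluxes
print(is_steady_state(S, [2.0, 2.0, 2.0]))
print(is_steady_state(S, [2.0, 1.0, 1.0]))  # species A accumulates
```

Flux balance optimization, as named on the slide, searches among flux vectors that satisfy this constraint for one that maximizes an objective; the check above covers only the constraint.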

16 Sources of Data for BioSystems Modeling:
Capillary electrophoresis $300,000 (DNA Sequencing): 0.4 Mb/day
Chromatography-Mass Spectrometry (e.g. peptide LC-ESI-MS): 20 Mb/day
Microarray scanners (e.g. RNA): 300 Mb/day
Reagent costs:
Electrophoresis (DNA Sequencing): 10 ul per 0.5 Kb
Microarray reactions: 10 ul per 1000 Kb
Intel cmos microscope $99

17 RNA quantitation
Aach, Rindone, Church (2000) Genome Research 10: 431-445.
Microarrays (1): experiment, control; R/G ratios; R, G values; quality indicators
Affymetrix (2): 25-mers; PM, MM for each ORF; averaged PM-MM; “presence”; feature statistics
SAGE (3): SAGE Tag concatamers; counts of SAGE 14-mers sequence tags for each ORF
(1) DeRisi, et al., Science 278:680-686 (1997)
(2) Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
(3) Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
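
Two of the Affymetrix quantities named on this slide, the averaged PM-MM difference and a “presence” call, can be sketched as follows. This assumes `pm` and `mm` are paired perfect-match and mismatch intensities for the probes of one ORF. The presence rule used here (the fraction of probe pairs with PM > MM must reach `min_fraction`) is an illustrative assumption, not the vendor's algorithm and not the method of Aach et al.

```python
def averaged_pm_mm(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one ORF."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def presence_call(pm, mm, min_fraction=0.7):
    """Hypothetical presence rule: enough probe pairs show PM > MM."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    positive = sum(1 for p, m in zip(pm, mm) if p > m)
    return positive / len(pm) >= min_fraction


# Made-up intensities for four probe pairs of a single ORF.
pm = [500, 620, 480, 300]
mm = [200, 250, 260, 310]
print(averaged_pm_mm(pm, mm))
print(presence_call(pm, mm))
```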

18 Array opportunities
22 bp ds-RNAi array modulates single cell type
Drug array time-release or photo-release
Primer pair arrays for haplotyping
Gene & genome synthesis (DARPA)

19 Polypeptide arrays
Photo-deprotect peptides (Affymax)
Piezo or contact spotting (Harvard-CGR, Stanford)
Phage or ribosome display capture (Bulyk)
In situ ribosomal synthesis (Tian)
Harvard Inst. Proteomics, FLEXGene consortium

20

21 [figure: template flanked by primer sites A’ and B]
A Single Molecule From Library
1st Round of PCR: Primer is Extended by Polymerase
Primer A has 5’ immobilizing (Acrydite) modification.

22 Sequence polonies by sequential, fluorescent single-base extensions
1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel
[figure: universal primer B on site B’, 3’ to 5’, for two example templates reading A G T.. and G C G..; a T is shown added on the first]

23 Sequence polonies by sequential, fluorescent single-base extensions
5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel
[figure: the same two templates, A G T. and G C G..; a C is shown added on both, after the T on the first]

24 Primer Extension: 26 cycles, 34 Nucleotides
Polony Template, 3’ P’ P 5’:
AATACAATTCACACAGGAAACAGCTATGACATTC
TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’
Channels: FITC (C), CY3 (T)
Mean Intensity: 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43
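
The cycle logic of slides 22 to 24 (add one labeled nucleotide, wash, scan, repeat) can be simulated in a few lines. This is a sketch under stated assumptions: `new_strand` is the sequence the polymerase will synthesize (the complement of the template), the cyclic addition order `"TCAG"` is hypothetical beyond the dTTP-then-dCTP steps shown on the slides, and incorporation is treated as complete and error-free. It is not the authors' software.

```python
def simulate_extension(new_strand, order="TCAG"):
    """Simulate sequential single-nucleotide additions.

    new_strand: bases to be synthesized, 5'->3' (complement of the template).
    order: assumed cyclic order in which the four dNTPs are offered.
    Returns one (base_offered, number_incorporated) tuple per cycle.
    """
    unknown = set(new_strand) - set(order)
    if unknown:
        raise ValueError(f"bases not present in addition order: {sorted(unknown)}")
    cycles = []
    pos = 0
    cycle_index = 0
    while pos < len(new_strand):
        base = order[cycle_index % len(order)]
        incorporated = 0
        # The primer extends only while the next required base matches the one offered.
        while pos < len(new_strand) and new_strand[pos] == base:
            incorporated += 1
            pos += 1
        cycles.append((base, incorporated))
        cycle_index += 1
    return cycles


def call_sequence(cycles):
    """Rebuild the synthesized strand from the per-cycle incorporation counts."""
    return "".join(base * count for base, count in cycles)


# Templates A G T.. and G C G.. from slide 22 give new strands TCA and CGC.
for strand in ("TCA", "CGC"):
    print(strand, simulate_extension(strand))
```

In this model the first polony lights up in the Cy3 (T) scan and the second does not, while both light up in the following FITC (C) scan, matching the pattern sketched on slides 22 and 23.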

25 Polony haplotyping Trans Cis

26 Function Genomics Measures & Models
[figure labels: DNA, RNA, Protein, Metabolites, Growth rate, Environment; microbes, stem cells, cancer cells, multicellular organisms; RNAi, Insertions, SNPs]

27 Competition among multiple mutations & multiple homologous domains
Selective disadvantage in minimal media
[figure: probes 1, 2, 3 on thrA, metL and lysC; values as extracted: 1.16.7 1.8 12 10.4]

28 Multiple mutations per gene
Correlation between two selection experiments

29 Comparison of selection data with FBO predictions (scale up from 79 to 488 genes)
predictions | number of genes | negatively selected | not negatively selected
essential | 143 | 80 | 63
reduced growth rate | 46 | 24 | 22
non essential | 299 | 119 | 180
P-value Chi Square = 0.004
Novel duplicates? Position effects?
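
The P-value on this slide can be reproduced from the counts, assuming it comes from a standard Pearson chi-square test of independence on the 3 x 2 table (the slide does not say which variant was used). The sketch below uses only the standard library; for 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2), so no statistics package is needed. Function names are illustrative.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 d.o.f."""
    return math.exp(-stat / 2.0)


# Rows: essential, reduced growth rate, non essential (FBO predictions).
# Columns: negatively selected, not negatively selected.
counts = [
    [80, 63],
    [24, 22],
    [119, 180],
]

stat, dof = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_two_dof(stat):.4f}")
```

The row sums (143, 46, 299) match the "number of genes" column and add up to the 488 genes named in the slide title.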

30 DNA RNA Protein Metabolites Expression Environment Function Genomics Measures & Models

31 RNA quantitation (Frequently Asked Questions)
Is less than a 2-fold RNA-ratio ever important? Yes; 1.5-fold in trisomies.
Why oligonucleotides rather than cDNAs? Alternative RNAs, gene families.
Using a subset of the genome or ratios to various control RNAs? Trouble for later (meta) analyses.

32 Lpp mRNA start & structure See: Selinger et al Nat Biotech

33 Oligo selection
PGA/Smith group already designing software for oligo selection
Church Lab / Lipper Center has additional tools
– Unique oligos (cu-15s)
– RNA string matching program
[flowchart boxes, approximate order: gene sequences; parameters (Tm, length,...); background sequences; generate candidate oligos; predict cross-hybridization; filter & select oligos; gene-specific oligos; generate control, border oligos; controls, text, border oligos; generate chip layout; chip layout; experimental results]
Figure courtesy of Adnan Derti
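
The flowchart steps named on this slide (generate candidate oligos, apply parameters such as Tm and length, screen against background sequences, filter and select) can be sketched as below. This is an illustration, not the PGA/Smith or Church Lab software: Tm is estimated with the simple Wallace rule, 2(A+T) + 4(G+C), which is only a rough guide for short oligos, and cross-hybridization is approximated by exact substring matches against the background. All sequences are made up.

```python
def wallace_tm(oligo):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_oligos(gene, length):
    """All substrings of the gene sequence with the requested length."""
    return [gene[i:i + length] for i in range(len(gene) - length + 1)]


def select_oligos(gene, background, length, tm_min, tm_max):
    """Keep candidates within the Tm window that do not occur in any background sequence."""
    selected = []
    for oligo in candidate_oligos(gene, length):
        if not tm_min <= wallace_tm(oligo) <= tm_max:
            continue
        # Crude stand-in for cross-hybridization prediction: exact match only.
        if any(oligo in seq for seq in background):
            continue
        if oligo not in selected:
            selected.append(oligo)
    return selected


# Hypothetical gene and background sequences.
gene = "ATGGCGTACGTTAGC"
background = ["TTGGCGTACGAA"]
print(select_oligos(gene, background, length=8, tm_min=24, tm_max=26))
```

A fuller screen would also compare reverse complements and tolerate mismatches; the exact-match test keeps the sketch short.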

34 Combinatorial arrays for binding constants (EGR1)
ds-DNA array
HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen
MRC: Yen Choo

35 Combinatorial arrays for binding constants
[figure labels: Combinatorial DNA-binding protein domains; ds-DNA array; Phage; pVIII; pIII; Antibodies]

36 Combinatorial arrays for binding constants
[figure labels: Phycoerythrin - 2º IgG; Combinatorial DNA-binding protein domains; ds-DNA array; Phage]
Martha Bulyk et al

37 Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition Isalan et al., Biochemistry (‘98) 37:12026-12033

38 Wildtype EGR1 Microarray
[figure labels: high [DNA]; (+) ctrl sequence for wt binding; alignment oligos; etc.]

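39 Motifs weight all 64 K_a app
Zinc finger sequences: Wildtype RSDHLTT; RGPDLAR; REDVLIR; LRHNLET; KASNLVS
Values as extracted (pairing with the sequences above was lost): TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAT 240 nM
Triplets listed: AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT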

40 For more information: arep.med.harvard.edu

