Genomics, Proteomics, and Systems Biology (Chapter 5)

Outline: Genomes and Transcriptomes; Proteomics; Systems Biology

Introduction Genome sequencing projects brought large-scale experimental approaches, which generate vast amounts of data, to the study of biological systems. Complete genome sequences can now be determined, and all the RNAs and proteins expressed in a cell can be analyzed on a large scale.

Introduction These global experimental approaches form the basis of the new field of systems biology, which seeks a quantitative understanding of the integrated behavior of complex biological systems.

Genomes and Transcriptomes The Human Genome Project was the effort to sequence the entire human genome (3 billion base pairs); the complete sequence was published in 2004. The genome sequences of many other species have also been determined, and advances in sequencing technology now allow individual genomes to be sequenced rapidly.

Genomes and Transcriptomes The first complete genome, that of the bacterium Haemophilus influenzae, was reported in 1995. It contains 1.8 × 10⁶ base pairs. Protein-coding regions were identified by computer analysis to detect open reading frames: long stretches of sequence that contain no stop codons in a given reading frame.
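
The open-reading-frame search described above can be sketched in a few lines of Python. This is a minimal illustration, not the program actually used for the H. influenzae genome: it scans only the three forward reading frames, assumes ATG as the only start codon and TAA, TAG, and TGA as stops, and uses a hypothetical minimum length (`min_codons`) to decide what counts as a "long" stretch.

```python
# Minimal ORF scan (illustrative sketch, forward strand only).
# Assumptions: ATG start; TAA/TAG/TGA stops; `min_codons` is a
# hypothetical cutoff for calling a stretch "long".
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, half-open coordinates of ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon
    (stop included); its length in codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return sorted(orfs)


print(find_orfs("ATGAAATTTTAA", min_codons=2))  # [(0, 12)]
```

A real gene finder would also scan the reverse complement and handle alternative start codons.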

Figure 5.1 The genome of Haemophilus influenzae

Genomes and Transcriptomes In bacteria, most of the DNA encodes proteins. The E. coli genome, 4.6 × 10⁶ base pairs (about 4,000 genes), is more than twice the size of the H. influenzae genome, and nearly 90% of its DNA is protein-coding. More than 2,000 bacterial genomes have now been sequenced.

Genomes and Transcriptomes The yeast Saccharomyces cerevisiae has one of the simplest eukaryotic genomes, making it a useful model for eukaryotic cells. The yeast genome contains about 6,000 genes, and about 70% of it codes for proteins.

Genomes and Transcriptomes The genomes of multicellular organisms (C. elegans, Drosophila, and Arabidopsis) were sequenced next. These genomes are about 10 times larger than the yeast genome but contain fewer genes than expected for more complex organisms, and much less of their DNA is protein-coding than in bacteria and yeasts.

Table 5.1 Representative Genomes

Genomes and Transcriptomes Drosophila has fewer genes than C. elegans, showing that biological complexity is not simply a function of gene number.

Genomes and Transcriptomes The genome of Arabidopsis thaliana was sequenced in 2000 and found to have about 26,000 genes. Even more genes occur in other plant genomes (e.g., 57,000 in apples).

Genomes and Transcriptomes The human genome has about 3 × 10⁹ base pairs. Draft sequences were published in 2001 by two different groups using different approaches. The complete sequence was published in 2004.

Genomes and Transcriptomes The International Human Genome Sequencing Consortium sequenced DNA fragments derived from BAC (bacterial artificial chromosome) clones that had been previously mapped to human chromosomes.

Key Experiment, Ch. 5, p. 161 (3)

Genomes and Transcriptomes A team led by Craig Venter of Celera Genomics used a shotgun approach: Small DNA fragments were cloned and sequenced; overlaps between sequences were then used to assemble the sequence of the genome.
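
The core idea of the shotgun approach, assembling a sequence from overlaps between fragments, can be illustrated with a toy greedy assembler. This is a simplified sketch under stated assumptions (error-free reads, no repeats, a hypothetical minimum overlap of 3 bases). It is not the algorithm Celera used, and greedy merging can misassemble repeated sequences.

```python
# Toy greedy assembler (illustrative only): repeatedly merge the pair of
# contigs with the longest suffix/prefix overlap until none remain.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by best overlap; return the resulting contigs."""
    # Drop reads wholly contained in another read, then exact duplicates.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)


# Hypothetical overlapping reads from one short sequence.
reads = ["ATGCGTAC", "CGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```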

Genomes and Transcriptomes A major surprise from the human genome sequence was that there are only about 21,000 protein-coding genes, and that protein-coding sequences make up only about 1% of the genome.
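
The approximate figures quoted on these slides can be combined into a rough gene-density comparison. The arithmetic below uses only those values (4.6 × 10⁶ bp and about 4,000 genes for E. coli; 3 × 10⁹ bp, about 21,000 protein-coding genes, and about 1% coding sequence for humans), so the results are order-of-magnitude estimates, not measured averages.

```python
# Back-of-the-envelope gene density from the approximate slide figures.

def bp_per_gene(genome_bp: float, genes: int) -> float:
    """Average genomic span per gene (total length / gene count)."""
    return genome_bp / genes


ecoli = bp_per_gene(4.6e6, 4_000)     # ~1,150 bp of genome per gene
human = bp_per_gene(3e9, 21_000)      # ~143,000 bp of genome per gene
human_coding = 0.01 * 3e9 / 21_000    # ~1,430 coding bp per gene

print(f"E. coli: {ecoli:,.0f} bp of genome per gene")
print(f"Human:   {human:,.0f} bp of genome per gene")
print(f"Human:   {human_coding:,.0f} coding bp per gene "
      f"(~{human_coding / 3:,.0f} codons)")
print(f"Human genes are ~{human / ecoli:,.0f}x more widely spaced")
```

The coding length per gene is similar in the two organisms; the difference lies in how much noncoding DNA surrounds each gene.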

Genomes and Transcriptomes About 40% of human proteins are related to proteins in simpler eukaryotes; most of these function in basic cellular processes. Most proteins that are unique to humans are made up of domains that are also found in other organisms but are arranged in novel combinations.

Genomes and Transcriptomes The genomes of many other vertebrates have now been sequenced. This allows comparisons to the human genome and helps identify functional sequences. Comparison of human, mouse, chicken, and zebrafish genomes shows that about half of protein-coding genes are common to all vertebrates.

Figure 5.2 Evolution of sequenced vertebrates

Figure 5.3 Comparison of vertebrate genomes

Genomes and Transcriptomes Mice, rats, and humans have 90% of their genes in common. Mouse and rat genome sequences provide essential databases for research in mammalian genetics and human physiology and medicine.

Genomes and Transcriptomes The dog genome sequence has become important in understanding the genetic basis of morphology, behavior, and a variety of diseases. Characteristics of the many dog breeds are highly specific, which facilitates identification of the responsible genes.

Genomes and Transcriptomes Many diseases, including cancer, are common in some breeds, and understanding the genetic basis will benefit both veterinary and human medicine.

Genomes and Transcriptomes Genome sequences of other primates may help pinpoint unique features that distinguish humans. Human and chimpanzee genomes are nearly 99% identical, but the sequence differences often fall within coding sequences, so most proteins differ by at least one amino acid between the two species.

Genomes and Transcriptomes Neandertals and modern humans diverged 300,000 to 400,000 years ago, and their genomes are about 99.9% identical. The differences alter coding sequences of only 90 genes that are conserved in modern humans.

Genomes and Transcriptomes The Human Genome Project used the dideoxynucleotide sequencing technique first described by Fred Sanger. Even with automation, however, this approach is slow and expensive. Next-generation sequencing methods have since greatly increased speed and lowered costs.
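
The principle of the dideoxy method can be sketched as a toy simulation. Each reaction contains a small amount of one dideoxynucleotide, which lacks the 3'-OH needed to add the next nucleotide, so synthesis stops wherever that base is incorporated; sorting all fragments by length then spells out the sequence. The sketch ignores the primer and assumes every possible termination product is present.

```python
# Toy model of dideoxy (Sanger) chain termination.

def dideoxy_lanes(new_strand: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each ddNTP reaction.

    `new_strand` is the 5'->3' sequence of the strand being synthesized
    (complementary to the template). A fragment of length n ends at
    position n, so it appears in the lane for the base at that position.
    """
    lanes: dict[str, list[int]] = {b: [] for b in "ACGT"}
    for n, base in enumerate(new_strand, start=1):
        lanes[base].append(n)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted((n, base) for base, lengths in lanes.items() for n in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_lanes("TAATGTACG")
print(lanes["T"])       # [1, 4, 6]
print(read_gel(lanes))  # TAATGTACG
```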

Figure 5.4 Progress in DNA sequencing

Genomes and Transcriptomes Next-generation (massively parallel) sequencing methods sequence millions of templates simultaneously.
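
Because massively parallel methods produce huge numbers of short reads from random positions, a common way to estimate how thoroughly a genome is covered is the Lander-Waterman approximation: mean coverage C = N·L/G for N reads of length L on a genome of size G, with a fraction of roughly e^(−C) of bases left unsequenced. The slides do not discuss coverage; this model and the read numbers in the example are assumptions added for illustration.

```python
import math

# Lander-Waterman style coverage estimate (assumes reads start at
# uniformly random, independent positions).


def mean_coverage(n_reads: int, read_len: int, genome_len: float) -> float:
    """Average number of reads covering each base: C = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases covered by no read: about e^(-C)."""
    return math.exp(-coverage)


# Hypothetical run: 600 million 150-bp reads on a 3e9-bp human genome.
c = mean_coverage(600_000_000, 150, 3e9)
print(f"mean coverage ~{c:.0f}x, fraction uncovered ~{fraction_uncovered(c):.1e}")
```

In practice, sequencing biases leave more gaps than this idealized estimate predicts.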

Figure 5.5 Next-generation sequencing

Genomes and Transcriptomes The first individual human genomes to be sequenced were those of Craig Venter and James Watson (2007 and 2008). Since then, thousands of individual genomes have been sequenced. Personal genome sequences may allow therapies to be tailored to the needs of individual patients.

Genomes and Transcriptomes In the future, genome sequencing may be important in disease prevention by identifying genes that confer susceptibility to particular diseases.

Genomes and Transcriptomes Transcriptome: all the RNAs that are transcribed in a cell. Complete genome sequences allow study of gene expression for the whole genome, instead of one gene at a time. One method used is hybridization to DNA microarrays.

Genomes and Transcriptomes Oligonucleotides are printed by a robotic system onto glass or silicon chips, and each spot on the array contains copies of a single oligonucleotide sequence. DNA microarrays can be used to compare gene expression between two cell types.

Figure 5.6 DNA microarrays

Genomes and Transcriptomes cDNAs are synthesized from mRNAs by reverse transcription, labeled with fluorescent dyes, and hybridized to DNA microarrays. The relative level of expression of each gene is indicated by the intensity of fluorescence at the corresponding position on the microarray.
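
Microarray comparisons are often summarized as a log ratio of the two fluorescence signals at each spot. The sketch below assumes a two-color design in which cDNAs from the two cell types carry different dyes and background has already been subtracted; the gene names and intensities are hypothetical.

```python
import math

# Two-color microarray sketch. Assumption: "red" = cell type A,
# "green" = cell type B, background-subtracted intensities per spot.


def log2_ratios(red: dict[str, float], green: dict[str, float]) -> dict[str, float]:
    """log2(A / B) per gene: +1 means twofold higher in A, -1 twofold lower."""
    return {gene: math.log2(red[gene] / green[gene]) for gene in red}


# Hypothetical spot intensities for three genes.
red = {"geneA": 4000.0, "geneB": 1000.0, "geneC": 250.0}
green = {"geneA": 1000.0, "geneB": 1000.0, "geneC": 1000.0}
for gene, ratio in log2_ratios(red, green).items():
    print(gene, ratio)  # geneA 2.0, geneB 0.0, geneC -2.0
```

Using a log scale makes increases and decreases of the same fold change symmetric around zero.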

Genomes and Transcriptomes RNA-seq reveals the sequences of all mRNAs in a cell. Cellular mRNAs are reverse transcribed to cDNAs, which are analyzed by next-generation sequencing. The frequency with which each mRNA sequence is found indicates its abundance in the cell.
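
Turning read frequencies into abundances requires normalization, because longer transcripts yield more reads. One common choice is transcripts per million (TPM); it is used here as an assumption, since the slides only state that read frequency reflects abundance. The counts and lengths are hypothetical.

```python
# RNA-seq abundance sketch using TPM normalization.


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from read counts and transcript lengths."""
    # Reads per kilobase: corrects for longer transcripts yielding more reads.
    rate = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    total = sum(rate.values())
    # Rescale so the values for all genes sum to one million.
    return {g: r / total * 1e6 for g, r in rate.items()}


# Hypothetical data: geneB has twice geneA's reads but is twice as long.
counts = {"geneA": 500, "geneB": 1000, "geneC": 500}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
print(tpm(counts, lengths))  # all three genes have equal TPM (~333,333)
```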

Figure 5.7 RNA-seq

Proteomics To understand cell function, it is necessary to know what proteins are expressed and how they function within the cell. The large-scale analysis of cell proteins is called proteomics. The goal is to identify and quantify all proteins expressed in a given cell (the proteome).

Proteomics The number of proteins expressed in a cell is greater than the number of genes. Many genes yield several distinct mRNAs by alternative splicing, and these mRNAs encode different polypeptides. Proteins can also be modified in various ways.

Proteomics The first technology to separate proteins was two-dimensional gel electrophoresis. Proteins are separated based on charge and then size. This technique is biased toward the most abundant proteins.

Figure 5.8 Two-dimensional gel electrophoresis

Proteomics The main tool currently used is mass spectrometry. A protease cleaves the protein into small peptides, which are ionized and analyzed in a mass spectrometer that determines the mass-to-charge ratio of each peptide. The resulting mass spectrum is compared to a database of known spectra.
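
One common way to match peptide masses against a database is peptide mass fingerprinting: each database protein is digested in silico, the masses of its predicted peptides are calculated, and the protein that explains the most observed peaks is reported. The sketch assumes trypsin as the protease (cutting after K or R, but not before P), singly charged ions, and a hypothetical 0.02 Da matching tolerance; the two database sequences and the protein names are made up.

```python
import re

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier for [M+H]+ ions


def trypsin_digest(protein: str) -> list[str]:
    """In-silico tryptic digest: cut after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide: str) -> float:
    """m/z of the singly charged peptide ion [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint_score(observed: list[float], protein: str, tol: float = 0.02) -> int:
    """Number of observed peaks matching a predicted peptide within `tol` Da."""
    predicted = [peptide_mz(p) for p in trypsin_digest(protein)]
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


def best_match(observed: list[float], database: dict[str, str]) -> str:
    """Name of the database protein whose digest explains the most peaks."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name]))


# Hypothetical two-entry database and a spectrum simulated from protA.
database = {"protA": "MAGKLLDRPTEWKAGHR", "protB": "MSTNPKQEVIRGFDAK"}
observed = [peptide_mz(p) for p in trypsin_digest(database["protA"])]
print(trypsin_digest(database["protA"]))  # ['MAGK', 'LLDRPTEWK', 'AGHR']
print(best_match(observed, database))     # protA
```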

Figure 5.9 Identification of proteins by mass spectrometry

Proteomics A “shotgun” approach eliminates gel electrophoresis: cell proteins are digested with a protease, and the whole peptide mixture is sequenced by tandem mass spectrometry.
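
In tandem mass spectrometry, a selected peptide is broken into fragments, and the ladder of fragment masses reveals its sequence: successive b ions (N-terminal fragments) differ by the mass of one residue. The sketch below computes singly charged b and y ions and reads a sequence back from the b-ion ladder plus the precursor mass. It assumes ideal, complete ladders, and it tabulates only the residues in the example peptide. Leucine and isoleucine have the same mass, so this approach could not tell them apart.

```python
# Monoisotopic masses (Da) for only the residues in the example peptide.
RESIDUE_MASS = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
                "I": 113.08406, "D": 115.02694}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide: str) -> list[float]:
    """Singly charged b ions b1..b(n-1): N-terminal residues + proton."""
    ions, running = [], 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y ions y1..y(n-1): C-terminal residues + water + proton."""
    ions, running = [], 0.0
    for aa in reversed(peptide[1:]):
        running += RESIDUE_MASS[aa]
        ions.append(running + WATER + PROTON)
    return ions


def precursor_mh(peptide: str) -> float:
    """[M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def read_from_b_ladder(b_ladder: list[float], mh: float) -> str:
    """Infer the sequence from mass gaps between consecutive b ions."""
    gaps = [b_ladder[0] - PROTON]                               # first residue
    gaps += [nxt - cur for cur, nxt in zip(b_ladder, b_ladder[1:])]
    gaps.append(mh - WATER - b_ladder[-1])                      # last residue
    # Assign each gap to the residue whose mass is closest.
    return "".join(min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - m))
                   for m in gaps)


peptide = "PEPTIDE"
print(read_from_b_ladder(b_ions(peptide), precursor_mh(peptide)))  # PEPTIDE
```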

Figure 5.10 Tandem mass spectrometry

Proteomics Determining the locations of proteins in cells and organelles is also important. Organelles are isolated by subcellular fractionation, and their proteins are analyzed by mass spectrometry. The proteomes of a variety of organelles and structures have been characterized.

Table 5.2 Protein composition of cellular structures

Proteomics Proteins function by interacting with other proteins in protein complexes and networks. The systematic analysis of these complexes and interactions has become an important goal of proteomics.

Proteomics Proteins can be isolated from cells under gentle conditions so that protein complexes are not disrupted. Typically, an antibody against a protein of interest is used to isolate the protein from a cell extract by immunoprecipitation.

Figure 5.11 Immunoprecipitation

Proteomics Immunoprecipitated protein complexes can then be analyzed by mass spectrometry. The protein against which the antibody was directed can be identified, along with the other proteins that were associated with it in the cell extract.

Figure 5.12 Analysis of protein complexes

Proteomics Alternative approaches include screens for protein interactions in vitro, and screens that detect interactions between pairs of proteins introduced into yeast cells.

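Proteomics In the yeast two-hybrid system, two different cDNAs (e.g., from human cells) are joined to two distinct domains of a protein that stimulates expression of a target gene in yeast. If the proteins encoded by the two cDNAs interact, the two domains are brought together and the target gene is expressed, signaling the interaction.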

Figure 5.13 The yeast two-hybrid system

Proteomics Screens have identified thousands of protein–protein interactions, which can be presented as maps that depict a network of interacting proteins within a cell.
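
An interaction map of this kind is naturally represented as an undirected graph. The sketch below builds one from a list of interacting pairs, ranks "hub" proteins by their number of partners, and finds all proteins connected to a given one. The protein names and pairs are placeholders, not real screen data.

```python
from collections import defaultdict


def build_network(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected interaction map: protein -> set of interaction partners."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hubs(graph: dict[str, set[str]], top: int = 3) -> list[str]:
    """Proteins with the most partners (ties broken alphabetically)."""
    return sorted(graph, key=lambda p: (-len(graph[p]), p))[:top]


def interacting_cluster(graph: dict[str, set[str]], start: str) -> set[str]:
    """All proteins linked to `start` through any chain of interactions."""
    seen, stack = {start}, [start]
    while stack:
        for partner in graph[stack.pop()] - seen:
            seen.add(partner)
            stack.append(partner)
    return seen


# Hypothetical screen results (P1..P6 are placeholder names).
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P6")]
net = build_network(pairs)
print(hubs(net, top=2))                        # ['P1', 'P2']
print(sorted(interacting_cluster(net, "P4")))  # ['P1', 'P2', 'P3', 'P4']
```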

Figure 5.14 A protein interaction map of Drosophila

Bioinformatics and Systems Biology Genome sequencing, proteomics, and other large-scale experiments have yielded vast amounts of data. Bioinformatics, at the interface between biology and computer science, uses computational methods to analyze and extract biological information from all this data.

Bioinformatics and Systems Biology These large-scale experimental approaches form the basis of the new field of systems biology. The goal: A quantitative understanding of the integrated dynamic behavior of complex biological systems and processes.

Figure 5.15 Systems biology

Bioinformatics and Systems Biology Systematic screens of gene function: One approach to studying gene function is to inactivate (knock out) each gene. Collections of strains with mutations in all known genes are available for E. coli, yeast, Drosophila, C. elegans, and Arabidopsis thaliana.

Bioinformatics and Systems Biology A large-scale international project to systematically knock out all genes in the mouse is also under way. Targeted mutagenesis has determined the functions of more than 7,000 mouse genes.

Bioinformatics and Systems Biology Other large-scale screening projects are based on RNA interference (RNAi). Double-stranded RNAs are used to induce degradation of homologous mRNAs in cells.

Figure 4.38 RNA Interference

Bioinformatics and Systems Biology With the availability of complete genome sequences, libraries of double-stranded RNAs can be designed and used in genome-wide screens to identify the genes involved in a given biological process.
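
In a genome-wide screen, each knockdown gives a phenotype score, and "hits" are genes whose score departs strongly from the rest. A common way to call hits, assumed here rather than taken from the slides, is a robust z-score based on the median and median absolute deviation (MAD), which outliers do not distort. The viability values and the −3 threshold are hypothetical.

```python
import statistics


def robust_z_scores(scores: dict[str, float]) -> dict[str, float]:
    """Robust z-scores using the median and median absolute deviation (MAD)."""
    values = list(scores.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    return {gene: (v - med) / scale for gene, v in scores.items()}


def call_hits(viability: dict[str, float], threshold: float = -3.0) -> list[str]:
    """Genes whose knockdown lowers viability far below the screen's typical value."""
    return sorted(g for g, z in robust_z_scores(viability).items() if z <= threshold)


# Hypothetical relative viability after knockdown (1.0 = unaffected).
viability = {"g1": 1.00, "g2": 0.98, "g3": 1.02, "g4": 0.97, "g5": 1.03,
             "g6": 0.99, "g7": 1.01, "g8": 0.20, "g9": 0.15}
print(call_hits(viability))  # ['g8', 'g9']
```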

Figure 5.16 Genome-wide RNAi screen for cell growth and viability

Bioinformatics and Systems Biology Regulation of gene expression: Understanding the mechanisms that control gene expression is a central undertaking in cell and molecular biology. It is far more difficult to identify gene regulatory sequences than protein-coding sequences.

Bioinformatics and Systems Biology Most regulatory elements are short sequences, typically only about ten base pairs. Consequently, sequences resembling regulatory elements occur frequently by chance in genomic DNA. Identifying regulatory sequences is a major challenge in systems biology.
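
A quick calculation shows why short elements are hard to find. If the four bases were equally frequent and independent (a simplifying assumption; real genomes are not random), a specific 10-bp sequence would be expected thousands of times in the human genome by chance alone.

```python
def expected_chance_matches(motif_len: int, genome_len: float, strands: int = 2) -> float:
    """Expected number of occurrences of one specific motif in random DNA.

    Assumes equal, independent base frequencies and counts both strands
    by default; the (genome_len - motif_len + 1) positions per strand
    are approximated by genome_len.
    """
    return strands * genome_len / 4 ** motif_len


# A 10-bp element in a 3 x 10^9 bp genome:
print(round(expected_chance_matches(10, 3e9)))  # 5722
# A 20-bp sequence, by contrast, is expected far less than once by chance:
print(expected_chance_matches(20, 3e9))
```

Each added base divides the expected number of chance matches by four, which is why short regulatory elements need additional evidence, such as conservation or binding data, to be identified.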

Bioinformatics and Systems Biology Global studies of gene expression using microarrays or RNA-seq can reveal overall changes in gene regulation associated with discrete cell behaviors, such as the response of cells to a particular hormone. Identifying groups of genes whose expression changes together can help pinpoint the regulatory elements they share.

Bioinformatics and Systems Biology Computational approaches are also used to characterize regulatory elements. Comparative analysis of genome sequences of related organisms assumes that functionally important sequences are conserved in evolution, and nonfunctional segments diverge more rapidly.

Bioinformatics and Systems Biology Computational analysis to identify noncoding sequences that are conserved between the mouse, rat, dog, and human genomes has helped identify sequences that control gene transcription.
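
The logic of comparative analysis can be sketched as a sliding-window scan over two already-aligned sequences, flagging windows whose identity exceeds a threshold. This is much simpler than the phylogenetic methods used in practice; the window size, the 90% threshold, and the example sequences (labeled human and mouse) are all hypothetical.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Percent identity in each sliding window of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(x == y and x != "-" for x, y in pairs)  # gaps never count
        scores.append(100.0 * matches / window)
    return scores


def conserved_regions(seq_a: str, seq_b: str, window: int = 10,
                      min_identity: float = 90.0) -> list[tuple[int, int]]:
    """Merge overlapping high-identity windows into (start, end) regions."""
    regions: list[tuple[int, int]] = []
    for start, score in enumerate(window_identity(seq_a, seq_b, window)):
        if score >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


# Hypothetical aligned noncoding segments from two species.
human = "ACGTACGTAC" + "GGGGGGGGGG" + "TTTTTTTTTT"
mouse = "ACGTACGTAC" + "CCCCCCCCCC" + "TTTTTTTTTT"
print(conserved_regions(human, mouse))  # [(0, 11), (19, 30)]
```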

Figure 5.17 Conservation of functional gene regulatory elements

Bioinformatics and Systems Biology Methods for genome-wide analysis of the binding sites of regulatory proteins have also been developed. Genome-wide mapping of histone modifications can also help identify gene regulatory sequences.

Bioinformatics and Systems Biology ENCODE (Encyclopedia of DNA Elements) utilized RNA-seq to characterize all transcribed RNAs, plus global methods to determine gene regulatory sequences in 147 different types of human cells. One result: Many transcribed noncoding sequences play important roles in gene regulation.

Bioinformatics and Systems Biology Networks: Classical experimental biology focuses on single genes and proteins, which often act sequentially to catalyze reactions in a metabolic pathway. Signaling pathways act similarly to transmit information from the environment, such as presence of a hormone, to targets within the cell.

Figure 5.18 Example of a signaling pathway

Bioinformatics and Systems Biology But metabolic and signaling pathways do not operate in isolation. There is extensive crosstalk between pathways, so that multiple pathways interact with one another to form networks. Computational modeling of networks is currently a major challenge in systems biology.

Bioinformatics and Systems Biology Many pathways are controlled by feedback loops, such as feedback inhibition of metabolic pathways and other negative feedback loops. In feedforward relays, the activity of one component of a pathway stimulates a distant downstream component.

Bioinformatics and Systems Biology Crosstalk: interaction of one pathway with another; can be positive (one pathway stimulates the other) or negative (one pathway inhibits the other).

Figure 5.19 Elements of signaling networks

Bioinformatics and Systems Biology In this view of the cell as an integrated system, a full understanding of cell signaling will require development of network models. A model of a gene regulatory network controlling development of an embryonic cell lineage in sea urchins has recently been developed.

Figure 5.20 A gene regulatory network

Bioinformatics and Systems Biology Synthetic biology: The goal is to design and create new (unnatural or synthetic) systems, both to make useful products and to better understand how the behavior of existing cells is controlled.

Bioinformatics and Systems Biology Synthetic biologists can synthesize new molecules with biological properties, such as RNA, or engineer new systems using components of existing cells. The ability to engineer a novel biological system tests and expands our understanding of how natural systems function.

Bioinformatics and Systems Biology Genetic circuits were first engineered in E. coli. A genetic toggle switch was designed to confer stability and memory on a network regulating gene expression. Its key feature is that two repressors control each other's expression as well as the expression of a reporter gene.
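
The bistability of a toggle switch can be illustrated with a commonly used two-equation model of mutual repression, integrated here with a simple Euler method. The model form and the parameter values (`alpha`, `n`, time step) are assumptions chosen for illustration, not the published design. Starting from different initial conditions, the same circuit settles into two different stable states and stays there, which is the "memory" described above. A reporter expressed along with repressor 1 would track `u`.

```python
# Mutual-repression toggle switch (dimensionless sketch):
#   du/dt = alpha / (1 + v^n) - u   (repressor 1, repressed by repressor 2)
#   dv/dt = alpha / (1 + u^n) - v   (repressor 2, repressed by repressor 1)


def simulate_toggle(u0: float, v0: float, alpha: float = 10.0, n: float = 2.0,
                    dt: float = 0.01, t_end: float = 50.0) -> tuple[float, float]:
    """Integrate the toggle-switch equations with explicit Euler steps."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du = alpha / (1.0 + v ** n) - u
        dv = alpha / (1.0 + u ** n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


print(simulate_toggle(5.0, 0.0))  # repressor 1 high, repressor 2 low
print(simulate_toggle(0.0, 5.0))  # the opposite stable state
```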

Figure 5.21 A genetic toggle switch

Bioinformatics and Systems Biology Similar genetic circuits have since been engineered in eukaryotic models. This has substantially advanced our understanding of how a regulatory circuit can alternate between two stable states—a common feature of networks involved in many aspects of cell signaling and regulation of cell proliferation.

Bioinformatics and Systems Biology Practical applications of synthetic biology—treating malaria: Malaria is a serious parasitic disease, caused by the protozoan Plasmodium and transmitted by mosquitoes. Research on vaccine development is underway, but none is currently available.

Molecular Medicine, Ch. 5, p. 180

Bioinformatics and Systems Biology The most effective antimalarial drug currently available is artemisinin, a compound produced by the plant Artemisia annua (sweet wormwood), which takes 8 months to mature. The supply of artemisinin from these plants is limited, and its price fluctuates.

Figure 5.22 Structure of artemisinin

Bioinformatics and Systems Biology Synthetic biologists have developed strains of yeast engineered to produce artemisinic acid, a precursor of artemisinin, which is then used for commercial production of this important drug.

Bioinformatics and Systems Biology The first cell with a completely synthetic genome was created by Venter and colleagues, who synthesized overlapping oligonucleotides corresponding to the complete genome sequence of Mycoplasma mycoides.

Bioinformatics and Systems Biology The synthetic genome was then introduced into a different mycoplasma species, M. capricolum. These cells grew normally and showed the morphology of normal M. mycoides. Because the cell proteins are specified by the synthetic genome, they represent the first synthetic cells.

Figure 5.23 First cell with a synthetic genome