Genome analysis. Genome – the sum of genes and intergenic sequences of a haploid cell.

The value of genome sequences lies in their annotation.
Annotation – characterizing genomic features using computational and experimental methods.
Genes: four levels of annotation
- Gene prediction: where are the genes?
- What do they look like?
- What do they encode?
- What proteins/pathways are they involved in?

Koonin & Galperin

Accuracy of genome annotation. In most genomes, functional predictions have been made for the majority of genes (54-79%).
Sources of annotation errors:
- Overprediction: hits that are statistically significant in the database search are accepted without further checking.
- Multidomain proteins: similarity is found to only one domain, but the annotation is extended to the whole protein.
The annotation error rate can be as high as 25%.

Sample genomes

Species          Size       Genes     Genes/Mb
H. sapiens       3,200 Mb   35,000    11
D. melanogaster  137 Mb     …         …
C. elegans       85.5 Mb    18,…      …
A. thaliana      115 Mb     25,…      …
S. cerevisiae    15 Mb      6,…       …
E. coli          4.6 Mb     4,300     934
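
The Genes/Mb column is the gene count divided by the genome size in Mb. A minimal sketch of that arithmetic, using only the two rows whose values are complete in the table (the function name `genes_per_mb` is an illustrative choice, not from the slides):

```python
# Gene density = number of genes / genome size in Mb.
# Only the two complete rows of the table are used here.
genomes = {
    "H. sapiens": (3200.0, 35000),  # (size in Mb, gene count)
    "E. coli": (4.6, 4300),
}


def genes_per_mb(size_mb, genes):
    """Return the gene density in genes per megabase."""
    return genes / size_mb


for species, (size_mb, genes) in genomes.items():
    print(f"{species}: {genes_per_mb(size_mb, genes):.1f} genes/Mb")
```

The human value rounds to the 11 genes/Mb listed in the table, and the E. coli value truncates to the listed 934.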

So much DNA – so “few” genes …

Human Genome project.

Comparative genomics – comparison of gene number, gene content and gene location in genomes. Campbell & Heyer, “Genomics”

Analysis of gene order (synteny). Genes with related functions are frequently clustered on the chromosome. Example: the E. coli genes responsible for tryptophan (Trp) synthesis are clustered, and their order is conserved between different bacterial species. Operon: a set of genes transcribed together in the same direction of transcription.

Analysis of gene order (synteny). Koonin & Galperin “Sequence, Evolution, Function”

Analysis of gene order (synteny). Gene order is not well conserved when the % identity between prokaryotic genomes is less than 50%. The gene neighborhood can still be conserved, so that all neighboring genes belong to the same functional class. Functional prediction can therefore be based on gene neighborhood.

Role of “junk” DNA in a cell. (Sample genomes table repeated: species, size, genes, genes/Mb.)
1. There is almost no correlation between the number of genes and an organism’s complexity.
2. There is a correlation between the amount of non-protein-coding DNA and complexity.

New interpretation of introns.
1. Modern introns invaded eukaryotes late in evolution; they are derived from self-splicing mobile genetic elements similar to group II introns.
2. The nucleus, which separates transcription from translation, appears only in eukaryotes. In prokaryotes there would be no time for introns to splice themselves out.
3. Hypothesis: introns have an important regulatory role.

Regulatory role of non-coding regions.
- “Micro-RNAs” control the timing of processes in development and apoptosis.
- Intronic RNAs signal that a particular gene is being transcribed.
- Alternative splicing can be regulated by non-coding regions.
- Non-coding regions can be very well conserved between species, and many genetic diseases have been linked to variations/mutations in non-coding regions.

COGs – Clusters of Orthologous Groups. Orthologs – genes in different species that evolved from a common ancestral gene by speciation. Paralogs – genes related by duplication within a genome.

Classwork I: Comparing microbial genomes.
- Go to …
- Select the Thermus thermophilus genome.
- View TaxTable.
- Which gene clusters do you see that are shared with Archaea?

Systems biology. Integrative approach to study the relationships and interactions between various parts of a complex system. Goal: to develop a model of interacting components for the whole system.

Basic notions of networks. Network (graph) – a set of vertices connected via edges. The degree of a vertex – the total number of connections of a vertex. Random networks – networks with a disordered arrangement of edges.
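
These definitions map directly onto a small data structure. The sketch below is one possible representation, not something from the slides: an undirected graph is stored as an adjacency map, and the degree of a vertex is the size of its neighbor set. The names `build_graph` and `degree` and the example edges are illustrative.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as {vertex: set of neighboring vertices}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree(adj, v):
    """Degree of a vertex = total number of its connections."""
    return len(adj.get(v, ()))


# Hypothetical four-vertex example.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
adj = build_graph(edges)
for v in sorted(adj):
    print(v, degree(adj, v))
```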

Properties of networks. Vertex degree distribution/connectivity. Clustering coefficient. Network diameter.

Characteristics of networks: vertex degree distribution. [Figure: example vertices with degrees k = 2, k = 3, k = 1.] P(k, N) – degree distribution; k – degree of the vertex; N – number of vertices. If vertices are statistically independent and connections are random, the degree distribution completely determines the statistical properties of a network.
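
As a rough illustration, the empirical degree distribution of a finite graph can be estimated as the fraction of vertices that have each degree. The sketch assumes an undirected edge list and ignores isolated vertices (they never appear in an edge); the function name and the example graph are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of vertices with degree k} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)  # number of vertices that appear in at least one edge
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical five-vertex graph with degrees 1, 3, 1, 2, 1.
example_edges = [(1, 2), (2, 3), (2, 4), (4, 5)]
dist = degree_distribution(example_edges)
print(dist)
```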

Characteristics of networks: clustering coefficient. The clustering coefficient characterizes the density of connections in the environment close to a given vertex: C = 2d / (n(n − 1)), where d is the total number of edges connecting the nearest neighbors and n is the number of nearest neighbors of the given vertex. In the slide’s example, C = 2/6.
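
A minimal sketch of the local clustering coefficient, using the standard definition C = 2d / (n(n − 1)) with d and n as defined above. The example graph is an assumption: it is one configuration (four neighbors, two edges among them) that yields the value 2/6, and is not necessarily the graph drawn on the slide.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Local clustering coefficient of vertex v in an undirected graph.

    adj maps each vertex to the set of its neighbors.
    """
    neighbors = adj[v]
    n = len(neighbors)
    if n < 2:
        return 0.0  # fewer than two neighbors: no neighbor pair can be linked
    # d = number of edges that connect two neighbors of v
    d = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * d / (n * (n - 1))


# Hypothetical graph: vertex 0 has neighbors 1-4, with edges 1-2 and 3-4.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}
print(clustering_coefficient(adj, 0))  # 2*2 / (4*3) = 2/6
```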

Characteristics of networks: diameter, small-world. Diameter of a network – the shortest path length along existing links, averaged over all pairs of vertices. Distance between two vertices – the smallest number of steps needed to reach one vertex from the other. Small-world character of networks: any two vertices can be connected by a relatively short path. For random networks the diameter increases logarithmically as new vertices are added.
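
Distances in an unweighted network can be computed with breadth-first search, and the diameter in the sense used here is the average of those distances over all connected pairs. The sketch below assumes an undirected graph given as an adjacency map and skips unreachable pairs; the function names and the example path graph are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in steps) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Shortest-path length averaged over all pairs of connected vertices."""
    total = 0
    pairs = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1  # each unordered pair is counted twice, as is its distance
    return total / pairs if pairs else 0.0


# Hypothetical path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_path_length(path))
```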

Different network models: Erdős-Rényi model. Start with a fixed set of vertices. Iterate the following step: choose two vertices at random and connect them by an edge. Stop at a given number of edges. Degree distribution – Poisson, P(k) = e^(−λ) λ^k / k!, where λ is the average degree. [Plot: ln(P(k)) versus ln(k).]
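
The procedure above can be sketched as follows. Two details are assumptions the slide does not specify: self-loops are excluded, and a pair that is already connected is simply drawn again. The function name `erdos_renyi` and its parameters are illustrative.

```python
import random


def erdos_renyi(n_vertices, n_edges, seed=0):
    """Connect randomly chosen vertex pairs until n_edges distinct edges exist."""
    max_edges = n_vertices * (n_vertices - 1) // 2
    if n_edges > max_edges:
        raise ValueError("too many edges requested for this number of vertices")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n_vertices), 2)  # two distinct vertices
        edges.add((min(u, v), max(u, v)))        # store each undirected edge once
    return edges


er_edges = erdos_renyi(100, 200, seed=1)
mean_degree = 2 * len(er_edges) / 100  # every edge contributes to two degrees
print(len(er_edges), mean_degree)
```

With N vertices and E edges the average degree is λ = 2E/N, which is the parameter of the Poisson distribution mentioned above.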

Different network models: model 2. At each step, a new vertex is added to the graph. Simultaneously, a pair of randomly chosen vertices is connected by an edge. This is a non-equilibrium model: the total number of vertices is not fixed. Degree distribution – exponential. [Plot: ln(P(k)) versus ln(k).]
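
A minimal sketch of this growth rule under stated assumptions: the graph starts from two unconnected vertices, the random pair is drawn from all current vertices including the one just added, and repeated edges between the same pair are allowed. None of these details are given on the slide, and the function name is hypothetical.

```python
import random


def growing_random_graph(steps, seed=0):
    """Grow a graph: each step adds one vertex and one edge between a random pair."""
    rng = random.Random(seed)
    vertices = [0, 1]  # assumption: start with two unconnected vertices
    edges = []
    for _ in range(steps):
        vertices.append(len(vertices))   # add a new vertex
        u, v = rng.sample(vertices, 2)   # pick two distinct vertices at random
        edges.append((u, v))             # connect them
    return vertices, edges


grown_vertices, grown_edges = growing_random_graph(1000, seed=2)
print(len(grown_vertices), len(grown_edges))
```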

Different network models: Barabási-Albert. Model of preferential attachment. At each step, a new vertex is added to the graph. The new vertex is attached to one of the old vertices with probability proportional to the degree of that old vertex. Degree distribution – power law. [Plot: ln(P(k)) versus ln(k).]
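
Preferential attachment can be sketched with a simple trick: keep a list in which every vertex appears once per edge end, so that a uniform draw from the list picks a vertex with probability proportional to its degree. The sketch assumes one new edge per step and a single-edge seed graph; these choices and the function name are illustrative, not taken from the slide.

```python
import random


def preferential_attachment(n_vertices, seed=0):
    """Grow a graph where each new vertex attaches to one degree-weighted old vertex."""
    rng = random.Random(seed)
    edges = [(0, 1)]    # assumption: start from a single edge
    endpoints = [0, 1]  # each vertex appears here once per unit of degree
    for new in range(2, n_vertices):
        old = rng.choice(endpoints)  # probability proportional to degree
        edges.append((new, old))
        endpoints.extend((new, old))
    return edges


ba_edges = preferential_attachment(50, seed=3)
print(len(ba_edges))
```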

Power law distribution. Multiplying k by a constant does not change the shape of the distribution, which is why it is called scale-free. From T. Przytycka
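
A quick numerical check of the scale-free property, assuming the unnormalized form P(k) = k^(−γ): rescaling k by a constant c multiplies P by the fixed factor c^(−γ), whatever the value of k. An exponential distribution is shown for contrast. The exponent 2.5 and the function names are used only for illustration.

```python
import math


def power_law(k, gamma=2.5):
    """Unnormalized power law P(k) = k^(-gamma)."""
    return k ** (-gamma)


def exponential(k):
    """Unnormalized exponential distribution P(k) = e^(-k), for contrast."""
    return math.exp(-k)


c = 10
ks = (1, 5, 50)
# For a power law the ratio P(c*k) / P(k) is the same at every k ...
power_ratios = [power_law(c * k) / power_law(k) for k in ks]
# ... while for an exponential it depends strongly on k.
exp_ratios = [exponential(c * k) / exponential(k) for k in ks]
print(power_ratios)
print(exp_ratios)
```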

Difference between scale-free and random networks. Random networks are homogeneous: most nodes have approximately the same number of links. Scale-free networks are inhomogeneous: a few vertices are highly connected, while most have few links.

Example 1: the large-scale organization of metabolic networks.
[Figure: glycolysis as a metabolic network of substrates and enzymes. Slide credit: Hagai Ginsburg]
Substrates: D-Glucose, D-Glucose-6P, D-Fructose-6P, D-Fructose-1,6P2, Glyceraldehyde-3P, Glycerone-P, Glycerate-1,3P2, Glycerate-3P, Glycerate-2P, Phosphoenol-pyruvate, Pyruvate, Lactate.
Enzymes: Hexokinase, Phosphoglucose isomerase, Phosphofructokinase, Aldolase, Triose phosphate isomerase, Glyceraldehyde 3-P dehydrogenase, Phosphoglycerate kinase, Phosphoglycerate mutase, Enolase, Pyruvate kinase, Lactate dehydrogenase.
Cofactors: ATP/ADP, NAD+/NADH + H+, Pi, H2O.
Connected pathways: Pentose phosphate cycle, Pyruvate metabolism, Apicoplast FA synthesis, Glycerolipid metabolism.

Jeong et al., Nature, 2000:
- Compared the metabolic networks of 43 organisms.
- Vertices – substrates, connected to each other through links (metabolic reactions).
Results:
- Metabolic networks are scale-free for all organisms, γ = …
- The diameters of the metabolic networks are the same for all organisms.

Biological interpretations of power-law connectivity.
- A few vertices dominate the overall connectivity of the network.
- Self-similarity of networks.
- Small diameter: the network can respond quickly to a mutation that destroys an enzyme by quickly activating alternative paths.

Protein-protein interaction networks. Sneppen & Maslov: vertices – proteins; edges connect proteins that interact in a cell. Network: 3278 interactions, 1289 proteins. Scale-free network, γ = 2.5 ± 0.3.