Codon Bias and Its Relationship to Gene Expression
Presented through a virtual grant by the Virtual Student Union.

What Is Codon Bias?
- Codon bias is the unequal use of synonymous codons: the probability that a given codon, rather than another codon for the same amino acid, will be used to encode that amino acid.
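
As a rough illustration of this definition, the Python sketch below measures, for one ORF, what fraction of each amino acid's codons use each synonymous codon. The `codon_usage` function and the partial `SYNONYMS` table (covering only Glu and Val, the two amino acids discussed in the results) are hypothetical names introduced here, and the sequence is assumed to start in frame.

```python
from collections import Counter

# Partial synonym table from the standard genetic code, limited (as an
# assumption of this sketch) to the two amino acids discussed in the results.
SYNONYMS = {
    "Glu": ["GAA", "GAG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
}

def codon_usage(orf, synonyms=SYNONYMS):
    """For each amino acid, return the fraction of its codons that use each synonym."""
    orf = orf.upper()
    # Split the ORF into in-frame triplets, ignoring any trailing partial codon.
    codons = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    usage = {}
    for aa, syns in synonyms.items():
        total = sum(codons[c] for c in syns)
        # If the amino acid never occurs, report 0.0 for every synonym.
        usage[aa] = {c: (codons[c] / total if total else 0.0) for c in syns}
    return usage

print(codon_usage("ATGGAAGAAGAAGAGGTTGTGTAA"))
```

In this toy ORF, three of the four Glu codons are GAA, so GAA gets 0.75 and GAG gets 0.25.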

How May Codon Bias Relate to Gene Expression?
- Genes that are always expressed at a high rate should have a different codon bias from genes that are always expressed at a low rate.
- Genes whose expression varies from low to high as a given environmental condition changes may have a codon bias similar to that of the highly expressed genes.

How May Codon Bias Relate to Gene Expression? (cont’d)
- If a gene is expressed at a low level under all known conditions but has a codon bias similar to that of the highly or variably expressed genes, it may be expressed at a high rate under some as-yet-unidentified environmental condition.

How Does One Verify This Hypothesis?
- The genome used must be one in which the sequence of every ORF is known.
- The transcription rate of every ORF must also be known, both under a standard condition and as that condition is varied.

Specifically, How Was This Done?
- The S. cerevisiae genome was used because the sequence of every ORF is known.
- The Yeast Expression Database contains the transcription rate of each ORF in the S. cerevisiae genome.
- The Yeast Expression Database also records the change in transcription rate as the yeast move from a high to a low glucose concentration.

What Was Done with the Data?
- First, every ORF in the Metabolic Database was ranked from highest to lowest expression.
- Next, the ORFs in the database were ranked by the difference in expression between high and low glucose concentrations.
- The codon frequencies of the 5 most highly expressed, 5 least expressed, and 3 most variably expressed ORFs were then analyzed further.
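
The ranking and selection steps above might be sketched as follows. The record layout (`OrfExpression`) and the function `select_groups` are hypothetical, and the sketch assumes that "highest to lowest expression" refers to expression under the standard high-glucose condition; the slides do not say which measurement was used. Note that one ORF can land in more than one group.

```python
from typing import NamedTuple

class OrfExpression(NamedTuple):
    """Hypothetical per-ORF record; field names are illustrative, not the database's schema."""
    orf_id: str
    high_glucose: float  # expression under high glucose (assumed standard condition)
    low_glucose: float   # expression under low glucose

def select_groups(records, n_high=5, n_low=5, n_varied=3):
    """Return the most highly expressed, least expressed, and most variably expressed ORFs."""
    # Rank by expression under the assumed standard (high-glucose) condition, highest first.
    by_level = sorted(records, key=lambda r: r.high_glucose, reverse=True)
    highest = by_level[:n_high]
    lowest = by_level[-n_low:] if n_low > 0 else []
    # Rank by the size of the change in expression between high and low glucose.
    by_change = sorted(records, key=lambda r: abs(r.low_glucose - r.high_glucose), reverse=True)
    varied = by_change[:n_varied]
    return highest, lowest, varied

# Made-up values, for illustration only.
records = [
    OrfExpression("A", 100, 90),
    OrfExpression("B", 50, 5),
    OrfExpression("C", 1, 2),
    OrfExpression("D", 10, 80),
]
high, low, varied = select_groups(records, n_high=1, n_low=1, n_varied=1)
print([r.orf_id for r in high], [r.orf_id for r in low], [r.orf_id for r in varied])
```

Using the absolute change treats induction and repression alike; a signed difference would be needed to separate genes switched on by low glucose from those switched off.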

Codon Frequency Results
- Glu and Val codon usage is similar in the highly and variably expressed genes.
- Glu and Val codon usage appears different in the typical low-expression gene.
- Chi-squared analysis suggests that these differences are unlikely to be due to chance alone.
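
A chi-squared analysis of this kind can be sketched as a Pearson test of independence on a contingency table, with one row per gene group and one column per synonymous codon. The function name and the example counts below are hypothetical and are not the counts from this study. No continuity correction is applied, and the approximation is only reliable when expected counts are not too small (a common rule of thumb is at least 5 per cell), which may be a concern with only a handful of genes per group.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a nonzero total")
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if codon choice were independent of gene group.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical Glu codon counts: rows = high / low expression group, columns = GAA, GAG.
stat, dof = chi_squared([[40, 10], [20, 30]])
print(round(stat, 2), dof)  # 16.67 1
```

For a 2 x 2 table (two groups, two Glu codons) there is one degree of freedom, and the 5% critical value is about 3.841, so these made-up counts would be judged significant.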

What Are Our Future Goals?
- Increase the sample size to include the entire genome.
- Look for genes with low expression whose codon bias more closely resembles that of the highly or variably expressed genes.
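
One simple way to look for such genes, sketched here under stated assumptions, is to compare each low-expression gene's codon-usage profile with reference profiles for the high- and low-expression groups, and flag the genes that sit closer to the high-expression reference. Euclidean distance is an arbitrary choice made for this sketch, and the function names and profile format (a dict mapping each codon to its fraction among synonyms) are hypothetical.

```python
import math

def profile_distance(p, q):
    """Euclidean distance between two codon-usage profiles (codon -> fraction) over the same codons."""
    return math.sqrt(sum((p[c] - q[c]) ** 2 for c in p))

def flag_candidates(low_genes, high_ref, low_ref):
    """Return IDs of low-expression genes whose profile is closer to the high-expression reference."""
    return [
        orf_id
        for orf_id, profile in low_genes
        if profile_distance(profile, high_ref) < profile_distance(profile, low_ref)
    ]

# Hypothetical Glu-codon profiles, for illustration only.
high_ref = {"GAA": 0.8, "GAG": 0.2}
low_ref = {"GAA": 0.5, "GAG": 0.5}
genes = [("orfA", {"GAA": 0.75, "GAG": 0.25}), ("orfB", {"GAA": 0.45, "GAG": 0.55})]
print(flag_candidates(genes, high_ref, low_ref))  # ['orfA']
```

A flagged gene is only a candidate: the hypothesis would still need to be tested by measuring its expression under new conditions.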

Future Goals (cont’d)
- A scoring system must be created to classify gene expression.
- The genome must be searched under many different environmental conditions.
- The technique can be applied to other genomes.
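
For the scoring system, one established option (not the method used in these slides) is the Codon Adaptation Index of Sharp and Li (1987). Each codon gets a weight equal to its count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon, and a gene's score is the geometric mean of the weights of its codons. The sketch below uses only a partial synonym table (Glu and Val). The pseudocount of 0.5 added to every reference count to avoid log(0) is an assumption of this sketch, and all function names are hypothetical.

```python
import math
from collections import Counter

# Partial synonym table (standard genetic code), limited to Glu and Val for this sketch.
SYNONYMS = {
    "Glu": ["GAA", "GAG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
}

def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_orfs, synonyms=SYNONYMS, pseudocount=0.5):
    """Weight of each codon: its (pseudocounted) reference count over that of its most-used synonym."""
    counts = Counter(c for orf in reference_orfs for c in split_codons(orf))
    weights = {}
    for syns in synonyms.values():
        adjusted = {c: counts[c] + pseudocount for c in syns}
        best = max(adjusted.values())
        for c in syns:
            weights[c] = adjusted[c] / best
    return weights

def cai(orf, weights):
    """Geometric mean of codon weights; codons outside the table are skipped (NaN if none remain)."""
    logs = [math.log(weights[c]) for c in split_codons(orf) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

# Made-up reference set standing in for "highly expressed genes".
w = relative_adaptiveness(["GAAGAAGAAGTG"])
print(cai("GAAGTG", w), round(cai("GAAGAGGTG", w), 3))
```

Scores near 1 indicate codon usage close to that of the reference set; a threshold on such a score is one possible way to classify genes, though where to set it would have to be determined from the data.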