

DNA sequencing. Dideoxy analogs of normal nucleotide triphosphates (ddNTPs) cause premature termination of a growing chain of nucleotides. For the sequence ACAGTCGATTG, termination at G gives the fragments ACAddG, ACAGTCddG and ACAGTCGATTddG. The fragments are separated by size in gel electrophoresis. Their lengths show the positions of “G” in the original DNA sequence.
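The chain-termination idea on this slide can be illustrated with a small sketch. It treats the slide's sequence as the growing chain and lists every prefix that ends at the terminating base. The function names `termination_fragments` and `read_positions` are made up for this illustration, and the sketch ignores the chemistry entirely.

```python
def termination_fragments(strand: str, base: str = "G") -> list[str]:
    """Return every prefix of `strand` that ends at an occurrence of `base`.

    Each prefix stands for a chain that stopped when a dideoxy analog of
    `base` was incorporated at that position.
    """
    return [strand[: i + 1] for i, b in enumerate(strand) if b == base]


def read_positions(fragments: list[str]) -> list[int]:
    """Fragment lengths, sorted as a gel would order them, give base positions."""
    return sorted(len(f) for f in fragments)


fragments = termination_fragments("ACAGTCGATTG", "G")
print(fragments)                  # the three G-terminated chains from the slide
print(read_positions(fragments))  # 1-based positions of G in the sequence
```

Running the same function once per base (A, C, G, T) and merging the positions would recover the whole sequence, which is the logic behind reading four gel lanes.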

Nucleotides and the phosphodiester bond. [Figure: phosphodiester bond]

Genomic sequencing. Individual chromosomes are broken into random fragments of about 100 kb. This library of fragments is screened to find overlapping fragments, which form contigs. Unique overlapping clones are chosen for sequencing. The sequenced overlapping clones are then put together using computer programs.
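The final step, putting overlapping sequences together by computer, can be sketched with a toy greedy merge. This is an illustrative assumption, not the method used by any particular assembler: the function names, the `min_len` threshold and the example reads are invented, and real assemblers also handle sequencing errors, repeats and both strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ACAGTCG", "TCGATTG", "ATTGCCA"]))
```

Reads that share no overlap of at least `min_len` bases are left unmerged, which mirrors how gaps in library coverage leave several contigs rather than one chromosome.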

Sequencing cDNA libraries. mRNA is pooled from the tissues in which the genes are expressed. cDNA libraries are prepared by copying the mRNA with reverse transcriptase. Expressed Sequence Tags (ESTs) are partial sequences of expressed genes. Comparing translated ESTs to annotated proteins supports the annotation of genes.

Gene prediction. Gene – DNA sequence encoding protein, rRNA, tRNA … The gene concept is complicated:
- Introns/exons
- Alternative splicing
- Genes-in-genes
- Multisubunit proteins

Gene structure. [Figure: promoter sequences, followed by the gene running from ATG to TER] ATG – start codon; TER (TAA, TAG, TGA) – termination codons.

Codon usage tables.
- Each amino acid can be encoded by several codons.
- Each organism has a characteristic pattern of codon usage.
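A codon usage table is, at its simplest, a count of how often each codon appears in known coding sequences. The sketch below assumes the input is already an in-frame coding sequence. The function names and the example sequence are hypothetical.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (trailing bases are ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon among all codons in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Made-up sequence: Met, Ala, Ala, Ala, stop, with alanine encoded two ways.
example = "ATGGCTGCTGCCTAA"
print(codon_counts(example))
print(codon_frequencies(example))
```

Pooling these counts over many genes of one organism gives the organism-specific pattern the slide refers to.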

Problems arising in gene prediction. Pseudogenes (former genes that no longer function) must be distinguished from genes. In eukaryotes, the exon/intron structure and the exon flanking regions are not very well conserved. Exons can be joined in alternative combinations (alternative splicing). Genes can overlap each other and occur on different strands of DNA.

Homology-based gene prediction
– Similarity searches (e.g. BLAST, BLAT)
– ESTs
Ab initio gene prediction
– Prokaryotes: ORF identification
– Eukaryotes: promoter prediction; polyA-signal prediction; splice site and start/stop-codon predictions
Gene identification

Ab initio gene prediction. Predictions are based on the observation that the DNA sequence of a gene is not random:
- A gene-coding sequence has start and stop codons.
- Each species has a characteristic pattern of synonymous codon usage.
- Non-coding ORFs are very short.
- A gene would correspond to the longest ORF.
These methods look for the characteristic features of genes and score them high.

Prokaryotic genes – searching for ORFs.
- Small genomes have high gene density (Haemophilus influenzae – 85% genic).
- No introns.
- Operons: one transcript, many genes.
- Open reading frame (ORF) – a contiguous set of codons that starts with a Met codon and ends with a stop codon.

Example of ORFs. Each sequence has six possible reading frames, three for each direction of transcription.
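A minimal six-frame ORF scan, following the ORF definition used in these slides (start at ATG, end at TAA, TAG or TGA), might look like the sketch below. The function names and the tuple layout are assumptions made for this example, and the scan applies no minimum-length filter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int):
    """Yield (start, end, orf) for each ATG...stop stretch in one frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon since the last stop
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3, seq[start:i + 3]
            start = None


def six_frame_orfs(seq: str):
    """Scan three frames on each strand; coordinates refer to the scanned strand."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end, orf in orfs_in_frame(s, frame):
                results.append((strand, frame, start, end, orf))
    return results


for hit in six_frame_orfs("CCATGAAATAGGCTATTTCATGG"):
    print(hit)
```

In practice the hits would be filtered by length, since short ORFs arise frequently by chance in non-coding sequence.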

Gene preference score – important indicator of coding region. Observation: frequencies of codons and codon pairs in coding and non-coding regions are different. Given a sequence of codons: and assuming independence, the probability of finding coding region: The probability of finding sequence “C” in non-coding regions: The gene preference score:
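The formulas on this slide did not survive in the transcript, so the sketch below uses one common formulation as an assumption, not necessarily the one shown in the lecture. For a codon sequence C = C1 C2 … Cm with independent codons, P(C) is the product of the coding-region codon frequencies f(Ci), P0(C) is the product of the non-coding frequencies f0(Ci), and the score is log(P(C) / P0(C)). The uniform background of 1/64 per codon and the frequency values in the example are placeholders, not real human codon usage.

```python
import math


def gene_preference_score(seq, coding_freq, noncoding_freq=None):
    """Log-likelihood ratio of coding versus non-coding for an in-frame sequence.

    coding_freq:    codon -> frequency in known coding regions
    noncoding_freq: codon -> frequency in non-coding regions; if omitted,
                    every codon is assumed equally likely (1/64)
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    score = 0.0
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        f = coding_freq[codon]
        f0 = noncoding_freq[codon] if noncoding_freq else 1 / 64
        score += math.log(f / f0)  # a sum of logs equals the log of the product
    return score


# Placeholder frequencies for illustration only.
toy_coding_freq = {"AGT": 1 / 32, "ACA": 1 / 16}
print(gene_preference_score("AGTACA", toy_coding_freq))
```

A positive score means the codons are more typical of coding regions than of the background. Summing logarithms instead of multiplying probabilities avoids numerical underflow on long sequences.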

Classwork I. Calculate the gene preference score for the following human DNA sequence: AGTACA

Ab initio gene prediction methods. Grail II – predicts exons, promoters, Poly(A) sites. Neural network plus dynamic programming. GeneParser – predicts the most likely combination of exons/introns. Dynamic programming. GeneMark – mostly for prokaryotes, Hidden Markov Models. GeneScan – Fourier transform of DNA sequence to find characteristic patterns.

Confirming gene location using EST libraries. Expressed Sequence Tags (ESTs) – sequenced short segments of cDNA. They are organized in the database “UniGene”. If region matches ESTs with high statistical significance, then it is a gene or pseudogene.

Gene prediction accuracy.
True positives (TP) – nucleotides that are correctly predicted to be within the gene.
Actual positives (AP) – nucleotides that are located within the actual gene.
Predicted positives (PP) – nucleotides that are predicted to be in the gene.
Sensitivity = TP / AP
Specificity = TP / PP
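These nucleotide-level measures can be computed directly from two sets of positions. The sketch below assumes genes are given as sets of nucleotide coordinates. The function name and the example coordinates are invented. Note that TP / PP, called specificity here following the gene-prediction literature, is what is elsewhere usually called precision.

```python
def nucleotide_accuracy(actual: set[int], predicted: set[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) as defined on the slide.

    actual:    positions inside the real gene (AP)
    predicted: positions the program assigned to the gene (PP)
    """
    tp = len(actual & predicted)  # correctly predicted nucleotides
    sensitivity = tp / len(actual) if actual else 0.0
    specificity = tp / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical gene at positions 100-199, prediction at positions 150-249.
actual = set(range(100, 200))
predicted = set(range(150, 250))
print(nucleotide_accuracy(actual, predicted))
```

A prediction that covers only part of the gene but nothing outside it scores low on sensitivity and high on specificity, which is why the two numbers are reported together.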

Gene prediction accuracy. GenScan Website

Common difficulties. First and last exons are difficult to annotate because they contain UTRs. Smaller genes do not reach statistical significance, so they are thrown out. Algorithms are trained on sequences from known genes, which biases them against genes about which nothing is known.

Gene prediction: classwork II. Go to http:// and view all hemoglobin genes of H. sapiens. Find 6 hemoglobin genes on chromosome 11 and view the DNA sequence of this chromosome region. Submit this sequence to the GenScan server at

Genome analysis. Genome – the sum of genes and intergenic sequences of a haploid cell.

The value of genome sequences lies in their annotation. Annotation – characterizing genomic features using computational and experimental methods. Genes: levels of annotation
– Gene prediction: where are the genes?
– What do they encode?
– What proteins/pathways are they involved in?

From Koonin & Galperin

Accuracy of genome annotation. In most genomes, functional predictions have been made for the majority of genes (54-79%). The sources of errors in annotation:
- overprediction (hits that are statistically significant in the database search are not checked);
- multidomain proteins (similarity is found to only one domain, but the annotation is extended to the whole protein).
The error rate of genome annotation can be as high as 25%.

Sample genomes.
Species          Size      Genes    Genes/Mb
H. sapiens       3,200 Mb  35,000   11
D. melanogaster  137 Mb
C. elegans       85.5 Mb   18,
A. thaliana      115 Mb    25,
S. cerevisiae    15 Mb     6,
E. coli          4.6 Mb    4,
1. There is almost no correlation between the number of genes and an organism’s complexity.
2. There is a correlation between the amount of non-protein-coding DNA and complexity.

Human Genome project.

Comparative genomics - comparison of gene number, gene content and gene location in genomes. Campbell & Heyer “Genomics”

Analysis of gene order (synteny). Genes with a related function are frequently clustered on the chromosome. Example: the E. coli genes responsible for the synthesis of Trp are clustered, and their order is conserved between different bacterial species. Operon: a set of genes transcribed together, in the same direction of transcription.

Analysis of gene order (synteny). Koonin & Galperin “Sequence, Evolution, Function”

Analysis of gene order (synteny). The order of genes is not very well conserved if the % identity between prokaryotic genomes is < 50%. The gene neighborhood can still be conserved, so that all neighboring genes belong to the same functional class. This allows functional prediction based on gene neighborhood.
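One simple way to look for conserved gene neighborhoods is to compare which genes sit next to each other in two genomes. The sketch below assumes each genome is given as an ordered list of ortholog names. The function names are invented, and the second gene order is hypothetical (the first follows the E. coli trp operon order, trpEDCBA).

```python
def adjacent_pairs(order: list[str]) -> set[frozenset]:
    """Unordered pairs of genes that are direct neighbors in a gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_neighbors(order_a: list[str], order_b: list[str]) -> set[frozenset]:
    """Neighbor pairs present in both gene orders, regardless of orientation."""
    return adjacent_pairs(order_a) & adjacent_pairs(order_b)


genome_a = ["trpE", "trpD", "trpC", "trpB", "trpA"]
genome_b = ["trpE", "trpD", "geneX", "trpB", "trpA"]  # hypothetical rearrangement

for pair in sorted(sorted(p) for p in conserved_neighbors(genome_a, genome_b)):
    print(pair)
```

Using unordered pairs means a cluster that is inverted in one genome still counts as conserved. A gene of unknown function that keeps the same neighbors across genomes becomes a candidate for the same functional class as those neighbors.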

Classwork III: Comparing microbial genomes. Go to Select the Thermus thermophilus genome. View TaxTable. Which gene clusters do you see that are shared with Archaea?