Lecture 4 BNFO 235 Usman Roshan. IUPAC Nucleic Acid symbols.

Presentation transcript:

Lecture 4 BNFO 235 Usman Roshan

IUPAC Nucleic Acid symbols

IUPAC Amino Acid symbols

Genetic code

Splitting and joining strings
split: splits a string by a regular expression and returns an array, e.g. split(/,/) or split(/\s+/)
join: joins the elements of an array and returns a string (the opposite of split)
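A minimal sketch of split and join in use. The strings and variable names ($line, @names, @codons) are made up for illustration and do not come from the slides.

```perl
use strict;
use warnings;

# Hypothetical input: a comma-separated line of species names.
my $line = "human,mouse,cat,lion";

# split takes a regular expression and returns the pieces as an array.
my @names = split(/,/, $line);

# split(/\s+/, ...) breaks a string on runs of whitespace.
my @codons = split(/\s+/, "ATG GCC  TAA");

# join is the opposite: it glues array elements together with a separator.
my $rejoined = join(",", @names);
my $dna      = join("", @codons);

print scalar(@names), " names, rejoined as: $rejoined\n";
print "codons joined: $dna\n";
```

Joining with the same separator that was used for splitting gives back the original string.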

Searching and substitution
$x =~ /$y/ --- true if the expression $y is found in $x
$x =~ /ATG/ --- true if the start codon ATG is found in $x
$x !~ /GC/ --- true if GC is not found in $x
$x =~ s/T/U/g --- replace every T with U
$x =~ s/g/G/g --- convert every lowercase g to uppercase G
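The operators above can be tried on a short string. This is a sketch with a made-up sequence; the names $has_start, $lacks_gc, $rna, and $fixed are illustrative only.

```perl
use strict;
use warnings;

my $x = "ATGgttaGAT";   # hypothetical sequence with mixed case

# =~ tests for a match, !~ tests for the absence of a match.
my $has_start = ($x =~ /ATG/) ? 1 : 0;
my $lacks_gc  = ($x !~ /GC/)  ? 1 : 0;

# Copy $x first, then substitute in the copy so $x itself is unchanged.
(my $rna   = $x) =~ s/T/U/g;   # every uppercase T becomes U
(my $fixed = $x) =~ s/g/G/g;   # every lowercase g becomes G

print "has ATG: $has_start, lacks GC: $lacks_gc\n";
print "T to U: $rna\n";
print "g to G: $fixed\n";
```

Matching is case-sensitive by default, so s/T/U/g leaves the lowercase t characters alone.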

DNA regular expressions Taken from Jagota’s Perl for Bioinformatics
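The table from this slide is not preserved in the transcript. As a rough illustration only (this is not the example from Jagota's book), the standard IUPAC nucleotide ambiguity codes can be turned into a Perl regular expression. The helper name iupac_to_regex and the test motif are assumptions.

```perl
use strict;
use warnings;

# Standard IUPAC DNA ambiguity codes mapped to regex character classes.
my %iupac = (
    A => 'A',      C => 'C',      G => 'G',      T => 'T',
    R => '[AG]',   Y => '[CT]',   S => '[CG]',   W => '[AT]',
    K => '[GT]',   M => '[AC]',
    B => '[CGT]',  D => '[AGT]',  H => '[ACT]',  V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: convert an IUPAC motif into a compiled regex.
sub iupac_to_regex {
    my ($motif) = @_;
    my $pattern = "";
    foreach my $symbol (split(//, uc($motif))) {
        die "Unknown IUPAC symbol: $symbol\n" unless exists $iupac{$symbol};
        $pattern .= $iupac{$symbol};
    }
    return qr/$pattern/;
}

# TATAWAW is used here only as an example motif.
my $re = iupac_to_regex("TATAWAW");
print "motif found\n" if "GGCTATAAATGC" =~ $re;
```

Here W expands to [AT], so the motif becomes TATA[AT]A[AT].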

DNA Sequence Evolution
[Figure: evolutionary tree with a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today). The ancestral sequence AAGACTT evolves through the intermediate sequences T_GACTT, AAGGCTT, _GGGCTT, TAGACCTT, and A_CACTT into today's sequences.]
Today's sequences: ACCTT (Cat), ACACTTC (Lion), TAGCCCTTA (Monkey), TAGGCCTT (Human), GGCTT (Mouse)
The same sequences with gaps shown: A_C_CTT (Cat), A_CACTTC (Lion), TAGCCCTTA (Monkey), TAGGCCTT (Human), _G_GCTT (Mouse)

Comparative Bioinformatics
Fundamental notion of biology: all life is related by an unknown evolutionary Tree of Life. Therefore, if we know something about one species, we can make inferences about others. Also, by comparing multiple species we can make inferences about sets of species. How do we compare DNA or protein sequences of two different species?

Comparative Bioinformatics
We need to know how often mutations such as A to T or A to C occur. To determine this, we manually create a set of “true” alignments and estimate the likelihood of A changing to C, for example, by counting the number of times A changes to C and computing related statistics. This gives us a realistic “scoring matrix” that can be used to evaluate how closely two species are related based on their DNA.
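The counting step can be sketched as follows. The two aligned pairs are made-up data, "_" is assumed to mark a gap (as in the evolution slide), and only raw frequencies are computed. Turning them into actual scores is left out.

```perl
use strict;
use warnings;

# Hypothetical "true" alignments: pairs of aligned sequences of equal length.
my @alignments = (
    [ "ACGTAC", "ACGTTC" ],
    [ "GGCA_T", "GGTACT" ],
);

my %count;       # $count{A}{T} = number of columns where A is aligned with T
my $total = 0;   # number of gap-free columns seen

foreach my $pair (@alignments) {
    my ($first, $second) = @$pair;
    for my $i (0 .. length($first) - 1) {
        my $x = substr($first,  $i, 1);
        my $y = substr($second, $i, 1);
        next if $x eq '_' or $y eq '_';   # skip columns that contain a gap
        $count{$x}{$y}++;
        $total++;
    }
}

# Relative frequency of each observed pair of aligned nucleotides.
foreach my $x (sort keys %count) {
    foreach my $y (sort keys %{ $count{$x} }) {
        printf "%s -> %s: %d of %d columns (%.3f)\n",
            $x, $y, $count{$x}{$y}, $total, $count{$x}{$y} / $total;
    }
}
```

With more alignments the same loop applies unchanged. Only @alignments grows.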

Problems
Write a Perl subroutine called readmatrix that reads a DNA substitution scoring matrix from a file called “dna.txt” and stores it in a two-dimensional array. The format of the scoring matrix in the file is
   A  C  G  T
A 10  3  1  4
C  3 12  3  5
G  1  3 15  2
T  4  5  2 11
Write a Perl subroutine called translate that takes an mRNA sequence, converts it into a protein sequence, and returns the protein sequence.
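One possible approach to both subroutines, offered as a sketch rather than a reference solution. Assumptions: the matrix file has whitespace-separated fields with a header line followed by one labeled row per nucleotide, readmatrix takes an optional file name that defaults to "dna.txt", and translate starts at the first base, stops at the first stop codon, and writes X for any codon it does not recognize. The demo writes the matrix to a temporary file so that it runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a scoring matrix into a two-dimensional array (a list of array refs).
sub readmatrix {
    my ($filename) = @_;
    $filename = "dna.txt" unless defined $filename;
    open(my $in, '<', $filename) or die "Cannot open $filename: $!\n";
    my $header = <$in>;              # column labels; assumed to be in row order
    my @matrix;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/^\s+//;           # drop leading whitespace before splitting
        next if $line eq "";         # skip blank lines
        my ($label, @scores) = split(/\s+/, $line);
        push @matrix, [@scores];
    }
    close($in);
    return @matrix;
}

# Translate an mRNA string into one-letter amino acid codes.
sub translate {
    my ($mrna) = @_;
    # Standard genetic code; codons enumerated with bases in the order U, C, A, G.
    my @bases  = ('U', 'C', 'A', 'G');
    my $aminos = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    my %code;
    my $i = 0;
    foreach my $b1 (@bases) {
        foreach my $b2 (@bases) {
            foreach my $b3 (@bases) {
                $code{"$b1$b2$b3"} = substr($aminos, $i++, 1);
            }
        }
    }
    $mrna = uc($mrna);
    my $protein = "";
    for (my $pos = 0; $pos + 3 <= length($mrna); $pos += 3) {
        my $aa = $code{ substr($mrna, $pos, 3) };
        $aa = 'X' unless defined $aa;   # unrecognized codon
        last if $aa eq '*';             # stop codon ends translation
        $protein .= $aa;
    }
    return $protein;
}

# Demo: write the example matrix to a temporary file, then read it back.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "  A C G T\nA 10 3 1 4\nC 3 12 3 5\nG 1 3 15 2\nT 4 5 2 11\n";
close($tmp);

my @m = readmatrix($tmpname);
print "score(A,A) = $m[0][0], score(T,G) = $m[3][2]\n";

my $protein = translate("AUGGCCUUUUAA");
print "protein: $protein\n";
```

Rows and columns are indexed 0 to 3 in the order A, C, G, T, so $m[3][2] is the score for T against G.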

Problems
Write a Perl program that reads in a substitution scoring matrix from a file called “matrix.txt”, reads in a pair of DNA sequences of equal length from a file called “dna.txt”, and returns the total substitution score between the two sequences.
Write a Perl program that reads pairs of DNA sequences from a file called “DNApairs.txt” and estimates the frequency of nucleotide substitutions.
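The scoring step of the first problem can be sketched as below. To keep the example self-contained, the matrix and the two sequences are hard-coded instead of being read from “matrix.txt” and “dna.txt”. The matrix values assume the example matrix from the earlier problem reads as a symmetric 4x4 table, and the subroutine name substitution_score is made up.

```perl
use strict;
use warnings;

# Row and column position of each nucleotide in the matrix.
my %index = (A => 0, C => 1, G => 2, T => 3);

# Example scores (assumed values; a real program would read them from a file).
my @matrix = (
    [10,  3,  1,  4],
    [ 3, 12,  3,  5],
    [ 1,  3, 15,  2],
    [ 4,  5,  2, 11],
);

# Sum the matrix score of each aligned pair of positions.
sub substitution_score {
    my ($first, $second, $scores) = @_;
    die "Sequences must have equal length\n"
        unless length($first) == length($second);
    my $score = 0;
    for my $i (0 .. length($first) - 1) {
        my $row = $index{ uc(substr($first,  $i, 1)) };
        my $col = $index{ uc(substr($second, $i, 1)) };
        die "Unexpected symbol at position $i\n"
            unless defined $row and defined $col;
        $score += $scores->[$row][$col];
    }
    return $score;
}

my $score = substitution_score("ACGT", "AGGA", \@matrix);
print "total substitution score: $score\n";
```

For ACGT against AGGA the aligned pairs are A/A, C/G, G/G, and T/A, so the total is 10 + 3 + 15 + 4 = 32.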