1 Speaker: Jakob Fredslund, computer scientist, postdoc at the Bioinformatics Research Centre, Aarhus University.

2 Talk overview
1. Overall project scenario
2. PriFi: finding primers based on a multiple alignment
3. Web version
4. PriFi demo
5. Project results
6. Project pipeline demo

3 Overall project aim
Development of general molecular markers for legume genetics
– unique, variable, known sequence
– markers may be associated with important breeding traits
– many markers → greater chance of trait association
– general: each marker should be shared by all legumes
PCR primers for the markers
– one set per marker, should work for all legumes

4 CATS – comparative anchor tagged sequences
Alignment of ESTs from multiple legume species
Align to genomic region [figure label: Intron]
Identification of evolutionarily conserved regions
Design PCR primers in conserved regions
– Hopefully primers work in related species too
Amplification of intron
– Weak selection pressure on introns → good chance of finding polymorphism
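The "identification of evolutionarily conserved regions" step can be illustrated with a minimal sketch. It assumes the simplest possible definition of conservation (every sequence has the same unambiguous base in a column); the talk does not state the criterion actually used, and the function name `conserved_runs` and the `min_len` parameter are hypothetical.

```python
def conserved_runs(alignment, min_len=18):
    """Return (start, end) column ranges where all sequences share one base.

    alignment: list of equal-length aligned strings, '-' marks a gap.
    end is exclusive. min_len is a hypothetical minimum primer length.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    runs, start = [], None
    # Iterate one column past the end so a run touching the edge is closed.
    for col in range(width + 1):
        conserved = False
        if col < width:
            column = {s[col].upper() for s in alignment}
            conserved = len(column) == 1 and column <= set("ACGT")
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs


# Toy alignment (made up for illustration, not project data).
toy = ["ACGTACGTTTGA",
       "ACGTACGATTGA",
       "ACGTAC-TTTGA"]
print(conserved_runs(toy, min_len=4))
```

A real tool would probably tolerate a few mismatches per window, since the primer consensus sequences shown later in the talk contain ambiguity codes.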

5 Legume Taxonomy
[Figure: legume taxonomy; labels: "Genomic sequences" (twice), "Arachis (peanut)". Copyright ©2004 by the National Academy of Sciences. Choi, Hong-Kyu et al. (2004) Proc. Natl. Acad. Sci. USA 101, , Doyle & Luckow]
species → General legume markers would be very useful!

6 Legumes
We don't have a complete legume genome
If an incomplete genome is used:
– Markers may have undiscovered paralogs in the genome, and hence also in other legumes
– We won't know which sequence we're actually reading with PCR
Use the complete Arabidopsis thaliana genome instead
[Figure labels: "Genomic regions that haven't been sequenced", "?", ACGCATCGATTCGCGAACTG]

7 Arabidopsis and legumes
If Arabidopsis has 2 copies of some gene, its legume ortholog probably exists in only 1 copy
EST has 1 or 2 hits in Arabidopsis → probably unique in legumes → useful as marker candidate
[Figure labels: Arabidopsis, Legumes, whole genome duplication]
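The "1 or 2 hits in Arabidopsis" rule amounts to a simple filter. The sketch below assumes the number of Arabidopsis hits per EST has already been counted by some similarity search; the dictionary layout, the EST names and the function name `marker_candidates` are hypothetical.

```python
def marker_candidates(hit_counts, max_hits=2):
    """Keep ESTs with at least 1 and at most max_hits Arabidopsis hits."""
    return sorted(est for est, n in hit_counts.items() if 1 <= n <= max_hits)


# Made-up hit counts for illustration.
hits = {"EST_A": 1, "EST_B": 2, "EST_C": 5, "EST_D": 0}
print(marker_candidates(hits))
```

ESTs with no hit are dropped as well, because without an Arabidopsis match there is no evidence about copy number.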

8 Large-scale: Legume pipeline

9 Primer design
[Figure: multiple alignment; labels: "introns replaced by X'es to help Clustal", "Good marker region"]
Usual method: visual inspection of alignment → "manual" design of primers.
Idea: automate primer design through a computer program.
Primer consensus sequences:
Fw: TGCYTCAAAGGAGGAAATTTCAARAG
Rv: CTGTCAAYACCAGTATTTGCCCKKG
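The letters Y, R and K in the primer consensus sequences are standard IUPAC ambiguity codes (Y = C or T, R = A or G, K = G or T). As a rough sketch, a consensus can be built column by column from an ungapped alignment of the primer region, and the number of concrete primers a degenerate primer stands for is the product of the options per position. The function names below are hypothetical and this is not necessarily how PriFi builds its consensus.

```python
# IUPAC nucleotide codes and the bases each one stands for.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed bases -> ambiguity code.
IUPAC = {frozenset(bases): code for code, bases in CODES.items()}


def consensus(seqs):
    """Column-wise IUPAC consensus of equal-length, ungapped sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    columns = zip(*(s.upper() for s in seqs))
    return "".join(IUPAC[frozenset(col)] for col in columns)


def degeneracy(primer):
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(CODES[base])
    return count


print(consensus(["TGCCTCAAAG", "TGCTTCAAAG"]))
print(degeneracy("TGCYTCAAAGGAGGAAATTTCAARAG"))
print(degeneracy("CTGTCAAYACCAGTATTTGCCCKKG"))
```

The two input sequences in the first call are made up so that they differ only at the position where the slide's Fw primer has a Y.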

10 Lab practice
Work method: go through numerous examples with lab people while they explain what they do and why. The "why" turned out to be difficult:
Hard rules are hard to formulate
– "So Tm must always be above 55°."
– "Yes. Unless..."
Rules are often contradictory
– "But then the primer violates the AT content rule??"
– "It does? Well then the rule should be rephrased to..."
Scoring primer pairs
– "Why is this primer pair better than this one?"
– "It just is!"

11 Primer finder program PriFi
Works with an alignment (or a Fasta file, which it aligns itself using Clustal).
1. Identifies conserved regions and locates introns
2. Identifies individual primer candidates
– Checks most criteria
3. Considers pairs of primer candidates
– Checks remaining criteria
4. Ranks all pairs
5. Suggests four pairs and explains their scores
– Lets user make informed choice (discussions showed primer design is not an exact science!)
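Steps 1 and 3 can be illustrated with a small sketch. It assumes, as mentioned elsewhere in the talk, that introns are marked by runs of X in the aligned sequence, and that a usable pair places the forward primer before an intron and the reverse primer after it. The function names and the coordinate convention (column ranges with exclusive end) are hypothetical, not taken from PriFi itself.

```python
import re
from itertools import product


def intron_spans(aligned_seq):
    """Return (start, end) spans of X-runs marking introns (end exclusive)."""
    return [m.span() for m in re.finditer(r"X+", aligned_seq.upper())]


def pairs_spanning_intron(fw_sites, rv_sites, introns):
    """Pair up candidate sites so that at least one intron lies between them.

    fw_sites, rv_sites: lists of (start, end) column ranges, end exclusive.
    """
    pairs = []
    for fw, rv in product(fw_sites, rv_sites):
        if any(fw[1] <= i_start and i_end <= rv[0]
               for i_start, i_end in introns):
            pairs.append((fw, rv))
    return pairs


# Made-up sequence with two marked introns.
seq = "ACGTACGTACXXXXGGCTAGCTAGXXXTTGACC"
introns = intron_spans(seq)
print(introns)
print(pairs_spanning_intron([(0, 8)], [(15, 23), (2, 9)], introns))
```

Ranking (step 4) would then assign each surviving pair a score; the report on the next slide shows what such a score breakdown looks like.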

12 Report
Fw 5'-ATCCGATTTCGAGAAATGCAAACCCTGGTTGATCC
Rv 5'-CCCTTCACAGTGGTGATACACTTTCGCTTGTTACG
Tm = 66.4 / 66.9
Primer lengths: 35 / 35
Avg. #sequences in primer alignments: 3.0 / 2.0
Estimated product length: 1785
Primer/intron distances: 36 / 88
A/T's among last 8 bp of 3'-end: 4 / 5
Ambiguities: 0 /
: High-Tm bonus
6.0: Fw primer length
6.0: Rv primer length
24.7: bonus for #sequences in primer alignments
3.0: Fw has G/C terminal in 3'-end
3.0: Rv has G/C terminal in 3'-end
60.0: Good product length
-5.0: Rv in unconserved region or based mostly on 2 seqs
-11.3: Primer/intron distance(s) outside bp
-3.0: Too high AT content in 3'-ends
Score: 176
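Some of the per-primer figures in the report can be recomputed directly from the two primer sequences. The sketch below checks primer length, the number of A/T's among the last 8 bp of the 3'-end, and whether the 3'-terminal base is G or C. The helper names are hypothetical; the melting temperature is left out because the talk does not say which Tm formula is used.

```python
def three_prime_at_count(primer, window=8):
    """Count A/T bases in the last `window` bases (primer written 5'->3')."""
    tail = primer.upper()[-window:]
    return sum(base in "AT" for base in tail)


def has_gc_terminal(primer):
    """True if the 3'-terminal base is G or C."""
    return primer.upper()[-1] in "GC"


fw = "ATCCGATTTCGAGAAATGCAAACCCTGGTTGATCC"
rv = "CCCTTCACAGTGGTGATACACTTTCGCTTGTTACG"

for name, primer in (("Fw", fw), ("Rv", rv)):
    print(name, len(primer), three_prime_at_count(primer),
          has_gc_terminal(primer))
```

For these two primers the values agree with the report: lengths 35 / 35, A/T counts 4 / 5, and a G/C terminal on both.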

13 PriFi on the web
Can't do batch runs, otherwise same program

14

15 Configuration
Critical melting temperature
– If both primer melting temperatures are below this value, penalize the pair.
Optimal PCR product length interval
[Figure: points as a function of PCR product length; zones Penalty / Ok / Optimal / Ok / Penalty, separated by thresholds p1, p2, p3, p4]
Introns in sequences
– If set to 'no', primer pairs do not have to span an intron (and introns are not marked by X'es).
Somewhat heuristic parameters and rules...
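The two configurable rules on this slide can be sketched as follows. The zone layout (Penalty / Ok / Optimal / Ok / Penalty around thresholds p1 to p4) and the "both primers below the critical Tm" condition come from the slide; the function names, the inclusive boundaries and the example threshold values are assumptions made for illustration, and the points awarded per zone are not given in the talk.

```python
def product_length_zone(length, p1, p2, p3, p4):
    """Classify a PCR product length into 'penalty', 'ok' or 'optimal'."""
    if not p1 <= p2 <= p3 <= p4:
        raise ValueError("thresholds must satisfy p1 <= p2 <= p3 <= p4")
    if length < p1 or length > p4:
        return "penalty"
    if p2 <= length <= p3:
        return "optimal"
    return "ok"


def tm_penalty_applies(tm_fw, tm_rv, critical_tm):
    """Penalize only if BOTH melting temperatures are below the critical value."""
    return tm_fw < critical_tm and tm_rv < critical_tm


# Hypothetical thresholds, not PriFi's defaults.
limits = (200, 400, 2000, 3000)
print(product_length_zone(1785, *limits))
print(tm_penalty_applies(66.4, 66.9, critical_tm=55.0))
```

Requiring both primers to fall below the critical value means a pair with one low-Tm primer is not penalized by this particular rule.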

16 PriFi demo

17 PriFi user statistics
2109 true hits in total. Users from 37 countries.

18 Project results in interactive web table

19 Status
Genomic data from Medicago and Lotus; ESTs from Medicago, Lotus, Glycine, Arachis, Phaseolus.
PriFi found primer pairs for 400 alignments.
92 primer pairs tested in Phaseolus:
– 57 usable, correct bands, 38 markers
90 tested in Arachis:
– 43 usable, correct bands, 34 markers
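The test counts above translate into success rates with simple arithmetic. The sketch below only restates the slide's numbers as percentages of the primer pairs tested; the dictionary layout and the helper name `pct` are hypothetical.

```python
def pct(part, total):
    """Percentage of `part` in `total`, rounded to one decimal."""
    return round(100 * part / total, 1)


# Counts as stated on the slide.
results = {
    "Phaseolus": {"tested": 92, "usable": 57, "markers": 38},
    "Arachis": {"tested": 90, "usable": 43, "markers": 34},
}

for species, r in results.items():
    print(species,
          "usable:", pct(r["usable"], r["tested"]), "%",
          "markers:", pct(r["markers"], r["tested"]), "%")
```

So roughly 62% of tested pairs were usable in Phaseolus and about 48% in Arachis, with about 41% and 38% respectively yielding markers.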

20 Web-based pipeline

21 Thanks for your attention
People involved in developing PriFi: Leif Schauser (BiRC), Lene H. Madsen, Niels Sandal (Dept. of Mol. Biology).
Grant holder: Jens Stougaard.