Aligning Grass Protein Sequences Using PAM-Modified Global Alignment

Yifei Zhang, May 7, 2018

Utilizing the PAM250 matrix
- Obtain the table
- Build a dictionary
- Modify the globalAlign function
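The "build a dictionary" step can be sketched as a small parser. This is a minimal sketch, not the author's getPam: the name get_pam is hypothetical, and it assumes a whitespace-delimited table with optional '#' comment lines, a header row of residue letters, and one row per residue. The four-residue table is a toy excerpt in that layout, used only to keep the example short; load the full PAM250 file for real sequences.

```python
def get_pam(table_text):
    """Parse a whitespace-delimited substitution matrix into a nested dict.

    Assumed layout: optional '#' comment lines, a header row of residue
    letters, then one row per residue: its letter followed by integer scores.
    Returns pam such that pam[x][y] is the score for aligning x with y.
    """
    rows = [line.split() for line in table_text.splitlines()
            if line.strip() and not line.lstrip().startswith('#')]
    header = rows[0]  # column residues, e.g. ['A', 'R', 'N', 'D']
    pam = {}
    for row in rows[1:]:
        residue, scores = row[0], row[1:]
        # Store ints so callers do not need int() on every lookup
        pam[residue] = {col: int(s) for col, s in zip(header, scores)}
    return pam


# Toy excerpt (hypothetical input) in the same layout as a full PAM250 file
PAM_EXCERPT = """
# toy excerpt for illustration
   A  R  N  D
A  2 -2  0  0
R -2  6  0 -1
N  0  0  2  2
D  0 -1  2  4
"""

pam = get_pam(PAM_EXCERPT)
print(pam['A']['R'])  # -2
print(pam['D']['D'])  # 4
```

The keys are uppercase, so lowercase sequences copied from NCBI need .upper() before lookup.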

Gramineae, or the grass family
Source: Elizabeth A. Kellogg, "Evolutionary History of the Grasses," Plant Physiology, Mar 2001, 125(3):1198-1205; DOI: 10.1104/pp.125.3.1198
- Indica rice: long-grained, harder after cooking
- Japonica rice: short-grained, softer after cooking

Three proteins analyzed:
- Granule-bound starch synthase, which is related to the stickiness of the seed after cooking.
- GS3 protein (seed length and weight protein), which regulates grain size.
- Betaine aldehyde dehydrogenase (badh2, fragrance protein). An allele of this gene is a major factor associated with aroma.

Finding and processing data
Sample: betaine aldehyde dehydrogenase [Zea mays L.], NCBI Reference Sequence NP_001105781.2 (506 aa)

mmasqamvplrqlfvdgewrppaqgrrlpvvnptteahigeipagtaedvdaavaaaraa
lkrnrgrdwarapgavrakylraiaakvierkqelaklealdcgkpydeaawdmddvagc
feyfadqaealdkrqnspvslpmetfkchlrrepigvvglitpwnypllmatwkvapala
agcaavlkpselasvtcleladickevglppgvlnivtglgpdagaplsahpdvdkvaft
gsfetgkkimaaaapmvkpvtlelggkspivvfddvdidkavewtlfgcfwtngqicsat
srllvhtkiakefnekmvawaknikvsdpleegcrlgpvvsegqyekikkfilnaksega
tiltggvrpahlekgffieptiitdittsmeiwreevfgpvlcvkefstedeaielandt
qyglagavisgdrercqrlseeidagiiwvncsqpcfcqapwggnkrsgfgrelgeggid
nylsvkqvteyisdepwgwyrspskl

Remove spaces and numbers:

    def clean(s1):
        # Drop every digit (e.g. residue position numbers)
        result = ''.join(i for i in s1 if not i.isdigit())
        # Split on any whitespace, then rejoin to remove spaces and newlines
        result = ''.join(result.split())
        return result

Modify global alignment
- Define a getPam function that builds a dictionary from the PAM250 text table (whitespace eliminated).
- The PAM250 score, e.g. int(pam[string1[a-1]][string2[b-1]]), replaces the fixed match/mismatch score.
- Otherwise, follow the same procedure as HW6.2:
  - Initialize the score table and the backtrack table.
  - Fill in scores and directions.
  - Starting from the end of the backtrack table, build the alignment in reverse.
  - Reverse the sequences to get the final alignment.
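The steps above amount to a Needleman-Wunsch global alignment in which a substitution-matrix lookup replaces the match/mismatch score. This is a minimal sketch under stated assumptions, not the HW6.2 code: the name global_align_pam is hypothetical, the linear gap penalty of -8 per position is an assumed value (the slides do not give one), and TOY_PAM is a small toy excerpt standing in for the full PAM250 dictionary.

```python
# Toy excerpt (hypothetical) in PAM250 style; use the full table for real data
TOY_PAM = {
    'A': {'A': 2, 'R': -2, 'N': 0, 'D': 0},
    'R': {'A': -2, 'R': 6, 'N': 0, 'D': -1},
    'N': {'A': 0, 'R': 0, 'N': 2, 'D': 2},
    'D': {'A': 0, 'R': -1, 'N': 2, 'D': 4},
}


def global_align_pam(s1, s2, pam, gap=-8):
    """Needleman-Wunsch global alignment scored with a substitution matrix.

    pam: nested dict, pam[x][y] -> int, keyed by uppercase residue letters.
    gap: linear penalty per gap position (assumed value, not from the slides).
    Returns (score, aligned_s1, aligned_s2).
    """
    s1, s2 = s1.upper(), s2.upper()  # NCBI shows residues in lowercase
    n, m = len(s1), len(s2)
    # score[a][b]: best score for aligning s1[:a] with s2[:b]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # back[a][b]: the move that produced score[a][b]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        score[a][0], back[a][0] = a * gap, 'up'
    for b in range(1, m + 1):
        score[0][b], back[0][b] = b * gap, 'left'

    for a in range(1, n + 1):
        for b in range(1, m + 1):
            # The PAM lookup replaces the fixed match/mismatch score
            diag = score[a - 1][b - 1] + pam[s1[a - 1]][s2[b - 1]]
            up = score[a - 1][b] + gap    # s1[a-1] against a gap
            left = score[a][b - 1] + gap  # s2[b-1] against a gap
            # max() keeps the first maximum, so ties prefer 'diag'
            score[a][b], back[a][b] = max(
                [(diag, 'diag'), (up, 'up'), (left, 'left')], key=lambda t: t[0]
            )

    # Trace back from the bottom-right cell, collecting columns in reverse
    out1, out2 = [], []
    a, b = n, m
    while a > 0 or b > 0:
        move = back[a][b]
        if move == 'diag':
            out1.append(s1[a - 1])
            out2.append(s2[b - 1])
            a, b = a - 1, b - 1
        elif move == 'up':
            out1.append(s1[a - 1])
            out2.append('-')
            a -= 1
        else:
            out1.append('-')
            out2.append(s2[b - 1])
            b -= 1
    # Reverse to get the alignment in left-to-right order
    return score[n][m], ''.join(reversed(out1)), ''.join(reversed(out2))


print(global_align_pam('arnd', 'and', TOY_PAM))  # (0, 'ARND', 'A-ND')
```

A linear gap penalty keeps the sketch close to a basic homework-style global alignment; an affine gap model would be a separate refinement.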

Results
Species: A = Zea mays L.; B = Oryza sativa Indica group; C = Oryza sativa Japonica group.
Proteins: 1 = granule-bound starch synthase (stickiness); 2 = GS3 (grain size); 3 = badh2 (fragrance).
The score for the second and third sequences (B vs. C) is always higher than the score of either of them against the first sequence (A): as expected, the two Oryza sativa cultivars are more closely related.
- Average indels for pair 1&2: 5
- Average indels for pair 4&5: 20, so GS3 is the most divergent of the three proteins
- Average indels for pair 7&8: 3
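The slides do not say whether an indel was counted per gap position or per gap run, so the following minimal sketch reports both. The helper name count_indels is hypothetical, and it assumes '-' marks a gap in each aligned row.

```python
import re


def count_indels(aligned1, aligned2):
    """Count gaps in a pairwise alignment where '-' marks a gap.

    Returns (gap_positions, gap_events):
      gap_positions: every '-' column in either row
      gap_events: maximal runs of '-' in either row, each run counted once
    """
    if len(aligned1) != len(aligned2):
        raise ValueError('aligned sequences must have equal length')
    positions = aligned1.count('-') + aligned2.count('-')
    events = len(re.findall(r'-+', aligned1)) + len(re.findall(r'-+', aligned2))
    return positions, events


print(count_indels('ARND--', 'A-NDKL'))  # (3, 2)
```

Averaging either number over the alignments for one protein gives a per-protein figure comparable to the averages above, though the exact averaging used in the slides is not stated.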

What comes next
- Analyze more species from the grass family and construct a simple phylogenetic tree from the alignment results.
- Dig into other proteins to learn more about their similarities across species.
- Develop a simple version of BLAST for protein alignment that can be applied to multiple pairs of sequences at the same time.

End Thank you.