Published by Stanislav Denis Musil. Modified over 6 years ago.
1
Aligning Grass Protein Sequences Using PAM-Modified Global Alignment
Yifei Zhang May 7, 2018
2
Utilizing the PAM250 Matrix
Obtain the table
Build a dictionary
Modify the globalAlign function
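The first two steps can be sketched as follows. This is a minimal sketch, not the presenter's code: the name get_pam, the whitespace-separated table layout (one header row of residue letters, then one row per residue), and the four-residue excerpt are assumptions made for illustration. Check the values against the full PAM250 table you actually use.

```python
def get_pam(table_text):
    """Build a nested dict so that pam[a][b] is the score for residues a and b.

    Assumed layout: the first non-empty line lists the column residues, and
    every following line starts with a row residue followed by integer scores.
    """
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    columns = rows[0]
    pam = {}
    for row in rows[1:]:
        residue, scores = row[0], row[1:]
        pam[residue] = {col: int(s) for col, s in zip(columns, scores)}
    return pam


# Illustrative 4x4 excerpt (A, R, N, D) only, not the full 20x20 matrix.
SAMPLE_TABLE = """
   A  R  N  D
A  2 -2  0  0
R -2  6  0 -1
N  0  0  2  2
D  0 -1  2  4
"""

pam = get_pam(SAMPLE_TABLE)
print(pam["R"]["D"])
```

The nested-dictionary shape makes the lookup pam[x][y] read the same way as indexing the printed table by row and column.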
3
Gramineae, or the grass family
Evolutionary History of the Grasses. Elizabeth A. Kellogg. Plant Physiology, Mar 2001, 125 (3); DOI: /pp
Indica rice: long-grained, harder after cooking
Japonica rice: short-grained, softer after cooking
4
Three proteins analyzed:
Granule-bound starch synthase, which is related to the stickiness of the seed after cooking.
GS3 protein (seed length and weight protein), which regulates grain size.
Betaine aldehyde dehydrogenase (badh2, fragrance protein). An allele of this gene is a major factor associated with aroma.
5
Finding and processing data
Sample: Betaine aldehyde dehydrogenase [Zea mays L.]
NCBI Reference Sequence: NP_ 506
mmasqamvplrqlfvdgewrppaqgrrlpvvnptteahigeipagtaedvdaavaaaraa
lkrnrgrdwarapgavrakylraiaakvierkqelaklealdcgkpydeaawdmddvagc
feyfadqaealdkrqnspvslpmetfkchlrrepigvvglitpwnypllmatwkvapala
agcaavlkpselasvtcleladickevglppgvlnivtglgpdagaplsahpdvdkvaft
gsfetgkkimaaaapmvkpvtlelggkspivvfddvdidkavewtlfgcfwtngqicsat
srllvhtkiakefnekmvawaknikvsdpleegcrlgpvvsegqyekikkfilnaksega
tiltggvrpahlekgffieptiitdittsmeiwreevfgpvlcvkefstedeaielandt
qyglagavisgdrercqrlseeidagiiwvncsqpcfcqapwggnkrsgfgrelgeggid
nylsvkqvteyisdepwgwyrspskl
Remove spaces and numbers:
def clean(s1):
    # Drop the digits (position numbers copied along with the sequence)
    result = ''.join(i for i in s1 if not i.isdigit())
    # Split on whitespace, then rejoin to remove spaces and line breaks
    result = result.split()
    result = ''.join(result)
    return result
6
Modify Global alignment
Define a getPam function that builds a dictionary from the PAM250 text table (whitespace eliminated).
Ex. int(pam[string1[a-1]][string2[b-1]]) replaces the match score.
Same procedure as HW6.2:
Initialize the table and backtrack
Fill in scores and directions
Starting from backtrack, build the alignment in reverse
Reverse the sequences to get the alignment
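The procedure above can be sketched as a standard Needleman-Wunsch global alignment in which the PAM lookup replaces the fixed match score. This is only a sketch under stated assumptions: the function name global_align, the linear gap penalty of -5, the tie-breaking order, and the four-residue PAM_EXCERPT dictionary are all hypothetical and are not taken from the slides or from HW6.2.

```python
# Illustrative excerpt of a PAM250-style dictionary (A, R, N, D only).
PAM_EXCERPT = {
    "A": {"A": 2, "R": -2, "N": 0, "D": 0},
    "R": {"A": -2, "R": 6, "N": 0, "D": -1},
    "N": {"A": 0, "R": 0, "N": 2, "D": 2},
    "D": {"A": 0, "R": -1, "N": 2, "D": 4},
}


def global_align(string1, string2, pam, gap=-5):
    """Return (score, aligned1, aligned2) for a global alignment.

    gap is an assumed linear gap penalty; the slides do not state one.
    """
    n, m = len(string1), len(string2)
    # Initialize the score table and the backtrack table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    backtrack = [[None] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        score[a][0], backtrack[a][0] = a * gap, "up"
    for b in range(1, m + 1):
        score[0][b], backtrack[0][b] = b * gap, "left"

    # Fill in scores and directions.
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            diag = score[a - 1][b - 1] + int(pam[string1[a - 1]][string2[b - 1]])
            up = score[a - 1][b] + gap
            left = score[a][b - 1] + gap
            best = max(diag, up, left)
            score[a][b] = best
            # Ties are broken in favor of diag, then up, then left.
            backtrack[a][b] = "diag" if best == diag else ("up" if best == up else "left")

    # Walk back from the bottom-right corner, building the alignment reversed.
    out1, out2 = [], []
    a, b = n, m
    while a > 0 or b > 0:
        move = backtrack[a][b]
        if move == "diag":
            out1.append(string1[a - 1])
            out2.append(string2[b - 1])
            a, b = a - 1, b - 1
        elif move == "up":
            out1.append(string1[a - 1])
            out2.append("-")
            a -= 1
        else:
            out1.append("-")
            out2.append(string2[b - 1])
            b -= 1

    # Reverse both strings to get the alignment in reading order.
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


result = global_align("AND", "ARND", PAM_EXCERPT)
print(result)
```

Sequences copied from NCBI are lowercase, so with an uppercase-keyed table they would need str.upper() before being passed in.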
7
Results
A = Zea mays L.
B = Oryza sativa indica group
C = Oryza sativa japonica group
1 = granule-bound starch synthase (stickiness)
2 = GS3 (grain size)
3 = badh2 (fragrance)
The score between the second and third sequences (B and C) is always higher than the score of either one against the first sequence (A): the two Oryza sativa cultivars are more closely related, as expected.
Average indels for pair 1&2: 5
Average indels for pair 4&5: >>> GS3 is the most different protein of the three
Average indels for pair 7&8: 3
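One way to obtain indel counts like those reported above is to count gaps in the aligned strings. The sketch below assumes that an indel means a maximal run of '-' characters in either aligned sequence; the slides do not say how indels were counted or averaged, so the function name count_indels and this definition are hypothetical.

```python
import re


def count_indels(aligned1, aligned2):
    """Count indel events, treating each contiguous run of '-' as one event."""
    return sum(len(re.findall(r"-+", seq)) for seq in (aligned1, aligned2))


# A two-residue deletion in the first sequence counts as a single event.
print(count_indels("A--ND", "ARRND"))
```

Counting gap characters instead of gap runs would give larger numbers for the same alignment, so the two definitions should not be mixed when comparing proteins.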
8
What comes after
Analyze more species from the grass family and construct a simple phylogenetic tree using the alignment results.
Dig into different proteins to find out more about the similarities across species.
Develop a simple version of BLAST for protein alignment, applying it to multiple pairs of sequences at the same time.
9
End
Thank you.