Aligning Grass Protein Sequences Using PAM-Modified Global Alignment


1 Aligning Grass Protein Sequences Using PAM-Modified Global Alignment
Yifei Zhang May 7, 2018

2 Utilizing the PAM250 Matrix
Obtain the table
Build a dictionary
Modify the globalAlign function
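The first two steps can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `get_pam`, the whitespace-separated table layout, and the three-letter `TOY_TABLE` are all assumptions made for the example, and the toy values should not be relied on as the real PAM250 entries.

```python
def get_pam(table_text):
    """Parse a whitespace-separated substitution table into a nested dict.

    Assumed layout: the first non-empty line lists the column letters,
    and every following line starts with a row letter followed by scores.
    """
    rows = [line.split() for line in table_text.strip().splitlines() if line.strip()]
    columns = rows[0]
    pam = {}
    for row in rows[1:]:
        row_letter, scores = row[0], row[1:]
        # Map each column letter to the integer score in this row.
        pam[row_letter] = {col: int(s) for col, s in zip(columns, scores)}
    return pam


# Hypothetical three-letter excerpt, for illustration only.
TOY_TABLE = """
   A  R  N
A  2 -2  0
R -2  6  0
N  0  0  2
"""

pam = get_pam(TOY_TABLE)
print(pam["A"]["R"])
```

With a dictionary of this shape, the score for any residue pair is a two-level lookup, `pam[x][y]`.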

3 Gramineae, or the grass family
Evolutionary History of the Grasses. Elizabeth A. Kellogg. Plant Physiology, Mar 2001, 125 (3); DOI: /pp
Indica rice: long-grained, harder after cooking
Japonica rice: short-grained, softer after cooking

4 Three proteins analyzed:
Granule-bound starch synthase, which is related to the stickiness of the seed after cooking.
GS3 protein (seed length and weight protein), which regulates grain size.
Betaine aldehyde dehydrogenase (badh2, fragrance protein). An allele of this gene is a major factor associated with aroma.

5 Finding and processing data
Sample: Betaine aldehyde dehydrogenase [Zea mays L.]
NCBI Reference Sequence: NP_ 506
mmasqamvplrqlfvdgewrppaqgrrlpvvnptteahigeipagtaedvdaavaaaraa
lkrnrgrdwarapgavrakylraiaakvierkqelaklealdcgkpydeaawdmddvagc
feyfadqaealdkrqnspvslpmetfkchlrrepigvvglitpwnypllmatwkvapala
agcaavlkpselasvtcleladickevglppgvlnivtglgpdagaplsahpdvdkvaft
gsfetgkkimaaaapmvkpvtlelggkspivvfddvdidkavewtlfgcfwtngqicsat
srllvhtkiakefnekmvawaknikvsdpleegcrlgpvvsegqyekikkfilnaksega
tiltggvrpahlekgffieptiitdittsmeiwreevfgpvlcvkefstedeaielandt
qyglagavisgdrercqrlseeidagiiwvncsqpcfcqapwggnkrsgfgrelgeggid
nylsvkqvteyisdepwgwyrspskl
Remove spaces and numbers:
def clean(s1):
    # Drop the position numbers, then remove all whitespace.
    result = ''.join(i for i in s1 if not i.isdigit())
    result = result.split()
    result = ''.join(result)
    return result
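As a usage sketch, the cleaning step can be applied to a fragment laid out the way sequence records are often displayed, with a position number at the start of each line and residues in blocks of ten. The two-line `raw` fragment below reuses the opening residues of the sample above; the exact layout and the final `upper()` call are assumptions (substitution tables are usually keyed by uppercase letters, but the slides do not say how case was handled).

```python
def clean(s1):
    """Strip digits and all whitespace from a pasted sequence record."""
    result = ''.join(i for i in s1 if not i.isdigit())
    result = result.split()      # split on any run of spaces or newlines
    result = ''.join(result)
    return result


# Assumed input layout: position number, then blocks of ten residues.
raw = """
        1 mmasqamvpl rqlfvdgewr
       61 lkrnrgrdwa
"""

sequence = clean(raw)
print(sequence)           # one continuous lowercase string
print(sequence.upper())   # uppercase form, if the score table uses capitals
```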

6 Modify Global alignment
Define a getPam function that builds a dictionary from the PAM250 text table (whitespace eliminated).
Ex. int(pam[string1[a-1]][string2[b-1]]) replaces the match score.
Same procedure as HW6.2:
Initialize the score table and the backtrack table
Fill in scores and directions
Starting from the backtrack table, build the alignment in reverse
Reverse the sequences to get the alignment
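The procedure above can be sketched as a standard global (Needleman-Wunsch style) alignment in which the fixed match/mismatch score is replaced by a lookup in the substitution dictionary. This is an illustrative reconstruction, not the HW6.2 code: the function name `global_align`, the linear gap penalty of -8, the tie-breaking order, and the three-letter `toy_pam` dictionary are all assumptions.

```python
def global_align(string1, string2, pam, gap=-8):
    """Global alignment scored with a substitution dictionary (assumed linear gap)."""
    n, m = len(string1), len(string2)
    # Initialize the score table and the backtrack table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        score[a][0], back[a][0] = a * gap, "up"
    for b in range(1, m + 1):
        score[0][b], back[0][b] = b * gap, "left"
    # Fill in scores and directions.
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            diag = score[a - 1][b - 1] + int(pam[string1[a - 1]][string2[b - 1]])
            up = score[a - 1][b] + gap
            left = score[a][b - 1] + gap
            best = max(diag, up, left)
            score[a][b] = best
            back[a][b] = "diag" if best == diag else ("up" if best == up else "left")
    # Walk the backtrack table from the bottom-right corner, building
    # both aligned strings in reverse.
    out1, out2 = [], []
    a, b = n, m
    while a > 0 or b > 0:
        move = back[a][b]
        if move == "diag":
            out1.append(string1[a - 1]); out2.append(string2[b - 1])
            a, b = a - 1, b - 1
        elif move == "up":
            out1.append(string1[a - 1]); out2.append("-")
            a -= 1
        else:
            out1.append("-"); out2.append(string2[b - 1])
            b -= 1
    # Reverse to obtain the alignment in reading order.
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


# Hypothetical three-letter score dictionary, for illustration only.
toy_pam = {
    "A": {"A": 2, "R": -2, "N": 0},
    "R": {"A": -2, "R": 6, "N": 0},
    "N": {"A": 0, "R": 0, "N": 2},
}

print(global_align("ARN", "AN", toy_pam))
```

The gap penalty matters a great deal here: a small penalty lets the alignment open gaps freely, while a large one forces mismatches instead, so indel counts are only comparable between runs that use the same value.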

7 Results
A = Zea mays L.
B = Oryza sativa indica group
C = Oryza sativa japonica group
1 = granule-bound starch synthase (stickiness)
2 = GS3 (grain size)
3 = badh2 (fragrance)
Scores between the second and third sequences (B and C) are always higher than the score of either one against the first sequence (A): the two Oryza sativa cultivars are more closely related, as expected.
Average indels for pair 1&2: 5
Average indels for pair 4&5: → GS3 is the most different protein of the three
Average indels for pair 7&8: 3
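The slides do not say how indels were counted, so the following is only one plausible way to do it: count gap characters in the two aligned strings, and separately count runs of consecutive gaps (each run corresponding to one insertion or deletion event). The function names and the `-` gap symbol are assumptions.

```python
import re


def count_gap_characters(aligned1, aligned2):
    """Total number of gap positions across both aligned strings."""
    return aligned1.count("-") + aligned2.count("-")


def count_gap_runs(aligned1, aligned2):
    """Number of contiguous gap runs, treating each run as one indel event."""
    return len(re.findall(r"-+", aligned1)) + len(re.findall(r"-+", aligned2))


# Hypothetical aligned pair, for illustration only.
top = "A--RNA"
bottom = "ARNR-A"
print(count_gap_characters(top, bottom), count_gap_runs(top, bottom))
```

The two counts differ whenever a gap spans more than one position, so whichever definition is used should be stated alongside the reported averages.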

8 What comes after
Analyze more species from the grass family and construct a simple phylogenetic tree using the alignment results.
Dig into different proteins and find out more about the similarities across species.
Develop a simple version of BLAST for protein alignment (applying it to multiple pairs of sequences at the same time).

9 End
Thank you.

