Published by Colin Thomas. Modified over 9 years ago.
Alexei Fedorov, Ph.D.
Associate Professor
Head of Bioinformatics Lab
Department of Medicine
Vice Director, Program in Bioinformatics and Genomics/Proteomics
Tel: (419) 383-5270
Email: alexei.fedorov@utoledo.edu
http://bpg.utoledo.edu/~afedorov/lab/
May 2011
Bioinformatics Lab in 2013-2014
PhD students: Shuhao Qiu
Master's students: Ahmed Al-Khudair
Current grants: NSF Career Development 2007-2012, “Investigation of intron cellular roles”
MAJOR GOAL: Bioinformatics Investigation of the Human Genome
Education in Bioinformatics (TWO TYPES OF STUDENTS)
Students with a computer/math background gain experience in biology (Sam, Andy).
Students with a biological background gain experience in programming (Dave, Maryam).
Example of a computational project: Binary-abstracted Markov models and their application to sequence classification
http://etd.ohiolink.edu/view.cgi?acc_num=mco1271271172
http://bpg.utoledo.edu/~sshepard/defense/ (video)
Genomic MRI
http://bpg.utoledo.edu/gmri/
http://www.jove.com/Details.php?ID=2663
Job perspectives (example: Ashwin Prakash)
PhD: November 2011, HSC UT
PhD research fellow: from January 2011, Johns Hopkins School of Medicine
Declined offers: Cold Spring Harbor Laboratory, Baylor College of Medicine
The PI’s students received the following awards:
Jason Bechtel: Outstanding MSBS student in 2008 at HSC UT.
Theodor Rais: Second/Third Poster award from the Ohio Bioinformatics Consortium, 2009.
Samuel Shepard: Outstanding PhD student in 2010 at HSC UT.
Lorraine Walters: Undergraduate Research Recognition Award, UT, May 2012.
Arnab Saha-Mandal: 1) Outstanding MSBS student in 2013 at HSC UT; and 2) Canadian Institute of Health Research fellowship support ($20,000).
Jasmine Serpen: 1) Ohio Governor's Thomas Edison Award for Excellence in Biotechnology & Biomedical Technologies, 1st place; and 2) OSERA Biomedical Research/Bioengineering Award, 1st place (for high school students).
Program in Bioinformatics and Genomics/Proteomics (BPG)
http://hsc.utoledo.edu/depts/bioinfo/
BPG offers a Certificate in association with the degrees of Doctor of Philosophy (Ph.D.) or Doctor of Medicine (M.D.). BPG also offers a Master of Science in Biomedical Sciences (MSBS).
Two courses in the Spring semester:
Application of Bioinformatics, Proteomics, and Genomics (BIPG 640), or “Advanced Bioinformatics” (should be taken after Dr. Trumbly's “Fundamental Bioinformatics”)
Introduction to Bioinformatic Computation (BIPG 610)
The main goal of BIPG 610 is to provide basic programming skills to biological and medical students who may lack a background in computer science. Programming is taught through important biological examples, focusing in particular on the Perl language. No prior programming skills are required!
In the “Introduction to Bioinformatic Computation” course, rather than doing “cookbook” lab exercises, students work on real-world, challenging problems whose resolution advances the field of genome biology. In addition to learning programming and other bioinformatic skills, the students learn how to present the final product of bioinformatic research and how to write a scientific paper on the subject.
In 2005 the class developed a program to identify novel genes for non-coding RNAs in humans and other mammals. This work resulted in an article in Nucleic Acids Research [1], coauthored by the group of students who actively worked on the project.
In 2006 the students created a novel public database (ASMD) and a novel computational resource, “Splicing Potential”. Ten students were co-authors on two manuscripts [2,3].
In 2007 the class participated in the “Genomic MRI” project. Seven of these students are co-authors of an article in BMC Genomics, 2008 [4].
The 2008 class continued the “Genomic MRI” project. They performed whole-genome comparisons for human, chimpanzee, and macaque, and also analyzed the distribution of 4 million SNPs inside and outside MRI regions. The results are in preparation for publication in Genome Research, with 6 students among the authors.
Publications with IBC students
54. Prakash A., Shepard S., Mileyeva-Biebesheimer O., He J., Hart B., Chen M., Amarachiniha S., Bechtel J., Fedorov A. Molecular forces shaping human genomic sequence at mid-range scales. BMC Genomics 2009, 10:513.
53. Bechtel J.M., Wittenschlaeger T., Dwyer T., Song J., Arunachalam S., Ramakrishnan S.K., Shepard S., Fedorov A. Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures. BMC Genomics 2008, 9:284.
52. Bechtel J.M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K., Grose W., Wang Y., Khuder S., Fedorov A. Calculation of Splicing Potential from the Alternative Splicing Mutation Database. BMC Research Notes 2008, 1:4.
51. Bechtel J.M., Rajesh P., Ilikchyan I., Deng Y., Mishra P.K., Wang G., Wu X., Afonin K., Grose W., Wang Y., Khuder S., Fedorov A. The Alternative Splicing Mutation Database: a hub for investigations of alternative splicing using mutational evidence. BMC Research Notes 2008, 1:3.
44. Fedorov A., Stombaugh J., Harr M.W., Yu S., Nasalean L., Shepelev V. Computer identification of snoRNA genes using a Mammalian Orthologous Intron Database. Nucl. Acids Res. 2005, 33:4578-4583.
http://www.utoledo.edu/centers/brim/index.html
Bioinformatics COURSE: Bioinformatics of Biomarkers and Individualized Medicine, Spring 2012
Course timeline: 14 weeks
Prerequisites: none; recommended: introduction to bioinformatics and molecular biology
Reserve materials: none
Unit 1: Biomarker discovery and validation
Unit 2: Individualized Medicine
16 Investigation of the human genome BASE COUNT 846302 a 578512 c 575805 g 843114 t 1703 others ORIGIN 1 gaattcaaaa aagaaagaca atgacttgta gctgaagcta tgatcaggaa aagatggggt 61 ggacggcatt tgagaaaatc aggacagtgg tgtacttatc aaataagaag atctgggcag 121 aagattgttg aaaaagcaga cacagcactg agtagcagca tggagcagaa aagcataagg 181 aacaagtagt gcagtgtgcc tgaacatagg atgggaaatt aggaaagata aatggaggct 241 gactgtggga agccttacat tccaggctta gtggaataag taaatattta aatctcatga 301 gttcttttct ctctgctttc tatttttcac gacctgaact cacctcccag tgaggagatg 361 tttccaccta gcactaaaca gtaactagtt cagactatat atttaaaaaa aaaaaaaaaa 421 aaaaaaaaaa gcagaacagc tcagatcatc cagtgaagtg gtgctactat tatactatta 481 acggggagat gaaagccaga taagatggag aagtaggaaa tttacgaaac attttaaaag 541 aaaatttatt tattcatcaa tatttacata aatgtttatt aattctaagt actatagtag 601 gcacccattt attactttca aaaattgaca atatacaagt taataaaatc atattagttt 661 cctcttctaa taaaattatc tcactcaaat tcatataact aaaaatacat ttaataaatt 721 ttatttttaa aatataggcc acttctactc tattcatttt tgcacttaac attctcttgc 781 tttcaaaaat gtatgaaaaa tttcagttta gtccccacca aatctcaatt tagaccccgg 841 ataaagagta aataaattaa agagctgtca gaattaaaac actactacag gtctccttca 901 ctttatggca tagatgaagg caggaaatac tggctgaaaa ttttgtttat gtcaaagatt 961 ttgatgatta ccatcagaga tctgatatct cagggaagaa aagcctttca tataccactt 1021 aaaaaattct gccaggcgcg gtggctcacg cctgtaatcc cagcactttg ggaggctgag 1081 gtgggcagat cacctgaggt cagaagttcg agaccagcct gaccaacatg gagaaaccct 1141 gtctctacta aaaatacaaa atcagccggg cgtggtggcg catgcctgta atcccagcta 1201 cttgggaggc tgaggcagga gaatcacttg aacccaggag gcagaggttg cggtgagccg 1261 agatcacacc attgcactcc agcctgggca acaagggcga aactctgtct caaaaaaaaa 1321 aaaacttctg gggaaatggt ggcctgcctt gtaacatcta tgtgtcttag agggccatgg 1381 tatgacaccc ttgggcagtc atttatagag tccttccctg accagggaat catcctgcca
17... after the first 50 pages.. 141601 cagcaccaaa tcctctcatt gcctttttaa aaaatgttgt ccaatttaac atcaagacac 141661 tgtccatgca atctgttgaa aaatctggct atttgcaaac aaagaaaaaa tgtatagcct 141721 cccacactat atatcaaaat aaacccaagt gtataaaaga gaaaatttta agtgaaacca 141781 aaacttgaaa atattgagat gaatattagt tagagctttg agtaggaaag gattttttga 141841 acagataacc aacagaggaa gtcagaaaac agtaatcatt tccttaatga aaatacaaaa 141901 cttaagtact tcaaaaaagt cattacaata cttaaaaacc ttacaacaat catgtggaaa 141961 gcatttatta caaataattc agaaaaagga tttatatccc taataactaa agaagtgagg 142021 aagaatgcta agatcacatt ttttaaaaag tagctaaagg ataatataaa tgactaacag 142081 acctgaggaa aaaagctaac ctcacaagta ttcaaccaaa taaaataacc tcgagatacc 142141 acttaaaaac ctatcgaaat aacgaagtgt ttggaaaatg acaagattca aaatctggta 142201 agagcagcat ttttccccat tgtggaggga gtgtgtaaat tggtgtggtc tttctgaaaa 142261 gcaattaggc aatcttgtat caaaaatctt caaagtgttc ttactctttg atgaagaatt 142321 ccacacgtgt gaatcctaaa acaattaaaa gtatgaacat atttttatgc acaaagatgt 142381 ttagccaaaa ggaaaacgac ctaaatgacg aatgatgtgc aactgcatgg ataaattgtt 142441 gtatatcaaa atgatgaaat attttgcagc tttgaaaagg taattttgaa aaaactttaa 142501 agacctcaaa aatgcccaaa atatattaat tgaaaaggat acaaaacttt attatttcac 142561 tacgtaatga aacagaatac agttgatcct tgaacaacgc tggtttgaac tgcactcgtc 142621 cacttacatt cagatttttt tctttttgct tttttttttt gagacgaagt ctcactctgt 142681 cacccaggct ggagggcagt ggcaccattc tggctcacta caacctgcgt ataccaggtt 142741 caagcaattc tcctgcctca gcctcccaag tagctggaat tacaggcgcc tgtcaccacg 142801 tccagctaat ttttgtattt ttagtagaga cggagtttca ccatgttggc caggctggtc 142861 tcgaactcct ggcctcaagt aatccacctg cctcagcctc ccaaagtgct gggattacag 142921 gcatcagccg ggtgcggtgg cttatgcctg caatcccatc ctggctaaca cggtgaaacc 142981 ctgtctctac taaaatacaa aaaattagct gagtgtggtg gcacatgcct atagttccag 143041 ctacttggga ggctgaggga tgagaattgc ttgaacctgg gaggcagagg ttgcagtgag 143101 ccgagatcac accactgtac tccagcctgg gcaacagagc aagactccat ctcaaaaaaa 143161 aaaaaaaaaa aaaaaagaaa aagaaaaaga aaaaggtatg ttatgaatgc 
agaaagtata 143221 tgttgatgct agtctattgt gtaatttacc accataaaat atacacaggt ctattataga 143281 agttaaaatg tatcaaaatg tatacacaaa cacttagaga tagtacatgg tatcattccc 143341 agttgagaaa aatgtaagca aacatgaaga tgcagtatta aatcataact gtataaaatt
18... after next 200 pages 683041 ggaggtgggg agcgcctctg cccagccgcc ccatctggga ggtggggagc gcctctgtcc 683101 agccaccaac ccatctggga agtgaggagc gcctctgcct ggccaccccg tctgggaagt 683161 gaggagcacc tctgccgggc tgccccgtct gggaagtgtt cccaacagct ctgaagagac 683221 agcgaccatc gagaatgggc catgatgacg atggtggttt tgtcgaaaag aaaaggggga 683281 aatgtgggga aaagaaagag agatcagatt gttactgtgt ctgtgtagaa agaagtagac 683341 ataggagact ccattttgtt ctgtactaag aaaaattctt ctgccttggg atgctgttaa 683401 tctataacct tacccccaaa cccctgctct ctgaaacatg tgctgtgtca actcagggtt 683461 aaatggatta agggcgatgc aagatgtgct ttgttaaaca gatgcttgaa gacagaaaaa 683521 aaaaaagaaa gagaaaaaaa aaatcattga aggattattt atgccctatg gcatcccttt 683581 ctccaacact tgtcacctaa tgaccaggga tcaataccca caaatacagt aagacctatt 683641 tttaaaggtt ttcagcttaa ctgttttgtc tcttaataaa tttttatata ggaaaaaaaa 683701 aagaatgttg aatattggcc cccactctct tctggcttgt agagtttctg cagagagatc 683761 cactgttagt ctgatggctt ccctttgtgg gtaacccagt ctttctttct gcccttaaca 683821 ttttttcctt catttcaacc atggtgaatc tgacaattat gtgtcttggt gttgctcttc 683881 tcaaggagta tctttgtggt gttctctgta tttcctgaat ttgaatattg gcctgtgtgg 683941 ataggttggg gaagttctcc tggataatat cctgaagagt gttttccaac ttggttccat 684001 tctcccagtc actttcaggt acaccaatca aatgtaggtt tggtcttttc acatagtccc 684061 atatttcttg gaggctttgc tcattccttt tcattctttt ttctctaatc ttgtcttcaa 684121 gctttatttc attaagttag tttatatttg actgtgcttt atacttgaca aagcactttc 684181 acatttcttg tcttttttgg gcctgataat tactctgcaa gttaaaaagg aaaaactcca 684241 agtaccatta cgctccgtga ggacagggac tattttgttc attgttgcaa cctaagcact 684301 taatatgttg cctggtccag agtagatact catatataaa tacttgctga ataaagggat 684361 gaatgggtgg gtggttagat gaatggaatt tgccttaatt ttcaagatgg attcaatttc 684421 caattccact tactggtgag aagccttgtc taagtcttta aaccttactt tcctcatcta 684481 taaaacagtg acaatgatat tgtttctgct accacaatgg aaaaaaggac agaattactt 684541 agtgtcatag tgatcaggaa taaagccagg gcttgaagca tctcctgatt cctagggcat 684601 tgtttgtccc aatgtatatg gcagagggag aaagaaaacc gttgagtctt aatctgtcag 
684661 gcactatttt atgaacttta aaatcctcat agcagggcca ggtgcagtgg ctcacacctg 684721 taatcccagc actttgggag gccaaggcag gcagatcact tgaggtcagg accagcctgt 684781 ccaacgtggt gaaaccacat ctctactaaa aatacaaaaa ttagccaggc gtggtggtgc 684841 atgcctataa tcccagctac ttgggaggct gaggcaggag aaatgcttga acctgggagg 684901 cagaggttgt ggtgagctga gattgtgcca ctgtactcca gcctgggcaa cagaacaaga
Human chromosome 1: 4,814,628 lines ≈ 100,000 pages = 100 books (1,000 pages each)
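The page and book counts follow from simple division. A minimal Python sketch of that arithmetic, assuming about 48 printed lines per page; the slide does not state a page density, so `LINES_PER_PAGE` is an assumption chosen to match its round figure of 100,000 pages.

```python
# Rough size of human chromosome 1 when printed as sequence lines.
LINES = 4_814_628        # line count quoted on the slide
LINES_PER_PAGE = 48      # assumption: not stated in the slides
PAGES_PER_BOOK = 1000    # from the slide

pages = LINES / LINES_PER_PAGE
books = pages / PAGES_PER_BOOK

print(f"about {pages:,.0f} pages, about {books:,.0f} books")
```

With a different page density the totals scale proportionally, so the figure of 100 books is an order-of-magnitude illustration rather than an exact count.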
Nature, September 6, 2012, v. 489, p. 46
Lab 2013
THE 1000 GENOMES PROJECT: A GUIDE TO YOUR ANCESTRY
The pattern of human genetic variation is believed to be a key to revealing much about human population history and diversity. The 1000 Genomes Project has sequenced 1,092 genomes from different populations. By identifying the sequences that correspond to LWK, GBR, JPT and FIN, we aim to learn more about population genetic patterns and to get a picture of the genetic diversity that exists within these populations. This project uses the 1000 Genomes Project's catalogue of human genetic variation to calculate and compare genetic differences among 14 populations. I am presenting the results that our bioinformatics lab's team has obtained so far; we are working on writing them up as a paper.
We used Perl programs to compute the number of differences between the genomes of every two individuals from the 1000 Genomes Project in the 14 populations:
ASW  HapMap African ancestry individuals from SW US
CEU  CEPH individuals
CHB  Han Chinese in Beijing
CHS  Han Chinese South
CLM  Colombian in Medellin, Colombia
FIN  HapMap Finnish individuals from Finland
GBR  British individuals from England and Scotland
IBS  Iberian populations in Spain
JPT  Japanese individuals
LWK  Luhya individuals
MXL  HapMap Mexican individuals from LA, California
PUR  Puerto Rican in Puerto Rico
TSI  Toscani individuals
YRI  Yoruba individuals
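The pairwise comparison described above can be sketched in a few lines. The lab's programs were written in Perl and are not shown in the slides; the Python below is a stand-in, not the lab's code. It assumes each individual's genotypes are coded as the number of alternate alleles (0, 1 or 2) at the same ordered list of variant sites, and that a "difference" is a site where the two codes disagree. The slides do not define the metric, so both choices, and all names and IDs here, are assumptions.

```python
from itertools import combinations

def count_differences(genotypes_a, genotypes_b):
    """Count variant sites at which two individuals' genotype codes differ."""
    return sum(1 for a, b in zip(genotypes_a, genotypes_b) if a != b)

def pairwise_differences(genotypes):
    """Return {(id1, id2): number of differing sites} for every pair."""
    return {
        (id1, id2): count_differences(genotypes[id1], genotypes[id2])
        for id1, id2 in combinations(sorted(genotypes), 2)
    }

# Toy data: made-up IDs and genotypes (alternate-allele counts per site)
toy = {
    "ind_A": [0, 1, 2, 0, 1],
    "ind_B": [0, 1, 2, 1, 1],
    "ind_C": [2, 0, 2, 1, 0],
}
diffs = pairwise_differences(toy)
for pair, n in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(pair, n)
```

For a population of n individuals this produces n(n-1)/2 pairs, which is the quantity counted on the Y axis of the distribution graphs.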
The graph above illustrates the distribution of genetic differences among the 14 populations. The X axis shows the number of differences (2.7 million to 5.5 million). The Y axis shows the number of pairs; each pair is two individuals compared by counting the genetic differences between their genomes.
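A distribution of this kind is a histogram of pairwise difference counts. A minimal Python sketch, assuming a bin width of 100,000 differences; the bin width and the example values are made up for illustration and are not taken from the slides.

```python
from collections import Counter

def histogram(differences, bin_width=100_000):
    """Map each bin's lower edge to the number of pairs falling in that bin."""
    return Counter((d // bin_width) * bin_width for d in differences)

# Made-up pairwise difference counts, for illustration only
example = [2_760_000, 2_790_000, 3_450_000, 3_480_000, 3_520_000, 4_810_000]
hist = histogram(example)
for lower in sorted(hist):
    print(f"{lower:>9,} to {lower + 100_000 - 1:>9,}: {hist[lower]} pair(s)")
```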
Figure 2: The graph below shows that the 14 populations fall into 4 distinct origins, which we call 4 ancestries: 1 African, 2 Hybrid, 3 European, 4 Asian.
Figure 3: The three populations of African origin have total-difference distributions that lie close to each other. The LWK population (Luhya individuals) includes some pairs of individuals with almost half the usual number of differences (2.7 million to 4.8 million). Almost all of these pairs have been declared as siblings or other relatives. Some of them are not declared as relatives by the 1000 Genomes Project, so our results suggest that the project may contain some undeclared relatives.
We further examined some populations for declared relationships between individuals; relatives showed the smallest differences in their genetic variation. For example, in the LWK population, as shown in the table below, the relatives fall at the top of the list when the total differences are sorted from lowest to highest. Pairs with a listed relationship (highlighted in green on the slide) are declared as relatives in the 1000 Genomes Project appendix. For the pairs marked “?”, we suggest that they are relatives of some kind that have not been declared by the 1000 Genomes Project.
No.  ID1_LWK  ID2_LWK  Total_LWK differences  Relationship
1    NA19374  NA19373  2756691                Siblings
2    NA19352  NA19347  2777456                Siblings
3    NA19470  NA19443  2848500                Aunt/Uncle
4    NA19397  NA19396  2871776                Siblings
5    NA19444  NA19434  3004459                Siblings
6    NA19334  NA19331  3007478                ?
7    NA19382  NA19381  3070661                Uncertain parent/child relationship
8    NA19453  NA19445  3077137                ?
9    NA19470  NA19469  3111728                Niece/Nephew
10   NA19331  NA19313  3119208                ?
11   NA19382  NA19380  3970915                Half siblings
12   NA19453  NA19444  4106949                ?
13   NA19334  NA19313  4178970                Unknown relation
14   NA19469  NA19443  4236592                Niece/Nephew
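The ranking step can be sketched as follows. This Python example is illustrative and is not the lab's Perl code: it sorts the LWK rows transcribed from the table by total differences and lists the pairs that carry no declared relationship (the “?” rows), which are the candidates for undeclared relatives. Representing "?" as `None` is an assumption of this sketch.

```python
# Rows from the LWK table: (id1, id2, total differences, declared relationship).
# None stands for the "?" entries, i.e. no relationship declared.
pairs = [
    ("NA19374", "NA19373", 2756691, "Siblings"),
    ("NA19352", "NA19347", 2777456, "Siblings"),
    ("NA19470", "NA19443", 2848500, "Aunt/Uncle"),
    ("NA19397", "NA19396", 2871776, "Siblings"),
    ("NA19444", "NA19434", 3004459, "Siblings"),
    ("NA19334", "NA19331", 3007478, None),
    ("NA19382", "NA19381", 3070661, "Uncertain parent/child relationship"),
    ("NA19453", "NA19445", 3077137, None),
    ("NA19470", "NA19469", 3111728, "Niece/Nephew"),
    ("NA19331", "NA19313", 3119208, None),
    ("NA19382", "NA19380", 3970915, "Half siblings"),
    ("NA19453", "NA19444", 4106949, None),
    ("NA19334", "NA19313", 4178970, "Unknown relation"),
    ("NA19469", "NA19443", 4236592, "Niece/Nephew"),
]

# Sort from fewest to most differences, then keep pairs with no declared relationship
ranked = sorted(pairs, key=lambda row: row[2])
undeclared = [(a, b, n) for a, b, n, relation in ranked if relation is None]

for a, b, n in undeclared:
    print(f"{a} / {b}: {n:,} differences, no declared relationship")
```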
Figure 4: The CLM, PUR and MXL populations show a very wide distribution, ranging from 3.1 to 4.86 million differences. Our results indicate that these populations have a wide range of mixed ancestry. The PUR population has a second peak on the right side (between 4.74 and 4.9 million); we expect that these individuals have a different ancestry. Further investigation of these individuals is under way to determine the source of that ancestry.
Figure 5: Populations FIN, GBR, TSI, CEU and IBS. All of these populations are of European origin. The IBS curve is very low because only 13 individuals from this population have been sequenced.
Figure 6: The populations of Asian origin are closely related: their distributions have very similar shapes, ranging between 3.4 million and 3.69 million differences.
We are further investigating the pairs of individuals with the highest numbers of differences, which we suggest may include individuals of a different origin. We examined the 40 highest-difference pairs in some populations and found that certain individuals appeared in these pairs significantly more often than others. An example is shown in the figure below.
The list below gives the pairs of CLM individuals with the highest genetic differences from each other. When we looked at the individuals, we noticed that some of them appear significantly more often than others, as shown in the list of repeats on the right side of the slide. HG01551 and HG01342 each appear in 20 of the highest-difference pairs, while others appear 2 or 3 times. We therefore decided to investigate the possibility that these individuals have a different origin.
ID1      Total differences  ID2
HG01551  4479513            HG01136
HG01365  4480834            HG01342
HG01342  4481529            HG01250
HG01551  4481637            HG01250
HG01551  4483529            HG01375
HG01551  4485279            HG01125
HG01488  4487693            HG01342
HG01366  4488647            HG01342
HG01551  4490996            HG01259
HG01342  4493212            HG01271
HG01342  4493218            HG01277
HG01377  4494064            HG01342
HG01462  4494414            HG01390
HG01551  4496682            HG01365
HG01461  4497146            HG01342
HG01342  4498051            HG01125
HG01551  4499694            HG01148
HG01551  4499713            HG01345
HG01375  4500523            HG01342
HG01551  4501432            HG01134
HG01551  4503181            HG01495
HG01389  4506393            HG01342
HG01342  4508562            HG01148
HG01551  4510222            HG01377
HG01342  4514486            HG01134
HG01551  4519187            HG01389
HG01342  4520380            HG01124
HG01440  4527415            HG01342
HG01342  4533004            HG01275
HG01342  4535490            HG01272
HG01551  4537772            HG01272
HG01551  4541901            HG01488
HG01551  4542804            HG01461
HG01551  4558088            HG01462
HG01551  4561600            HG01275
HG01390  4562418            HG01342
HG01462  4564478            HG01342
HG01551  4577349            HG01440
HG01551  4608288            HG01390
HG01551  4678948            HG01342
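Counting how often each individual appears among the highest-difference pairs is a simple tally. A minimal Python sketch (the lab used Perl; this is a stand-in), shown on an excerpt of the last six rows of the CLM list to keep it short:

```python
from collections import Counter

# Excerpt: the last six rows of the CLM list (id1, total differences, id2)
top_pairs = [
    ("HG01551", 4561600, "HG01275"),
    ("HG01390", 4562418, "HG01342"),
    ("HG01462", 4564478, "HG01342"),
    ("HG01551", 4577349, "HG01440"),
    ("HG01551", 4608288, "HG01390"),
    ("HG01551", 4678948, "HG01342"),
]

# Count how many high-difference pairs each individual takes part in
appearances = Counter()
for id1, _, id2 in top_pairs:
    appearances[id1] += 1
    appearances[id2] += 1

for individual, count in appearances.most_common():
    print(individual, count)
```

Applied to all 40 rows of the list, the same tally gives 20 appearances each for HG01551 and HG01342, in line with the counts quoted on the slide.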
The idea was to take the individuals that repeatedly appear in high-difference pairs, together with 10 controls from the same population that show an average number of genetic differences within that population. We then randomly selected individuals from other populations and calculated the genetic differences between our 12 individuals (10 controls plus the 2 frequently repeated individuals) and 1 control from each of the other populations. The comparison below is between the 10 CLM controls plus the 2 frequently repeated high-difference individuals (HG01551 and HG01342) and one control individual from the YRI population (Yoruba individuals, African ancestry). HG01551 and HG01342 had the lowest differences, indicating that these two individuals might be of African origin.
We also compared the CLM individuals with an individual from another African population (LWK) and with an individual from an Asian population (CHS). The two individuals in question showed the lowest genetic differences against the LWK control and the highest differences against the CHS individual. This suggests that these two individuals from the CLM population are of African origin.
[Figure panels: CLM vs. LWK; CLM vs. CHS]
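The comparison against reference individuals described above amounts to ranking a group by its difference counts to each reference. The Python sketch below illustrates the logic only: every ID and number in it is hypothetical, since the slides report no values for these comparisons, and only three controls are shown instead of ten.

```python
def rank_by_difference(differences):
    """Return individual IDs ordered from fewest to most differences."""
    return sorted(differences, key=differences.get)

# Hypothetical difference counts of CLM individuals against one reference
# individual from another population. IDs and numbers are made up.
vs_african_reference = {
    "candidate_1": 4_310_000,
    "candidate_2": 4_330_000,
    "control_1": 4_720_000,
    "control_2": 4_750_000,
    "control_3": 4_690_000,
}
vs_asian_reference = {
    "candidate_1": 4_910_000,
    "candidate_2": 4_890_000,
    "control_1": 4_420_000,
    "control_2": 4_450_000,
    "control_3": 4_400_000,
}

# Individuals closest to the African reference and farthest from the Asian one
closest_to_african = rank_by_difference(vs_african_reference)[:2]
farthest_from_asian = rank_by_difference(vs_asian_reference)[-2:]
print(closest_to_african, farthest_from_asian)
```

If the same individuals are both closest to the African reference and farthest from the Asian reference, that pattern is the kind of evidence the slides use to suggest African ancestry.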
Conclusions
Total variants showed substantial geographic differentiation.
The total number of differences distinguishes populations that are more geographically and ancestrally remote.
Populations are grouped by the predominant component of ancestry: Europe (CEU, TSI, GBR, FIN and IBS), Africa (YRI, LWK and ASW), East Asia (CHB, JPT and CHS) and the Americas (MXL, CLM and PUR).
Relatives within the same population have significantly fewer genotype differences (“almost half the number”) than non-relatives.
The study of human genetic variation has evolutionary significance. It can help us understand ancient human population migrations as well as how different human groups are biologically related to one another.