
1 Bioinformatics. 2 Books: S.B Primrose, RM. Twyman, Principles of Genome analysis and Genomics S.B Primrose, RM. Twyman, Principles of Genome analysis.


1 Bioinformatics

2 Books:
S.B. Primrose, R.M. Twyman, Principles of Genome Analysis and Genomics
A. Lesk, Introduction to Bioinformatics
A.M. Campbell, L.J. Heyer, Discovering Genomics, Proteomics, & Bioinformatics
J. Claverie, C. Notredame, Bioinformatics for Dummies
– http://www.dummies.com/WileyCDA/DummiesTitle/productCd-0764516965.html

3 What Is Bioinformatics?
“Bioinformatics is a new subject of genetic data collection, analysis and dissemination to the research community.” Hwa A. Lim (1987)
“Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” NIH working definition (2000)

4 What is Bioinformatics?
[Diagram: Bioinformatics shown together with three contributing areas: Informatics (Computer Science, Computer Engineering, Information Science); Biology & Other Natural Sciences; Mathematics & Statistics]

5 Bioinformatics Related Fields
Computational biology
Computational molecular biology
Biomolecular informatics
Computational genomics
…

6 Biological Data
Genomes
– DNA: sequences of A, T, C, G
– Annotated with function, “interesting” features
Proteins
– Amino acid sequences: sequences of 20 letters
– Annotated with structure, function, etc.
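The two alphabets on this slide can be made concrete with a small sketch. The function names below are illustrative, not part of the presentation, and the check is deliberately naive: a string made only of A, C, G, T is reported as DNA even though those letters are also valid amino acid codes.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")
# One-letter codes of the 20 standard amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def classify_sequence(seq):
    """Guess 'dna', 'protein', or 'unknown' from the letters used.

    Illustrative helper: DNA is tested first, so an A/C/G/T-only
    string is always reported as DNA.
    """
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:
        return "dna"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


def composition(seq):
    """Count how often each letter occurs in the sequence."""
    return Counter(seq.upper())


print(classify_sequence("ggcacattct"))   # DNA letters only
print(classify_sequence("HEAGAWGHEE"))   # contains H, E, W: protein
print(composition("ggcacattct"))
```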

7 Biological Data
Gene Expression
– Dynamic behavior of genes
Protein Expression
– Dynamic behavior of proteins
Structural Features
– RNA and proteins
…

8 Biological Data
Sus scrofa agouti-related protein gene
1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
121 gaaggcttga ccaagccttg ttcccagaac tccaaggtca gtgcgggcag gagtgggttg
181 ggtggggctt ggacatcctc tggccacaaa gtattctgct tgtatgagcc ctttcttccc
241 cttcccaatc ccaggcctgg gaggtgggtg ttttgtgcat gggtggttct gccctcacat
301 catctgtccc agatctaggc ctgcagcccc cactgaagag gacaactgca gaacgggcag
361 aagaggctct gctgcagcag gccgaggcca aggccttggc agaggtaaca gctcagggaa
421 agggctgagg ccacaagtct tgagtgggtg tgtcaagcat caacctctat ctgtgcttgg
481 agttgccact gtggtacaac gggattggcg gtgtcttggg agcgctggga cgtggtttca
541 tccccggcca gcacaagtgg gttaaggatc tggccttgcc atcccttcag cttaggctga
601 gactgtggct tggagctgat ctctgaccgg aagctccata tgctctgggg tgaccaaaaa
661 tggaaaaaca aacatacaaa acacctctac ctgcacttcc tgaccccctc acccggggcg
721 acactgcaga ccatcccgtt cacgctccac ttccatcctg ccttgatctg gcgcattcca
781 tgaatgtgct tttggaagtc cttgtttccc aacccttgta ggtgctagat cctgaaggac
841 gcaaggcacg ctccccacgt cgctgcgtaa ggctgcacga atcctgtctg ggacaccagg
901 taccatgctg cgacccatgt gctacatgct actgccgttt cttcaacgcc ttctgctact
961 gccgcaagct gggtactgcc acgaacccct gcagccgcac ctagctggcc agccaatgtc
1021 gtcg
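The numbered layout above (a position, then blocks of ten bases) is typical of sequence database flat files. A minimal sketch of turning such lines back into one string follows; the function names are assumptions for illustration, and only the first two lines of the slide's sequence are used as input.

```python
def parse_numbered_sequence(text):
    """Join numbered, space-separated sequence lines into one string.

    Tokens made only of digits are treated as position markers and
    dropped; everything else is assumed to be sequence letters.
    """
    chunks = []
    for line in text.splitlines():
        for token in line.split():
            if token.isdigit():
                continue
            chunks.append(token)
    return "".join(chunks).upper()


def gc_content(seq):
    """Fraction of G and C letters in the sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = """\
1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
"""

seq = parse_numbered_sequence(example)
print(len(seq))
print(round(gc_content(seq), 3))
```

The position markers are consistent with the data: the first line holds 60 bases, so the second line starts at base 61.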

9 Genome Sizes
Species              Genome Size
Bacteriophage MS2    3569 bp
Escherichia coli     4.7 million bp
Human                3.3 billion bp

10 Database Growth

11 Database Growth

12 Database Growth

13 Database Growth
Exponential growth in sequence data
Not much growth in sequence size
Expect exponential growth in annotation information
We have lots of data, but it’s difficult to make sense of it.

14 Laser Dye Based Sequencing

15 Four-Color Sequencing

16 Automated Trace Analysis

17 Automated Base Calling

18 A Biology Lab?

19 Complete Genome Sequences
1995: shotgun sequencing of H. influenzae, 1.8 Mb; M. genitalium, 0.6 Mb
1996: S. cerevisiae, 13 Mb
1998: C. elegans, 100 Mb
2000: D. melanogaster, 120 Mb
2001: human (3 Gb); >100 complete genome sequences, mostly microbial
2002: mouse
2003: pufferfish, D. pseudoobscura
2004: C. briggsae, rat, chimp, chicken; many more coming

20 Human Genome Sequencing


23 Fundamental Problems in Bioinformatics
Pairwise Sequence Alignment
Multiple Sequence Alignment
Phylogenetic Analysis
Sequence Based Database Searches
Gene Prediction
Structure Prediction (RNA and Protein)
Protein Classification
Gene Expression
Genetic nets

24 Pairwise Sequence Alignment
Given two DNA or amino acid (AA) sequences, find the best way to “line them up”
– Biology allows for variation
– Gaps, mismatches, etc.

Sequences:
HEAGAWGHEE
PAWHEAE

Two possible alignments:
HEAGAWGHE-E
P-A--W-HEAE

HEAGAWGHE-E
--P-AW-HEAE
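One standard way to find a best "line-up" is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The sketch below is not taken from the slides. It assumes a deliberately simple scoring scheme (match +1, mismatch -1, gap -1), whereas real protein alignments normally use a substitution matrix and more refined gap penalties, so the alignment it returns may differ from the two shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not those used in the presentation.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(best)
print(top)
print(bottom)
```

Several alignments can share the same optimal score; the traceback order above simply picks one of them.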

25 Multiple Sequence Alignment
Extend pairwise problem to multiple sequences

26 Phylogenetic Analysis
Study relationships between organisms
– Characteristic similarity
– Sequence similarity
– Whole genome comparison
– …
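Sequence similarity is often turned into numbers before any tree is built. A minimal sketch is the p-distance: the fraction of aligned positions at which two sequences differ. The taxon names and sequences below are invented toy data, and the sequences are assumed to be already aligned to equal length.

```python
def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in names for q in names}


# Hypothetical toy alignment, for illustration only.
toy = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTACGA",
    "taxonC": "TCGAACGA",
}
matrix = distance_matrix(toy)
for pair, d in sorted(matrix.items()):
    print(pair, d)
```

A distance matrix of this kind is the usual input to distance-based tree-building methods; the p-distance itself ignores multiple substitutions at the same site, so it underestimates divergence for distant sequences.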

27 Phylogenetic Analysis

28 Sequence Based Database Searches
Keyword
– Find all sequences named “cytochrome c”
Sequence
– Find all sequences similar to HEAGAWGHEE
– Remember, there are gigabytes to search, and I’m not about to wait two days for an answer!
BLAST, FASTA, …
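Searching gigabytes quickly usually means avoiding a full alignment against every entry. One common idea, which heuristic tools such as BLAST and FASTA build on in far more elaborate form, is to index short words (k-mers) and only look at database entries that share words with the query. The sketch below illustrates that idea only; it is not the BLAST or FASTA algorithm, and the database entries are invented.

```python
from collections import defaultdict


def build_kmer_index(database, k=3):
    """Map every k-letter word to the names of sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def search(query, index, k=3):
    """Rank database entries by the number of distinct query k-mers they share."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    # Most shared words first; ties broken by name for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-entry database, for illustration only.
database = {
    "seq1": "PAWHEAE",
    "seq2": "HEAGAWGHEE",
    "seq3": "MKVLAAGIV",
}
index = build_kmer_index(database, k=3)
results = search("HEAGAWGHEE", index, k=3)
print(results)  # [('seq2', 8), ('seq1', 1)]
```

The index is built once and reused for every query, which is where the speed-up comes from; candidates found this way would then be examined with a proper alignment.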

29 Gene Prediction
Does the following sequence contain a gene?
How many introns? Exons? Promoters? Other features?
TTGTAATCTCCTCTGTGACTATAATGACTAGTCTCAGGCCTGCCTTCCCCAGAAACCTCTCTTTTGGCTATTTCTCTTTC
TAGTTCTCTGTTTAAACAAAATTTATTCTATATATCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATC
TATCTATCTATCTATCATCTACTTATCATCTGTCTAGCCATTTGAAGCATCTTTGTGTTTTAGGTCCTGTTAGATTCTCC
TTTCAGCCAGTGGAGGATCTGGACAGAGCTATTTCTTAGCTTCCCCTAAGCCATGTTGTTAGAACGAATCCCCCACACCT
CCTCTGAGTGCTACGTCTCCGTCAAGAATTATGTATGTGGGATCCAGATGGCCCAGTGGATAAAACTGCAAGTGTCATGA
CCATGACCTGACTTCAAGGGATTGTGTAGAAAGGGAGTTATCACAGTGTGAGGGACAGGGCTAAGGACACTAACCCGTAT
GTTGAGGGGCACAGACGCTAGCAACAACAGTGAAGTGTTTAAAAAGGCAAAAATCATGTTTCTAGAAGTCAGGAAGAGCC
TAACTTGTGGACAAGGACCAACAGGCAGCAGTTGTAATGGGGCAGGGCAGAGGGAGAGCGGACACGCAGCTTTTGGCATC
AAACACACCCAGAGTGTGGATAGAGAGTAGGGAAATACTCTAGTCTCTGGCTAGGATACTCCCCTCTCTTTTTGACATTT
CTCATTGGCAGCCCCAAGTGGTCACTGGAGAGCCAGGAAGCCTAAAGGACACAGTTAGTAGCAGCCAGCTCCTTTGGTGG
AATTTTGGGGACATGGTGGGGTGACTTGGCTCTATCCAGGCCAGGGCTGGGTGTGAGTATACACTTAGTGACTGGCCTTC
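A naive first pass at the question "does this sequence contain a gene?" is to scan for open reading frames (ORFs): a start codon followed in the same frame by a stop codon. The sketch below makes strong simplifying assumptions (forward strand only, ATG as the only start codon, no introns or promoters), so it suits intron-free coding regions at best and is not a gene predictor in the sense of the slide.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the A in ATG, end is one past the stop codon,
    and the length threshold counts codons including the stop.
    The default min_codons value is an arbitrary illustrative choice.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Tiny made-up example: ATG AAA TTT TAG embedded in frame 2.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))  # [(2, 14, 2)]
```

Answering the slide's questions about introns, exons, and promoters requires statistical models of splice sites and coding signals, which this scan does not attempt.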

30 Gene Prediction

31 Structure Prediction (RNA, Protein)
From sequence, predict 2D and 3D structures.
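For RNA secondary (2D) structure, a classic textbook starting point is base-pair maximization by dynamic programming (the Nussinov algorithm). The sketch below is offered as an illustration, not as the method the presentation had in mind: it only counts the maximum number of nested pairs, allows A-U, G-C, and G-U pairs, and assumes a minimum hairpin loop of three unpaired bases. Practical predictors minimize a thermodynamic free-energy model instead.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs in an RNA string.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed, commonly used value of 3).
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the substring rna[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3: a three-pair stem closing an AAA loop
```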

32 Protein Classification
From sequence, identify characteristics of a protein
– Active sites
– Families (e.g. globin)
– Blocks
– Domains
– Folds
– Motifs
– Etc.

33 Gene Expression
Study of gene activity under experimental conditions
– Large scale studies with microarrays

34 Fragment of one of the metabolic pathway maps. Modern biology has become a source of enormous volumes of experimental information, which cannot be made sense of without efficient information technologies and methods of mathematical modeling.


37 IC&G SB RAS, Novosibirsk, Russia, BGRS-2002
Metabolic pathways are obligatory elements of gene networks.
Adipocyte: the mevalonate pathway of cholesterol biosynthesis in the cell.

38 Integration of gene networks in the anti-inflammatory response
[Diagram labels: Cytokines; Antioxidant defense; Cell cycle arrest; Inflammation; Iron metabolism; Heat shock response; Apoptosis; Reactive oxygen species]
Integration interdisciplinary project of SB RAS on systems computational biology

39 Ratio of the metabolic and regulatory components of the tricarboxylic acid cycle in E. coli K-12:
– Regulatory component (control of metabolism): 1882 processes
– Executive component (metabolism): 139 processes
[Diagram legend: process; participation in a process with nonzero stoichiometry; participation in a process with zero stoichiometry]
Complete graph of the metabolic component of E. coli K-12: 3973 processes
Lower bounds on model complexity (without detailed treatment of the stages of template-directed biosynthesis): ~60,000 to 100,000 processes
More detailed model: ~1,000,000 processes
Portrait model: at least 10,000,000 processes
Integration interdisciplinary project of SB RAS on systems computational biology


42 Different perspectives on Bioinformatics
Bioinformatics is a tool
– Biologists, biochemists, medical professionals, etc.
– Obtain meaningful and understandable results
Bioinformatics is a discipline
– Informaticians, mathematicians, statisticians, etc.
– Generate meaningful and understandable results

43 Summary
Bioinformatics is truly interdisciplinary
– Biology (natural sciences), informatics, mathematics & statistics
Databases
– Large, semistructured, incomplete, inaccurate
Wide range of problems
– Solutions employ knowledge from sciences with algorithms and models from informatics, mathematics, and statistics

