Published by Charles Jackson. Modified over 8 years ago.
1 Bioinformatics
2 Books
S.B. Primrose, R.M. Twyman, Principles of Genome Analysis and Genomics
A. Lesk, Introduction to Bioinformatics
A.M. Campbell, L.J. Heyer, Discovering Genomics, Proteomics, & Bioinformatics
J. Claverie, C. Notredame, Bioinformatics for Dummies
–http://www.dummies.com/WileyCDA/DummiesTitle/productCd-0764516965.html
3 What Is Bioinformatics?
“Bioinformatics is a new subject of genetic data collection, analysis and dissemination to the research community.” Hwa A. Lim (1987)
“Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” NIH working definition (2000)
4 What is Bioinformatics?
[Diagram: Bioinformatics at the intersection of Informatics (Computer Science, Computer Engineering, Information Science), Biology & Other Natural Sciences, and Mathematics & Statistics]
5 Bioinformatics Related Fields
Computational biology
Computational molecular biology
Biomolecular informatics
Computational genomics
…
6 Biological Data
Genomes
–DNA sequences of A, T, C, G
–Annotated with function, “interesting” features
Proteins
–Amino acid sequences (sequences of 20 letters)
–Annotated with structure, function, etc.
7 Biological Data
Gene Expression
–Dynamic behavior of genes
Protein Expression
–Dynamic behavior of proteins
Structural Features
–RNA and proteins
…
8 Biological Data
Sus scrofa agouti-related protein gene
1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
121 gaaggcttga ccaagccttg ttcccagaac tccaaggtca gtgcgggcag gagtgggttg
181 ggtggggctt ggacatcctc tggccacaaa gtattctgct tgtatgagcc ctttcttccc
241 cttcccaatc ccaggcctgg gaggtgggtg ttttgtgcat gggtggttct gccctcacat
301 catctgtccc agatctaggc ctgcagcccc cactgaagag gacaactgca gaacgggcag
361 aagaggctct gctgcagcag gccgaggcca aggccttggc agaggtaaca gctcagggaa
421 agggctgagg ccacaagtct tgagtgggtg tgtcaagcat caacctctat ctgtgcttgg
481 agttgccact gtggtacaac gggattggcg gtgtcttggg agcgctggga cgtggtttca
541 tccccggcca gcacaagtgg gttaaggatc tggccttgcc atcccttcag cttaggctga
601 gactgtggct tggagctgat ctctgaccgg aagctccata tgctctgggg tgaccaaaaa
661 tggaaaaaca aacatacaaa acacctctac ctgcacttcc tgaccccctc acccggggcg
721 acactgcaga ccatcccgtt cacgctccac ttccatcctg ccttgatctg gcgcattcca
781 tgaatgtgct tttggaagtc cttgtttccc aacccttgta ggtgctagat cctgaaggac
841 gcaaggcacg ctccccacgt cgctgcgtaa ggctgcacga atcctgtctg ggacaccagg
901 taccatgctg cgacccatgt gctacatgct actgccgttt cttcaacgcc ttctgctact
961 gccgcaagct gggtactgcc acgaacccct gcagccgcac ctagctggcc agccaatgtc
1021 gtcg
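The sequence on this slide is laid out in numbered rows of ten-base blocks, so a program has to strip the position numbers and spaces before it can work with the bases. The following is a minimal sketch of that step, assuming the input contains only position numbers and lowercase a/c/g/t/n blocks; the function name `parse_numbered_sequence` is illustrative and not part of any library.

```python
def parse_numbered_sequence(text):
    """Join numbered, space-separated sequence rows into one string.

    Assumes each row looks like '61 tgctggcaat gcccaccatg ...': a leading
    position number followed by blocks of bases.
    """
    blocks = []
    for line in text.splitlines():
        for token in line.split():
            if token.isdigit():
                continue  # skip the position number at the start of a row
            blocks.append(token.lower())
    sequence = "".join(blocks)
    unexpected = set(sequence) - set("acgtn")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence


if __name__ == "__main__":
    first_row = ("1 ggcacattct cctgttgagc caggctatgc "
                 "tgaccacaat gttgctgagc tgtgccctac")
    print(len(parse_numbered_sequence(first_row)))
```

Rejecting unexpected characters is a design choice for this sketch; real records can contain other ambiguity codes, which would need a wider alphabet.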
9 Genome Sizes
Species              Genome Size
Bacteriophage MS2    3569 bp
Escherichia coli     4.7 million bp
Human                3.3 billion bp
10 Database Growth
11 Database Growth
12 Database Growth
13 Database Growth
Exponential growth in sequence data
Not much growth in sequence size
Expect exponential growth in annotation information
We have lots of data, but it’s difficult to make sense of it.
14 Laser Dye Based Sequencing
15 Four-Color Sequencing
16 Automated Trace Analysis
17 Automated Base Calling
18 A Biology Lab?
19 Complete Genome Sequences
1995: shotgun sequencing of H. influenzae, 1.8 Mb; M. genitalium, 0.6 Mb.
1996: S. cerevisiae, 13 Mb.
1998: C. elegans, 100 Mb.
2000: D. melanogaster, 120 Mb.
2001: human (3 Gb); >100 complete genome sequences, mostly microbial.
2002: mouse
2003: pufferfish, D. pseudoobscura
2004: C. briggsae, rat, chimp, chicken; many more coming
20 Human Genome Sequencing
23 Fundamental Problems in Bioinformatics
Pairwise Sequence Alignment
Multiple Sequence Alignment
Phylogenetic Analysis
Sequence Based Database Searches
Gene Prediction
Structure Prediction (RNA and Protein)
Protein Classification
Gene Expression
Genetic nets
24 Pairwise Sequence Alignment
Given two DNA or amino acid (AA) sequences, find the best way to “line them up”
–Biology allows for variation
–Gaps, mismatches, etc.

HEAGAWGHEE
PAWHEAE

HEAGAWGHE-E
P-A--W-HEAE

HEAGAWGHE-E
--P-AW-HEAE
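One common way to find a best "line-up" is global alignment by dynamic programming (the Needleman-Wunsch approach). The slide does not name a method or a scoring scheme, so the sketch below is an illustration under assumed scores: +1 for a match, -1 for a mismatch, and -1 per gap position. Protein alignments in practice normally use a substitution matrix instead of these flat scores.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(best)
    print(top)
    print(bottom)
```

Several alignments can share the best score, which is why the slide shows more than one candidate; this sketch returns only one of them, chosen by the order of the traceback checks.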
25 Multiple Sequence Alignment
Extend pairwise problem to multiple sequences
26 Phylogenetic Analysis
Study relationships between organisms
–Characteristic similarity
–Sequence similarity
–Whole genome comparison
–…
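Sequence similarity is often turned into a pairwise distance before a tree is built. The sketch below computes one simple measure, the proportion of differing sites between already-aligned sequences (sometimes called the p-distance). The slides do not specify a distance measure, so this choice, the function names, and the toy sequences are all assumptions for illustration.

```python
def p_distance(a, b):
    """Fraction of compared sites that differ between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Map every ordered pair of sequence names to its p-distance."""
    names = sorted(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q])
            for p in names for q in names}


if __name__ == "__main__":
    # Made-up aligned sequences, for illustration only.
    toy = {"x": "ACGT", "y": "ACGA", "z": "TCGA"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, dist)
```

A matrix like this is the usual input to distance-based tree-building methods; the measure ignores multiple substitutions at the same site, so it underestimates divergence for distant sequences.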
27 Phylogenetic Analysis
28 Sequence Based Database Searches
Keyword
–Find all sequences named “cytochrome c”
Sequence
–Find all sequences similar to HEAGAWGHEE
–Remember, there are gigabytes to search, and I’m not about to wait two days for an answer!
BLAST, FASTA, …
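Fast search tools avoid aligning the query against every database entry in full. One idea behind that speed is to index short words (k-mers) and only look closely at entries that share words with the query. The sketch below shows only this seeding idea with a toy in-memory database; it is a simplification and not the actual BLAST or FASTA algorithm, and the names `build_kmer_index` and `candidate_hits` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k=3):
    """Map each k-letter word to the set of sequence names containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_hits(query, index, k=3):
    """Rank database entries by how many distinct query k-mers they share."""
    counts = defaultdict(int)
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            counts[name] += 1
    # Most shared words first; ties broken by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy database; entry names and the second sequence are made up.
    toy_db = {"seq1": "PAWHEAE", "seq2": "MKVLAAGI", "seq3": "HEAGAWGHEE"}
    index = build_kmer_index(toy_db)
    print(candidate_hits("HEAGAWGHEE", index))
```

Only the ranked candidates would then be passed to a slower, more careful alignment step, which is where the time saving comes from.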
29 Gene Prediction
Does the following sequence contain a gene?
How many introns? Exons? Promoters? Other features?
TTGTAATCTCCTCTGTGACTATAATGACTAGTCTCAGGCCTGCCTTCCCCAGAAACCTCTCTTTTGGCTATTTCTCTTTC
TAGTTCTCTGTTTAAACAAAATTTATTCTATATATCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATC
TATCTATCTATCTATCATCTACTTATCATCTGTCTAGCCATTTGAAGCATCTTTGTGTTTTAGGTCCTGTTAGATTCTCC
TTTCAGCCAGTGGAGGATCTGGACAGAGCTATTTCTTAGCTTCCCCTAAGCCATGTTGTTAGAACGAATCCCCCACACCT
CCTCTGAGTGCTACGTCTCCGTCAAGAATTATGTATGTGGGATCCAGATGGCCCAGTGGATAAAACTGCAAGTGTCATGA
CCATGACCTGACTTCAAGGGATTGTGTAGAAAGGGAGTTATCACAGTGTGAGGGACAGGGCTAAGGACACTAACCCGTAT
GTTGAGGGGCACAGACGCTAGCAACAACAGTGAAGTGTTTAAAAAGGCAAAAATCATGTTTCTAGAAGTCAGGAAGAGCC
TAACTTGTGGACAAGGACCAACAGGCAGCAGTTGTAATGGGGCAGGGCAGAGGGAGAGCGGACACGCAGCTTTTGGCATC
AAACACACCCAGAGTGTGGATAGAGAGTAGGGAAATACTCTAGTCTCTGGCTAGGATACTCCCCTCTCTTTTTGACATTT
CTCATTGGCAGCCCCAAGTGGTCACTGGAGAGCCAGGAAGCCTAAAGGACACAGTTAGTAGCAGCCAGCTCCTTTGGTGG
AATTTTGGGGACATGGTGGGGTGACTTGGCTCTATCCAGGCCAGGGCTGGGTGTGAGTATACACTTAGTGACTGGCCTTC
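A very rough first pass at "does this contain a gene?" is to scan for open reading frames (ORFs): stretches that begin with ATG and run in-frame to a stop codon. The sketch below does this on the forward strand only. It deliberately ignores introns, promoters, and the reverse strand, so it is a simplification for illustration and not a real gene predictor; the function name and the `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs.

    start is the 0-based index of the ATG, end is one past the stop codon,
    and min_codons counts every codon from ATG through the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Short made-up sequence, for illustration only.
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

For eukaryotic sequence, where exons are interrupted by introns, an ORF scan like this misses most real genes, which is why the slide asks about introns, exons, and promoters separately.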
30 Gene Prediction
31 Structure Prediction (RNA, Protein)
From sequence, predict 2D (secondary) and 3D (tertiary) structures.
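For RNA, a classic textbook starting point for secondary structure is to maximize the number of nested base pairs by dynamic programming (a Nussinov-style recurrence). The slides do not name a method, so the sketch below is an assumed illustration: it counts pairs only, allows A-U, G-C and G-U pairs, and requires at least `min_loop` unpaired bases inside a hairpin. Real predictors score thermodynamic stability instead of pair counts.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs in rna (pair count only)."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = max pairs within rna[i..j]; short spans stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is unpaired
            # case: j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # Made-up hairpin sequence, for illustration only.
    print(nussinov_max_pairs("GGGAAAUCC"))
```

The recurrence has the same shape as the alignment problem: each cell is filled from smaller subproblems, here shorter stretches of the same sequence.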
32 Protein Classification
From sequence, identify characteristics of a protein
–Active sites
–Families (e.g. globin)
–Blocks
–Domains
–Folds
–Motifs
–Etc.
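Short motifs are often written as patterns and matched directly against a protein sequence. As an illustration, the sketch below searches for the N-glycosylation site pattern, written in PROSITE style as N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline), translated to a Python regular expression. The function name and the example sequence are made up for this sketch.

```python
import re

# N-{P}-[ST]-{P} as a regex; the lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (0-based position, matched residues) for every motif hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    # Made-up peptide, for illustration only.
    print(find_motif("MKNGSAPNPSA"))
```

A pattern hit only says the sequence is compatible with the motif; short patterns like this one match by chance fairly often, so hits are candidates, not confirmed sites.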
33 Gene Expression
Study of gene activity under experimental conditions
–Large scale studies with microarrays
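A common first summary of such an experiment is the log2 fold change of each gene between two conditions: +1 means the signal doubled, -1 means it halved. The sketch below assumes one intensity value per gene per condition and adds a pseudocount to avoid dividing by zero; the gene names, values, and pseudocount are made up for illustration.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2((treated + pseudocount) / (control + pseudocount)) per shared gene."""
    return {
        gene: math.log2((treated[gene] + pseudocount)
                        / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


if __name__ == "__main__":
    # Hypothetical intensities, not real measurements.
    treated = {"geneA": 7.0, "geneB": 1.0}
    control = {"geneA": 1.0, "geneB": 3.0}
    for gene, change in log2_fold_change(treated, control).items():
        print(gene, change)
```

Real microarray analysis also needs normalization and replicate-based statistics before a fold change can be trusted; this only shows the arithmetic.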
34 Fragment of a metabolic pathway map.
Modern biology has become a source of enormous volumes of experimental data, which cannot be interpreted without efficient information technologies and mathematical modelling methods.
37 Metabolic pathways are essential elements of gene networks.
Adipocyte: the mevalonate pathway of cholesterol biosynthesis in the cell.
IC&G SB RAS, Novosibirsk, Russia, BGRS-2002
38 Integration of gene networks in the anti-inflammatory response
[Diagram labels: Cytokines; Antioxidant defense; Cell cycle arrest; Inflammation; Iron metabolism; Heat shock response; Apoptosis; Reactive oxygen species]
SB RAS integrative interdisciplinary project on systems computational biology
39 Ratio of the metabolic and regulatory components of the tricarboxylic acid cycle in E. coli K-12
1882 processes: regulatory component (control of metabolism)
Executive component (metabolism): 139 processes
[Figure legend: process; participation in a process with nonzero stoichiometry; participation in a process with zero stoichiometry]
Complete graph of the metabolic component of E. coli K-12: 3973 processes
Lower bounds on model complexity (without detailed treatment of the stages of template-directed biosynthesis): ~60,000 to 100,000 processes
More detailed model: ~1,000,000 processes
“Portrait” model: at least 10,000,000 processes
SB RAS integrative interdisciplinary project on systems computational biology
42 Different perspectives on Bioinformatics
Bioinformatics is a tool
–Biologists, biochemists, medical professionals, etc.
–Obtain meaningful and understandable results
Bioinformatics is a discipline
–Informaticians, mathematicians, statisticians, etc.
–Generate meaningful and understandable results
43 Summary
Bioinformatics is truly interdisciplinary
–Biology (natural sciences), informatics, mathematics & statistics
Databases
–Large, semistructured, incomplete, inaccurate
Wide range of problems
–Solutions employ knowledge from sciences with algorithms and models from informatics, mathematics, and statistics