Bioinformatics

Presentation transcript:

1 Bioinformatics

2 Books:
S.B. Primrose, R.M. Twyman, Principles of Genome Analysis and Genomics
A. Lesk, Introduction to Bioinformatics
A.M. Campbell, L.J. Heyer, Discovering Genomics, Proteomics, & Bioinformatics
J. Claverie, C. Notredame, Bioinformatics for Dummies

3 What Is Bioinformatics?
“Bioinformatics is a new subject of genetic data collection, analysis and dissemination to the research community.” Hwa A. Lim (1987)
“Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” NIH working definition (2000)

4 What is Bioinformatics?
[Diagram labels: Informatics; Computer Science; Computer Engineering; Information Science; Biology & Other Natural Sciences; Mathematics & Statistics; Bioinformatics]

5 Bioinformatics Related Fields
Computational biology
Computational molecular biology
Biomolecular informatics
Computational genomics
…

6 Biological Data
Genomes
–DNA: sequences of A, T, C, G
–Annotated with function, “interesting” features
Proteins
–Amino acid sequences: sequences of 20 letters
–Annotated with structure, function, etc.

7 Biological Data
Gene Expression
–Dynamic behavior of genes
Protein Expression
–Dynamic behavior of proteins
Structural Features
–RNA and proteins
…

8 Biological Data
Sus scrofa agouti-related protein gene
1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
121 gaaggcttga ccaagccttg ttcccagaac tccaaggtca gtgcgggcag gagtgggttg
181 ggtggggctt ggacatcctc tggccacaaa gtattctgct tgtatgagcc ctttcttccc
241 cttcccaatc ccaggcctgg gaggtgggtg ttttgtgcat gggtggttct gccctcacat
301 catctgtccc agatctaggc ctgcagcccc cactgaagag gacaactgca gaacgggcag
361 aagaggctct gctgcagcag gccgaggcca aggccttggc agaggtaaca gctcagggaa
421 agggctgagg ccacaagtct tgagtgggtg tgtcaagcat caacctctat ctgtgcttgg
481 agttgccact gtggtacaac gggattggcg gtgtcttggg agcgctggga cgtggtttca
541 tccccggcca gcacaagtgg gttaaggatc tggccttgcc atcccttcag cttaggctga
601 gactgtggct tggagctgat ctctgaccgg aagctccata tgctctgggg tgaccaaaaa
661 tggaaaaaca aacatacaaa acacctctac ctgcacttcc tgaccccctc acccggggcg
721 acactgcaga ccatcccgtt cacgctccac ttccatcctg ccttgatctg gcgcattcca
781 tgaatgtgct tttggaagtc cttgtttccc aacccttgta ggtgctagat cctgaaggac
841 gcaaggcacg ctccccacgt cgctgcgtaa ggctgcacga atcctgtctg ggacaccagg
901 taccatgctg cgacccatgt gctacatgct actgccgttt cttcaacgcc ttctgctact
961 gccgcaagct gggtactgcc acgaacccct gcagccgcac ctagctggcc agccaatgtc
1021 gtcg

9 Genome Sizes
Species                 Genome Size
Bacteriophage MS        bp
Escherichia coli        4.7 million bp
Human                   3.3 billion bp

10 Database Growth

11 Database Growth

12 Database Growth

13 Database Growth
Exponential growth in sequence data
Not much growth in sequence size
Expect exponential growth in annotation information
We have lots of data, but it’s difficult to make sense of it.

14 Laser Dye Based Sequencing

15 Four-Color Sequencing

16 Automated Trace Analysis

17 Automated Base Calling

18 A Biology Lab?

19 Complete Genome Sequences
1995: shotgun sequencing of H. influenzae, 1.8 Mb; M. genitalium, 0.6 Mb.
1996: S. cerevisiae, 13 Mb.
1998: C. elegans, 100 Mb.
2000: D. melanogaster, 120 Mb.
2001: human (3 Gb); >100 complete genome sequences, mostly microbial.
2002: mouse
2003: pufferfish, D. pseudoobscura
2004: C. briggsae, rat, chimp, chicken; many more coming

20 Human Genome Sequencing

21

22

23 Fundamental Problems in Bioinformatics
Pairwise Sequence Alignment
Multiple Sequence Alignment
Phylogenetic Analysis
Sequence Based Database Searches
Gene Prediction
Structure Prediction (RNA and Protein)
Protein Classification
Gene Expression
Genetic nets

24 Pairwise Sequence Alignment
Given two DNA or AA sequences, find the best way to “line them up”
–Biology allows for variation
–Gaps, mismatches, etc.
Sequences:
HEAGAWGHEE
PAWHEAE
Two candidate alignments:
HEAGAWGHE-E
P-A--W-HEAE

HEAGAWGHE-E
--P-AW-HEAE
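One common way to "line up" two sequences is global alignment by dynamic programming (the Needleman-Wunsch approach). The sketch below is only an illustration and is not taken from the slides: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, whereas real protein alignments normally use a substitution matrix. Different scoring schemes can produce different alignments of the same two sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not a standard matrix.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Align the two sequences from the slide.
best, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(best)
print(top)
print(bottom)
```

When several alignments share the best score, the traceback returns just one of them, which is why two equally plausible alignments can exist for the same pair.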

25 Multiple Sequence Alignment
Extend pairwise problem to multiple sequences

26 Phylogenetic Analysis
Study relationships between organisms
–Characteristic similarity
–Sequence similarity
–Whole genome comparison
–…
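As a rough illustration of the "sequence similarity" approach, the sketch below computes the proportion of differing sites (often called the p-distance) between every pair of sequences that are already aligned. The function names and the toy sequences are hypothetical, not from the slides, and a real phylogenetic analysis would typically apply a substitution model and a tree-building method on top of such distances.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Map each pair of names (sorted order) to its p-distance."""
    return {(m, n): p_distance(seqs[m], seqs[n])
            for m, n in combinations(sorted(seqs), 2)}


# Toy aligned sequences (made up for illustration).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACCA"}
distances = distance_matrix(toy)
closest = min(distances, key=distances.get)
print(distances)
print(closest)
```

The pair with the smallest distance is the natural first candidate for grouping together in a tree.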

27 Phylogenetic Analysis

28 Sequence Based Database Searches
Keyword
–Find all sequences named “cytochrome c”
Sequence
–Find all sequences similar to HEAGAWGHEE
–Remember, there are gigabytes to search, and I’m not about to wait two days for an answer!
BLAST, FASTA, …
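Fast search tools avoid aligning the query against every database sequence in full. One idea behind that speed-up, shown here only as a simplified sketch and not as the actual BLAST or FASTA algorithm, is to index short words (k-mers) of the database once and then look up the query's words to shortlist candidates. The function names, the word length k=3, and the tiny database are assumptions for illustration.

```python
from collections import defaultdict


def build_kmer_index(db, k=3):
    """Map each k-mer to the set of database sequence names containing it."""
    index = defaultdict(set)
    for name, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidates(query, index, k=3):
    """Rank database sequences by how many query k-mers they share."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    # Most shared k-mers first; ties broken by name for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-sequence database.
db = {"seq1": "PAWHEAE", "seq2": "MKVLLT", "seq3": "HEAGAWGHEE"}
index = build_kmer_index(db)
print(candidates("HEAGAWGHEE", index))
```

Only the shortlisted sequences would then be passed to a slower, more careful alignment step.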

29 Gene Prediction
Does the following sequence contain a gene?
How many introns? Exons? Promoters? Other features?
TTGTAATCTCCTCTGTGACTATAATGACTAGTCTCAGGCCTGCCTTCCCCAGAAACCTCTCTTTTGGCTATTTCTCTTTC
TAGTTCTCTGTTTAAACAAAATTTATTCTATATATCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATC
TATCTATCTATCTATCATCTACTTATCATCTGTCTAGCCATTTGAAGCATCTTTGTGTTTTAGGTCCTGTTAGATTCTCC
TTTCAGCCAGTGGAGGATCTGGACAGAGCTATTTCTTAGCTTCCCCTAAGCCATGTTGTTAGAACGAATCCCCCACACCT
CCTCTGAGTGCTACGTCTCCGTCAAGAATTATGTATGTGGGATCCAGATGGCCCAGTGGATAAAACTGCAAGTGTCATGA
CCATGACCTGACTTCAAGGGATTGTGTAGAAAGGGAGTTATCACAGTGTGAGGGACAGGGCTAAGGACACTAACCCGTAT
GTTGAGGGGCACAGACGCTAGCAACAACAGTGAAGTGTTTAAAAAGGCAAAAATCATGTTTCTAGAAGTCAGGAAGAGCC
TAACTTGTGGACAAGGACCAACAGGCAGCAGTTGTAATGGGGCAGGGCAGAGGGAGAGCGGACACGCAGCTTTTGGCATC
AAACACACCCAGAGTGTGGATAGAGAGTAGGGAAATACTCTAGTCTCTGGCTAGGATACTCCCCTCTCTTTTTGACATTT
CTCATTGGCAGCCCCAAGTGGTCACTGGAGAGCCAGGAAGCCTAAAGGACACAGTTAGTAGCAGCCAGCTCCTTTGGTGG
AATTTTGGGGACATGGTGGGGTGACTTGGCTCTATCCAGGCCAGGGCTGGGTGTGAGTATACACTTAGTGACTGGCCTTC
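A very crude first look at a sequence like this is to scan for open reading frames (ORFs): a start codon ATG followed in the same frame by a stop codon. The sketch below is a deliberately naive illustration, with a hypothetical function name and length threshold; it checks only the forward strand and ignores introns, promoters, and codon statistics, so it is not a real gene predictor and says nothing definitive about the sequence on the slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG, end is one past the stop codon, and
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


# Toy sequence (made up for illustration).
toy = "CCATGAAACCCGGGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

Eukaryotic genes are usually interrupted by introns, so a scan like this is at best a starting point before applying dedicated gene-finding methods.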

30 Gene Prediction

31 Structure Prediction (RNA, Protein)
From sequence, predict 2D and 3D structures.

32 Protein Classification
From sequence, identify characteristics of a protein
–Active sites
–Families (e.g. globin)
–Blocks
–Domains
–Folds
–Motifs
–Etc.
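Short sequence motifs are often written as patterns that can be matched directly against a protein sequence. As a small sketch, the code below searches for the N-glycosylation site pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) using a regular expression. The function name and the test sequence are hypothetical, and a pattern match only flags a candidate site rather than proving biological function.

```python
import re

# N-{P}-[ST]-{P} rewritten as a regular expression. The lookahead lets
# overlapping occurrences be reported as separate matches.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (position, matched_text) for every occurrence of the pattern."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


# Made-up sequence: the site at position 2 matches, the one at 7 has a proline.
print(find_motif("MKNGSAQNPSA"))
```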

33 Gene Expression
Study of gene activity under experimental conditions
–Large scale studies with microarrays

34 A fragment of one of the metabolic pathway maps. Modern biology has become a source of enormous volumes of experimental information, which cannot be interpreted without efficient information technologies and mathematical modeling methods.

36

37 IC&G SB RAS, Novosibirsk, Russia, BGRS-2002
Metabolic pathways are obligatory elements of gene networks.
Adipocyte: the mevalonate pathway of cholesterol biosynthesis in the cell.

38 Integration of gene networks in the anti-inflammatory response
Cytokines
Antioxidant defense
Cell cycle arrest
Inflammation
Iron metabolism
Heat shock response
Apoptosis
Reactive oxygen species
Integrative interdisciplinary project of SB RAS on systems computational biology

processes
Regulatory component (control of metabolism)
Ratio of the metabolic and regulatory components of the tricarboxylic acid cycle of E. coli K-12:
Executive component (metabolism): 139 processes
Legend:
- PROCESS
- participation in a process with nonzero stoichiometry
- participation in a process with zero stoichiometry
Complete graph of the metabolic component of E. coli K-12: 3973 processes
Lower bounds on model complexity (without detailed accounting for the stages of template-directed biosynthesis): ~ – processes
More detailed model: ~ processes
Portrait model: no fewer than processes
Integrative interdisciplinary project of SB RAS on systems computational biology

40

41

42 Different perspectives on Bioinformatics
Bioinformatics is a tool
–Biologists, biochemists, medical professionals, etc.
–Obtain meaningful and understandable results
Bioinformatics is a discipline
–Informaticians, mathematicians, statisticians, etc.
–Generate meaningful and understandable results

43 Summary
Bioinformatics is truly interdisciplinary
–Biology (natural sciences), informatics, mathematics & statistics
Databases
–Large, semistructured, incomplete, inaccurate
Wide range of problems
–Solutions employ knowledge from sciences with algorithms and models from informatics, mathematics, and statistics