Introduction to Bioinformatics. Lecturer: Dr. Yael Mandel-Gutfreund. Teaching Assistants: Oleg Rokhlenko, Ydo Wexler


1 Introduction to Bioinformatics
Lecturer: Dr. Yael Mandel-Gutfreund
Teaching Assistants: Oleg Rokhlenko, Ydo Wexler
http://webcourse.cs.technion.ac.il/236523

2 What is Bioinformatics?

3 Course Objectives
– To introduce the bioinformatics discipline
– To familiarize students with the major biological questions that bioinformatics tools can address
– To introduce the major tools used for sequence and structure analysis and explain in general how they work (including their limitations)

4 Course Structure and Requirements
1. Class structure
Each class (except the first one) will be divided into two parts:
  1. Lecture (in the lecture room)
  2. Training lab (in the computer lab)*
For the training lab, the class will be divided into 2 groups. Each group will meet every second week, starting from the second week. Work in the training labs will be done in pairs. Lab assignments will be submitted at the end of each lab.
Preparing for the lab: a tutorial, including home self-study exercises and their answers, will be posted on the web a week before the lab.
2. A final home exam

5 Grading
– 30% lab assignments
– 70% final exam

6 Literature list
– Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
– Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
– Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced reading
– Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

7 Course syllabus

8 What is Bioinformatics?

9 What is Bioinformatics?
“The field of science in which biology, computer science, and information technology merge to form a single discipline.”
Ultimate goal: to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

10 Bioinformatics: from a purely lab-based science to an information science
Bio = Informatics

11 Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
21st century: Genome → Transcriptome → Proteome

12 Genome
– Chromosomal DNA of an organism
– Coding and non-coding DNA
– Genome size and number of genes do not necessarily determine organism complexity

13 Transcriptome
– The complete collection of all possible mRNAs (including splice variants) of an organism.
– The regions of an organism’s genome that get transcribed into messenger RNA.
– The transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.

14 Proteome
– The complete collection of proteins that can be produced by an organism.
– Can be studied either as a static entity (the sum of all possible proteins) or as a dynamic entity (all proteins found at a specific time point).

15 From DNA to Genome
[Timeline with axis marks from 1955 to 1985. Milestones shown: Watson and Crick DNA model; first protein sequence; first protein structure]

16 [Timeline with axis marks at 1990, 1995 and 2000. Milestones shown: first bacterial genome (Haemophilus influenzae); yeast genome; first human genome draft]

17 The Human Genome Project
Initiated in 1986. Completed in 2003.
Project goals were to:
– identify all the genes in human DNA
– determine the sequences of the 3 billion chemical base pairs that make up human DNA
– store this information in databases
– improve tools for data analysis and develop new tools
– address the ethical, legal, and social issues that may arise from the project

18 Human Genome Project
[Timeline with axis marks at 1985, 1990, 1995 and 2000. Milestones shown: USA Department of Energy announces project; International Human Genome Organization founded; low-resolution linkage map published; Celera Genomics founded; first working drafts published; project successfully completed]

19 The Human Genome Project
Initiated in 1986. Completed in 2003.
How did we do?
– identify all the genes in human DNA ☺ ☺
– determine the sequences of the 3 billion chemical base pairs that make up human DNA ☺ ☺ ☺
– store this information in databases ☺ ☺ ☺
– improve tools for data analysis and develop new tools ☺ ☺ ☺
– address the ethical, legal, and social issues that may arise from the project ☺

20 What makes us human?
CHIMP GENOME
Chimpanzees are similar to humans in so many ways: they are socially complex, sensitive and communicative, and yet indisputably on the animal side of the man/beast divide. Scientists have now sequenced the genetic code of our closest living relative, showing the striking concordances and divergences between the two species, and perhaps holding up a mirror to our own humanity.

21 How human are chimps?
Perhaps not surprising! Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.
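
A figure such as 1.23% is, in essence, a count of differing positions divided by the number of positions compared. The sketch below shows that arithmetic for two sequences that are assumed to be already aligned and of equal length. The function name and the toy sequences are illustrative assumptions, not real human or chimp data, and real genome comparisons also have to handle insertions, deletions and rearrangements.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Toy, made-up sequences: they differ at exactly one of 20 positions.
human_like = "ACGTACGTACGTACGTACGT"
chimp_like = "ACGTACGTACCTACGTACGT"
print(percent_difference(human_like, chimp_like))
```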

22 Complete Genomes
Year   Complete genomes
1994   0
1995   1
2004   234
2005   303 (eukaryotes 24, bacteria 240, archaea 39)

23 The “post-genomics” era
Goal: to understand the functional networks of a living cell
What’s next?
– Annotation
– Comparative genomics
– Structural genomics
– Functional genomics

24 Annotation
– Open reading frames
– Functional sites
– Structure, function

25
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT.................... TGAAAAACGTA

26
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT............................................... TGA AAAACGTA
Annotated features: TF binding site; promoter; transcription start site; ribosome binding site
ORF = Open Reading Frame; CDS = Coding Sequence
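
The ORF annotation on this slide can be sketched in code. The example below is a minimal sketch that assumes a deliberately simplified definition: an ORF starts at ATG and runs in frame up to and including the first stop codon, on the forward strand only. The function name, the `min_codons` parameter and the toy sequence are illustrative assumptions. Real gene finders also scan the reverse strand and use signals such as promoters and ribosome binding sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for forward-strand ORFs in all three frames.

    Coordinates are 0-based, end-exclusive, and the ORF includes its stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# Toy sequence (not taken from the slide): ATG AAA TTT TGA embedded in flanking bases.
print(find_orfs("CCATGAAATTTTGACC"))
```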

27 Comparative genomics
– Whole-genome comparison
– Drawing conclusions about regulatory networks

28 Chimps and Us

29 Comparative genomics
– Comparing ORFs
– Identifying orthologs
– Drawing conclusions about structure and function
– Whole-genome comparison
– Drawing conclusions about regulatory networks

30 Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.
Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

31 Functional genomics
Genome-wide profiling of:
– mRNA levels
– Protein levels
– Co-expression of genes and/or proteins

32 Understanding the function of genes and other parts of the genome

33 Functional genomics
Genome-wide profiling of:
– mRNA levels
– Protein levels
– Co-expression of genes and/or proteins
Identifying protein-protein interactions
– Networks of interactions
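
Co-expression is usually quantified by comparing the expression profiles of two genes across a set of conditions. The slides do not name a measure. The sketch below assumes Pearson correlation, a common choice, and uses made-up expression values. The function and variable names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up expression levels of three genes measured under four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a: co-expressed
gene_c = [8.0, 6.0, 4.0, 2.0]   # falls as gene_a rises: anti-correlated
print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

Values near 1 suggest co-expression and values near -1 suggest opposite regulation. Any threshold used to call two genes "co-expressed" is a choice made by the analyst.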

34 A large network of 8184 interactions among 4140 S. cerevisiae proteins
A network of interactions can be built for all proteins in an organism.
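
Such a network is commonly stored as a graph in which proteins are nodes and interactions are undirected edges. The sketch below is a minimal illustration using an adjacency dictionary. The protein names and interaction pairs are invented and have nothing to do with the S. cerevisiae data set on the slide.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected interaction network as {protein: set of partners}."""
    network = defaultdict(set)
    for protein_a, protein_b in interactions:
        network[protein_a].add(protein_b)
        network[protein_b].add(protein_a)
    return network


# Hypothetical pairwise interactions (placeholder protein names).
toy_interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(toy_interactions)

# Degree = number of interaction partners; highly connected proteins are often called hubs.
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)
print(degrees, hub)
```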

35 Structural genomics
Assign structure to all proteins encoded in a genome

36 Protein Structure

37 Resources and Databases
The different types of data are collected in databases:
– Sequence databases
– Structural databases
– Databases of experimental results
All databases are connected

38 Database Types
Sequence databases
  General: GenBank, EMBL; PIR, Swiss-Prot; Genomes
  Special: TF binding sites; Promoters
Structure databases
  General: PDB
  Special: Specific protein families, folds
Databases of experimental results
  Co-expressed genes, protein-protein interactions, etc.

39 Sequence databases
– Gene database
– Genome database
– SNPs database
– Disease-related mutation database

40 What can we learn about a gene?

41 mRNA, full length, EST

42 EST (Expressed Sequence Tags)
– Partial copies of mRNA found within a particular cell
– Can be used to identify genic regions, splicing patterns of genes, etc.

43 Different transcripts can be related to the same gene!

44 Gene database
– Gives information on gene functionality
– Alternative splicing of genes
  – Alternative patterns of exons included to create the gene product
– EST

45 Genome Databases
– Data organized by species
– Clones assembled into contiguous pieces (‘contigs’) or whole chromosomes
– Information on non-coding regions
– Relativity
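
The idea of assembling clones into contigs can be illustrated with a deliberately simplified model. The sketch below assumes each clone has already been mapped to a (start, end) coordinate range on a chromosome and merges overlapping ranges. Real assembly works from sequence overlaps between clones rather than known coordinates, so this shows only the bookkeeping. The function name and coordinates are illustrative assumptions.

```python
def merge_into_contigs(clones):
    """Merge overlapping or touching (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # This clone overlaps the current contig: extend it if the clone reaches further.
            contigs[-1] = (contigs[-1][0], max(contigs[-1][1], end))
        else:
            # Gap before this clone: start a new contig.
            contigs.append((start, end))
    return contigs


# Hypothetical clone coordinates: the first two overlap, the third stands alone.
clones = [(300, 400), (0, 100), (80, 200)]
print(merge_into_contigs(clones))
```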

46 Genome Browsers
– Annotation adds value to sequence
– Easy “walk” through the genome
– Comparative genomics

47 Genome Browsers
– Ensembl Genome Browser: http://www.ensembl.org
– UCSC Genome Browser: http://genome.ucsc.edu/
– WormBase: http://www.wormbase.org/
– AceDB: http://www.acedb.org/
– Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
– FlyBase: http://flybase.bio.indiana.edu/

48 beta globin

50 RefSeq
Set of mRNA sequences curated at NCBI
– Many experimentally validated
– Some partially validated via ESTs
– Some computationally predicted

56 SNP database
Single Nucleotide Polymorphisms (SNPs)
– A single-base difference at a single position between two individuals of the same species
– Play an important role in differentiation and disease
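
The definition above can be made concrete by comparing two sequences position by position. The sketch below assumes two aligned, equal-length sequences and reports every single-base difference. The function name and the toy sequences are illustrative assumptions. A real SNP database such as dbSNP records variants observed across many individuals, not just one pair.

```python
def find_single_base_differences(reference: str, sample: str):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. Both sequences are assumed to be aligned already.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Toy sequences from two hypothetical individuals: one T-to-A difference.
print(find_single_base_differences("ACGTTGCA", "ACGATGCA"))
```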

57 Sickle Cell Anemia
Caused by a single substitution of an A by a T, so that valine is incorporated into hemoglobin instead of glutamic acid.
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

58 Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GG A GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTP E EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

59 Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GG T GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTP V EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
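
The effect of the single-base change on the protein can be reproduced by translating the start of the coding region. The sketch below uses the first 30 coding bases of the HBB mRNA shown on the slides (from the ATG start codon) and the standard genetic code. As a labeled assumption, the A-to-T substitution is placed in the middle of the seventh codon (counting the initiator methionine), which is the residue that differs between the two protein sequences above (E versus V). Function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons of the HBB coding sequence, taken from the mRNA on the slide.
normal = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
# Assumed position of the substitution: middle base of codon 7 (GAG -> GTG).
sickle = normal[:19] + "T" + normal[20:]

print(translate(normal))  # glutamic acid (E) at position 7
print(translate(sickle))  # valine (V) at position 7
```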

60 Disease Databases
– Genes are involved in disease
– Many diseases are well studied
– Descriptions of diseases and what is known about them are stored
OMIM - Online Mendelian Inheritance in Man

62 Structure Databases
– 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
– 3-D data is available thanks to techniques such as NMR and X-ray crystallography

65 Databases of Experimental Results
– Data such as experimental microarray images and expression data
– Clustering information
– Metabolic pathways, protein-protein interaction data

66 Literature Databases
PubMed: MEDLINE publication database
– Over 17,000 journals
– 15 million citations since 1950
Service of the National Library of Medicine
http://www.ncbi.nlm.nih.gov/PubMed

67 Putting it All Together
– Each database contains specific information
– Like other biological systems, these databases are also interrelated

68
GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, unigene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

69 Entrez – NCBI Engine
Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar
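
Besides the web page, Entrez can be queried from a program through NCBI's E-utilities, which are not covered in the slides. The sketch below only constructs an `esearch` query URL and sends nothing over the network. The function name and the example search term are illustrative assumptions, and NCBI's usage guidelines should be consulted before sending requests.

```python
from urllib.parse import urlencode

# Base address of the E-utilities "esearch" service (text search within one database).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a text query against one Entrez database."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{query}"


# Example: search the nucleotide database for human beta globin (HBB) records.
url = build_esearch_url("nucleotide", "HBB[Gene] AND Homo sapiens[Organism]", retmax=5)
print(url)
```

Fetching the URL returns a list of matching record identifiers, which can then be retrieved with a separate E-utilities call.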

70 Entrez – NCBI Engine

71 General Bioinformatic Webpages
– USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
– European Bioinformatics Institute: www.ebi.ac.uk
– ExPASy Molecular Biology Server: www.expasy.org
– Israeli National Node: inn.org.il
http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm

