Introduction to Bioinformatics Dr. Yael Mandel-Gutfreund TA: Oleg Rokhlenko.


1 Introduction to Bioinformatics Dr. Yael Mandel-Gutfreund TA: Oleg Rokhlenko

2 Course Objectives
– To introduce the bioinformatics discipline
– To familiarize students with the major biological questions that can be addressed by bioinformatics tools
– To introduce the major tools used for sequence and structure analysis and explain in general terms how they work (including their limitations)

3 Course Requirements
1. Submit written assignments:
   – 9/12 short class assignments and 4/4 home assignments
   – Each assignment is to be done and submitted in pairs (except the first two class assignments)
   – Pairs are ideally composed of one person from computer science and one person from life science
2. A final project or a take-home exam, submitted in pairs
3. The course web site: http://webcourse.cs.technion.ac.il/236523

4 Grading
– 10% class assignments
– 30% home assignments
– 60% final project / test

5 Literature list
– Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
– Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
– Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced reading
– Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

6 Course Outline
– Introduction to bioinformatics
– Bioinformatics databases
– Pairwise and multiple sequence alignment
– Searching for sequences in databases
– Searching for motifs in sequences
– Phylogenetics
– RNA secondary structure
– Protein structure: secondary and tertiary structure
– Protein families: motifs, domains, clustering
– The Human Genome Project
– Gene prediction, alternative splicing
– Gene expression analysis (DNA microarrays)
– Comparative genomics, biological networks


8 Introduction to Bioinformatics
– What is Bioinformatics?
– From DNA to Genome
– What’s next? The post-genomic era

9 What is Bioinformatics?
“The field of science in which biology, computer science, and information technology merge to form a single discipline. Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”

10 Central Paradigm in Molecular Biology
DNA → RNA → Protein
Gene (DNA) → [Transcription] → mRNA → [Translation] → Protein → Symptoms (Phenotype)
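The two steps on this slide, transcription and translation, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: it assumes the standard genetic code, treats the input as the coding strand, and uses a made-up toy sequence; the function names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy gene (hypothetical): ATG CAA TAT GGA TGA
toy_gene = "ATGCAATATGGATGA"
toy_mrna = transcribe(toy_gene)
toy_protein = translate(toy_mrna)
```

For the toy gene, `toy_mrna` is `AUGCAAUAUGGAUGA` and `toy_protein` is `MQYG`. Real transcripts also involve promoters, untranslated regions, and (in eukaryotes) splicing, none of which this sketch models.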

11 21st-century biology: from a purely lab-based science to an information science

12 Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Symptoms

13 From DNA to Genome
Timeline, 1955 to 1985. Milestones shown:
– Watson and Crick DNA model
– Sanger sequences insulin protein
– Dayhoff’s Atlas of Protein Sequences
– ARPANET (early Internet)
– Needleman-Wunsch (N-W) sequence alignment
– PDB (Protein Data Bank)
– Sanger dideoxy DNA sequencing
– GenBank database
– PCR (Polymerase Chain Reaction)

14 Timeline, 1990 to 2000. Milestones shown:
– SWISS-PROT database
– USA’s NCBI
– WWW (World Wide Web)
– Celera Genomics
– First human genome draft
– Israel’s INN
– Human Genome Initiative
– BLAST algorithm
– FASTA algorithm
– First bacterial genome
– Europe’s EBI
– Yeast genome

15 Complete Genomes
– 1994: 0
– 1995: 1
– 2004: 234 (eukaryotes 20, bacteria 194, archaea 19)

16 What’s Next? The “post-genomics” era
Goal: to understand the functional networks of a living cell
– Annotation
– Comparative genomics
– Structural genomics
– Functional genomics

17 Annotation
– Open reading frames
– Functional sites
– Structure, function

18
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG
CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT.................... TGAAAAACGTA

19
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG
CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT............................................... TGA AAAACGTA
Annotated features: TF binding site, promoter, transcription start site, ribosome binding site
ORF = Open Reading Frame; CDS = Coding Sequence
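One simple way to look for open reading frames such as the one annotated on this slide is to scan each reading frame for an ATG followed, in the same frame, by a stop codon. The sketch below is an assumption-laden illustration rather than the method used in the course: it scans only the forward strand, uses a hypothetical `find_orfs` helper, and runs on a made-up toy sequence instead of the slide's sequence, whose middle part is elided.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for every ATG...stop run on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (hypothetical): CC ATG CAA TAT GGA TGA AA
toy = "CCATGCAATATGGATGAAA"
orfs = find_orfs(toy)
```

Here `orfs` is `[(2, 17, 2)]`, and `toy[2:17]` is `ATGCAATATGGATGA`. Real gene finders also check the reverse strand and use signals such as promoters and ribosome binding sites, which this sketch ignores.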

20 Comparative genomics
– Comparing ORFs
  – Identifying orthologs
  – Inferring structure and function
– Comparing functional sites
  – Inferring regulatory networks

21 Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.
Figure: conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

22 Ultraconserved Elements in the Human Genome
Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, David Haussler
There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
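The "absolutely conserved" criterion in this abstract (100% identity, no insertions or deletions, above a length threshold) can be illustrated on a small multiple alignment. The sketch below is not the authors' pipeline; it assumes the sequences are already aligned to equal length with `-` marking gaps, and both the `conserved_runs` helper and the toy sequences are hypothetical.

```python
def conserved_runs(aligned, min_len):
    """Return (start, end) column ranges where all aligned sequences are identical.

    aligned is a list of equal-length strings with '-' for gaps. A column counts
    as conserved only if it has no gap and the same base in every sequence.
    Only runs of at least min_len columns are reported; end is exclusive.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    run_start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    return runs


# Toy alignment (made-up sequences, not real genomic data).
human = "ACGTACGTTTGA"
mouse = "ACGTACGATTGA"
rat = "ACGTAC-TTTGA"
runs = conserved_runs([human, mouse, rat], min_len=4)
```

For the toy alignment, `runs` is `[(0, 6), (8, 12)]`; raising `min_len` to 5 keeps only `(0, 6)`. The paper's threshold corresponds to `min_len=200` on whole-genome alignments.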

23 Functional genomics
– Genome-wide profiling of:
  – mRNA levels
  – Protein levels
– Co-expression of genes and/or proteins
– Identifying protein-protein interactions
– Networks of interactions
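Co-expression is often assessed by correlating expression profiles across conditions. The sketch below uses the Pearson correlation coefficient as one common choice; the slide does not name a specific measure, so this is an assumption, and the gene names, expression values, threshold, and helper functions are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene_a, gene_b, r) for gene pairs with correlation >= threshold."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up mRNA levels for three genes measured in four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(profiles)
```

With these toy values only geneA and geneB pass the threshold; geneC is anti-correlated with both. Real microarray analyses usually normalize the data first and correct for the large number of pairs tested.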

24 Understanding the function of genes and other parts of the genome

25 Structural genomics
Assign structure to all proteins encoded in a genome

26 Structural Genomics Expectations
– ~300 unique folds in PDB
– Currently 27761 structures

27 Structural Genomics Expectations
– Estimate: 1000-3000 unique folds in “structure space”


29 Database Types
– Sequence databases
  – General: GenBank, EMBL; PIR, Swiss-Prot; genomes
  – Special: TF binding sites; promoters
– Structure databases
  – General: PDB
  – Special: specific protein families; folds
– Databases of experimental results
  – Co-expressed genes, protein-protein interactions, etc.

30 World Wide Web
– USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
– European Bioinformatics Institute: www.ebi.ac.uk
– ExPASy Molecular Biology Server: www.expasy.org
– Israeli National Node: inn.org.il
http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm

31 Entrez – NCBI Engine
Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.
Entrez: http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar

32 Entrez – NCBI Engine

33 Nucleotide
The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, and PDB.
As of April 2004: 38,989,342,565 bases

34 PubMed
– MEDLINE publication database
  – Over 17,000 journals
  – Some other citations
– Papers from the 1960s onward
  – Over 12,000,000 entries
– Alerting services
  – http://www.pubcrawler.ie/
  – http://www.biomail.org/

35 OMIM
– Online Mendelian Inheritance in Man
  – Genes and genetic disorders
  – Edited by a team at Johns Hopkins
  – Updated daily
– Entries
  – 10670 single-locus phenotypes (*)
  – 1294 multi-locus phenotypes (#)
  – 2415 unclassified phenotypes

36 Searching PubMed
– Structureless searches
  – Automatic term mapping
– Structured searches
  – Field names, e.g. [au], [ta], [dp], [ti]
  – Boolean operators, e.g. AND, OR, NOT, ()
– Additional features
  – Subsets, limits
  – Clipboard, history
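A structured PubMed query combines field-tagged terms with Boolean operators, as listed on this slide. The sketch below builds such a query string and, as an assumption not covered in the lecture, wraps it in a URL for NCBI's E-utilities ESearch endpoint. The helper names `tagged`, `combine`, and `esearch_url` are hypothetical, the author terms are only illustrative, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned on the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Attach a PubMed field tag, e.g. tagged("2004", "dp") gives 2004[dp]."""
    return f"{term}[{field}]"


def combine(operator, *clauses):
    """Join clauses with AND or OR and wrap the result in parentheses."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be AND or OR")
    return "(" + f" {operator} ".join(clauses) + ")"


def esearch_url(query, retmax=5):
    """Build (but do not fetch) an ESearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# Papers by either author, published in 2004 (illustrative terms only).
authors = combine("OR", tagged("Bejerano G", "au"), tagged("Haussler D", "au"))
query = combine("AND", authors, tagged("2004", "dp"))
url = esearch_url(query)
```

Here `query` is `((Bejerano G[au] OR Haussler D[au]) AND 2004[dp])`, which can also be pasted directly into the PubMed search box. Anyone actually calling the endpoint should consult NCBI's usage guidelines first.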

37 Searching OMIM
– Search fields
  – Disease name, e.g. hypertension
  – Cytogenetic location, e.g. 1p31.6
  – Inheritance, e.g. autosomal dominant
– Browsing interfaces
  – Alphabetical by disease
  – Genetic map
– Additional features like PubMed

