Introduction to Bioinformatics Dr. Yael Mandel-Gutfreund TA: Oleg Rokhlenko.

2 Course Objectives
– To introduce the bioinformatics discipline
– To familiarize students with the major biological questions that bioinformatics tools can address
– To introduce the major tools used for sequence and structure analysis and explain in general how they work (including their limitations)

3 Course Requirements
1. Submit written assignments.
– 9/12 short class assignments, 4/4 home assignments
– Each assignment is to be done and submitted in pairs (except the first two class assignments).
– The pairs are ideally composed of a person from computer science and a person from life science.
2. A final project or a take-home exam, submitted in pairs.
3. The course web site:

4 Grading
– 10% class assignments
– 30% home assignments
– 60% final project/test

5 Literature list
– Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly.
– Lesk, A. M. Introduction to Bioinformatics. Oxford University Press.
– Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press.
Advanced reading
– Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

6 Course Outline
– Introduction to bioinformatics
– Bioinformatics databases
– Pairwise and multiple sequence alignment
– Searching for sequences in databases
– Searching for motifs in sequences
– Phylogenetics
– RNA secondary structure
– Protein structure: secondary and tertiary structure
– Protein families: motifs, domains, clustering
– The Human Genome Project
– Gene prediction, alternative splicing
– Gene expression analysis (DNA microarrays)
– Comparative genomics, biological networks

8 Introduction to Bioinformatics
– What is Bioinformatics?
– From DNA to Genome
– What’s next? The post-genomic era

9 What is Bioinformatics?
“The field of science in which biology, computer science, and information technology merge to form a single discipline. Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”

10 Central Paradigm in Molecular Biology
Gene (DNA) → transcription → mRNA → translation → Protein → Symptoms (phenotype)
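The DNA → RNA → protein flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course material: it assumes the input is the coding (sense) strand, uses the standard genetic code, and the function names `transcribe` and `translate` are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
    )
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

The sketch ignores everything the cell does around these two steps (splicing, choice of start site, the reverse strand); it only shows the letter-level mapping.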

11 21st century Biology – from purely lab-based science to an information science

12 Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Symptoms

13 From DNA to Genome
– Watson and Crick DNA model
– Sanger sequences insulin protein
– ARPANET (early Internet)
– Sanger dideoxy DNA sequencing
– PDB (Protein Data Bank)
– N-W sequence alignment
– GenBank database
– PCR (Polymerase Chain Reaction)
– Dayhoff’s Atlas of Protein Seqs.

– SWISS-PROT database
– USA’s NCBI
– WWW (World Wide Web)
– Celera Genomics
– First human genome draft
– Israel’s INN
– Human Genome Initiative
– BLAST algorithm
– FASTA algorithm
– First bacterial genome
– Europe’s EBI
– Yeast genome

Complete Genomes
– eukaryotes: 20
– bacteria: 194
– archaea: 19

16 What’s Next? The “post-genomics” era
Goal: to understand the functional networks of a living cell
– Annotation
– Comparative genomics
– Structural genomics
– Functional genomics

17 Annotation
– Open reading frames
– Functional sites
– Structure, function

18
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTG
ATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT TGA AAAACGTA

19
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTG
ATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT TGA AAAACGTA
Annotated features (figure labels): TF binding site; promoter; Transcription Start Site; Ribosome binding Site; ORF = Open Reading Frame; CDS = Coding Sequence
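One way to make the ORF annotation concrete is a naive scan for ATG ... stop runs in the three forward reading frames. The sketch below is only an illustration under simplifying assumptions: it ignores the reverse strand, requires an in-frame stop codon, and uses an arbitrary minimum length. The function name `find_orfs` and the `min_len` threshold are hypothetical; the test sequence is the tail of the sequence shown on the slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_len=30):
    """Return (start, end, sequence) for forward-strand ORFs.

    Coordinates are 0-based, end exclusive. An ORF runs from the first ATG
    in a frame to the next in-frame stop codon, stop included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # keep scanning the same frame
    return sorted(orfs)


# Last part of the slide's sequence, with the spaced codons joined.
SLIDE_FRAGMENT = (
    "AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTG"
    "ATGCAATATGGACAATTGGTTTCTTCTCTGAATTGA"
    "AAAACGTA"
)

for start, end, seq in find_orfs(SLIDE_FRAGMENT):
    print(start, end, seq)
```

Real gene finders go well beyond this (both strands, codon usage statistics, promoter and ribosome binding site signals), which is why the slide lists those features separately.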

20 Comparative genomics
Comparing ORFs
– Identifying orthologs
– Drawing conclusions about structure and function
Comparing functional sites
– Drawing conclusions about regulatory networks

21 Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.
Figure: conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

22 Ultraconserved Elements in the Human Genome
Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, David Haussler
There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
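The definition in the abstract (100% identity, no insertions or deletions, above a length threshold) can be illustrated with a short scan over sequences that are already aligned. This is a sketch under stated assumptions, not the method used in the paper: the sequences below are made-up toy data, the inputs are assumed to be aligned and of equal length, and the names `conserved_runs` and `percent_identity` are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def conserved_runs(seqs, min_len):
    """Return (start, end) intervals (0-based, end exclusive) of at least
    min_len columns in which every sequence carries the same base and no
    sequence has a gap ('-')."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    start = None
    for i in range(length + 1):  # one extra step closes a run at the end
        same = i < length and seqs[0][i] != "-" and all(
            s[i] == seqs[0][i] for s in seqs
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment (invented for illustration, not real genomic data).
human = "ACGTACGTAC"
mouse = "ACGTTCGTAC"
rat = "ACGTACGTAC"

print(percent_identity(human, mouse))
print(conserved_runs([human, mouse, rat], min_len=5))
```

With a threshold of 200 columns and whole-genome alignments as input, the same idea corresponds to the criterion quoted above; producing those alignments is the hard part and is not shown here.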

23 Functional genomics
Genome-wide profiling of:
– mRNA levels
– Protein levels
Co-expression of genes and/or proteins
Identifying protein-protein interactions
Networks of interactions
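Co-expression is commonly quantified by correlating expression profiles across samples. The sketch below uses the Pearson correlation coefficient as one possible measure; the gene names, expression values, and the 0.9 threshold are invented for illustration, and `pearson` and `coexpressed_pairs` are hypothetical helper names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]


# Hypothetical mRNA levels for three genes measured in five samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}

print(coexpressed_pairs(profiles, threshold=0.9))
```

Pairs that pass the threshold can be read as edges of a co-expression network, which is one route from the profiling data to the "networks of interactions" named on the slide.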

24 Understanding the function of genes and other parts of the genome

25 Structural genomics Assign structure to all proteins encoded in a genome

26 Structural Genomics Expectations
Currently: ~300 unique folds in PDB

27 Structural Genomics Expectations
Estimate: unique folds in “structure space”

29 Database Types
Sequence databases
– General: GenBank, EMBL; PIR, Swiss-Prot; genomes
– Special: TF binding sites; promoters
Structure databases
– General: PDB
– Special: specific protein families; folds
Databases of experimental results
– Co-expressed genes, protein-protein interactions, etc.

30 World Wide Web
– USA National Center for Biotechnology Information:
– European Bioinformatics Institute:
– ExPASy Molecular Biology Server:
– Israeli National Node: inn.org.il

31 Entrez – NCBI Engine
Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.

32 Entrez – NCBI Engine

33 Nucleotide
The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, and PDB.
April: > 38,989,342,565 bases

34 PubMed
MEDLINE publication database
– Over 17,000 journals
– Some other citations
Papers from 1960s
– Over 12,000,000 entries
Alerting services

35 OMIM
Online Mendelian Inheritance in Man
– Genes and genetic disorders
– Edited by team at Johns Hopkins
– Updated daily
Entries
– 10670 single-loci phenotypes (*)
– 1294 multi-loci phenotypes (#)
– 2415 unclassified phenotypes

36 Searching PubMed
Structureless searches
– Automatic term mapping
Structured searches
– Field names, e.g. [au], [ta], [dp], [ti]
– Boolean operators, e.g. AND, OR, NOT, ()
Additional features
– Subsets, limits
– Clipboard, history
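A structured search is just a string that combines field-tagged terms with Boolean operators, so it is easy to assemble in code. The sketch below builds such a query and the corresponding URL for NCBI's E-utilities `esearch` endpoint, without sending any request. The helper names (`field`, `all_of`, `any_of`, `esearch_url`) are made up for this example, and the search terms are only illustrative.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term, tag):
    """Attach a PubMed field tag, e.g. field('Bejerano G', 'au')."""
    return f"{term}[{tag}]"


def all_of(*parts):
    """Combine sub-queries with AND."""
    return " AND ".join(parts)


def any_of(*parts):
    """Combine sub-queries with OR, parenthesized so it nests safely."""
    return "(" + " OR ".join(parts) + ")"


def esearch_url(term, db="pubmed"):
    """Build (but do not fetch) an esearch URL for the given query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


query = all_of(
    field("Bejerano G", "au"),
    any_of(field("ultraconserved", "ti"), field("conserved", "ti")),
)
print(query)
print(esearch_url(query))
```

The same query string can be pasted directly into the PubMed search box; the URL form is what a script would request, subject to NCBI's usage guidelines.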

37 Searching OMIM
Search fields
– Disease name, e.g. hypertension
– Cytogenetic location, e.g. 1p31.6
– Inheritance, e.g. autosomal dominant
Browsing interfaces
– Alphabetical by disease
– Genetic map
Additional features like PubMed