Sources Page & Holmes Vladimir Likic presentation: 20show.pdf
Presentation transcript:

REMINDERS
- 2nd Exam on Nov. 17
- Coverage:
  - Central Dogma (DNA replication, transcription, translation)
  - Cell structure and function
  - Recombinant DNA technology and molecular biology
  - Protein analysis

BIOINFORMATICS

- Study of the structure of biological information and biological systems
- Integrates theories and tools of mathematics/statistics, computer science and information technology
- Involves the use of hardware and software to study vast amounts of biological data

What is Bioinformatics?
- The field of science in which biology, computer science, and information technology merge to form a single discipline
- Application of information technology to the storage, management and analysis of biological information
- Facilitated by the use of computers

FUNCTIONS
- Data Management
  - Storage
  - Retrieval
- Data Analysis
* Literature/Bibliography, Sequence, Structure, Taxonomy, Expression, etc.

BIOLOGICAL DATABASES
- Systematic data storage/retrieval
- Maintained on a regular basis
- Can contain various types of data (integration)
  - Sequence
  - Structure
  - Other pertinent information
- Nucleotide and protein data are the most common

DATABASES  a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system  Biological databases consist usually of the nucleic acid sequences of the genetic material of various organisms as well as protein sequences and structures

DATABASES  e.g. nucleotide sequence database typically contains information such as  contact name  the input sequence with a description of the type of molecule  the scientific name of the source organism from which it was isolated  additional requirements  easy access to the information  a method for extracting only that information needed to answer a specific biological question

DATABASES: Sequence
- GenBank, European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ); managed by the International Nucleotide Sequence Database Collaboration (INSDC)
- UniGene
- Saccharomyces Genome Database (SGD)
- UniProtKB (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL)
- ExPASy

DATABASES: Structure
- Nucleic Acid Database (NDB)
- Protein Data Bank (PDB)
- Worldwide Protein Data Bank (wwPDB)
- ExPASy

DATA MINING
- Process by which testable hypotheses about the function/structure of a gene/protein of interest are created by identifying similar sequences in “more established” organisms
- Tools:
  - Text-term search
  - Sequence similarity search

Machine Learning
- Studies methods for designing computer programs that improve based on past experience
- Why?
  - New methods are being introduced
  - Old ones should be improved

“Units” of Information
- DNA (genome)
- RNA (transcriptome)
- Protein (proteome)

What is Being Analyzed?
- Sequence
- Structure
- Interactions
- Pathways
- Mutations/Evolution

Why?
- Increasing amount of biological information entails
  - Organization
  - Archiving
  - Global unification/harmonization
- More biological discoveries
  - Functional/Structural similarities
  - Phylogenetic/Evolutionary patterns

Applications
- Medicine
- Pharmaceuticals
- Biotechnology
- Agriculture

STRUCTURE DATABASES

Molecular Data
- When you draw a molecule,
  - You start with atoms
  - Then proceed with the structure
  - And the three-dimensional data
- What can be stored?
  - Coordinates
  - Sequences
  - Chemical graphs: atoms and bonds

Databases
- Protein Data Bank (PDB)
- Molecular Modeling Database (MMDB)

Techniques in the Laboratory
- X-ray Crystallography
- Nuclear Magnetic Resonance (NMR) Spectroscopy

Formats
- PDB
- mmCIF
- MMDB
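As a small illustration of how coordinates are stored, the legacy PDB text format keeps each atom on one fixed-width line (x, y and z occupy columns 31-38, 39-46 and 47-54). The sketch below reads such a line by column position. The function name `parse_atom_line` is hypothetical and the sample coordinates are made up; a real project would more likely rely on an established parser.

```python
def parse_atom_line(line):
    """Extract a few fields from one fixed-width PDB ATOM/HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),            # columns 7-11
        "atom_name": line[12:16].strip(),     # columns 13-16
        "residue": line[17:20].strip(),       # columns 18-20
        "chain": line[21],                    # column 22
        "residue_number": int(line[22:26]),   # columns 23-26
        "x": float(line[30:38]),              # columns 31-38
        "y": float(line[38:46]),              # columns 39-46
        "z": float(line[46:54]),              # columns 47-54
    }


# Build an illustrative line with format widths so the columns line up.
sample = "{:<6}{:>5} {:<4}{}{:>3} {}{:>4}{}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " N", " ", "MET", "A", 1, " ", 27.340, 24.430, 2.614
)

atom = parse_atom_line(sample)
print(atom["atom_name"], atom["residue"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent numeric fields in this format can touch each other without a separating space.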

Structure Viewers
- Cn3D
- RasMol
- WebMol
- Mage
- VRML
- CAD
- Swiss PDB Viewer

Promises of bioinformatics
- Medicine
  - Knowledge of protein structure facilitates drug design
  - Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
  - Genome analysis allows the targeting of genetic diseases
  - The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
- The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

Challenges in bioinformatics
- Explosion of information
  - Need for faster, automated analysis to process large amounts of data
  - Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  - Need for “smarter” software to identify interesting relationships in very large data sets
- Lack of “bioinformaticians”
  - Software needs to be easier to access, use and understand
  - Biologists need to learn about the software, its limitations, and how to interpret its results

SEQUENCE ALIGNMENT

Two or More Sequences
- Measure similarity
- Determine correspondences between residues
- Find patterns of conservation
- Derive evolutionary relationships

Alignment
- Correspondences between the nucleotides/amino acids of two or more sequences are assigned
- An assignment of correspondences that preserves the order of the residues within the sequences is an alignment
- Gaps are used to achieve this
- Sequence alignment refers to the identification of residue-residue correspondences
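To make the idea of residue-residue correspondences concrete, the sketch below takes two already-aligned strings (same length, with "-" marking a gap) and counts how each column pairs up. The function name `summarize_alignment` and the example sequences are illustrative assumptions.

```python
def summarize_alignment(aligned_a, aligned_b, gap_char="-"):
    """Count match, mismatch and gap columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == gap_char or y == gap_char:
            counts["gap"] += 1        # residue paired with a gap
        elif x == y:
            counts["match"] += 1      # identical residues
        else:
            counts["mismatch"] += 1   # different residues
    return counts


# Each column pairs one residue (or gap) from each sequence, in order.
print(summarize_alignment("GATTACA", "GACTA-A"))
```

Because columns are read left to right, the order of residues within each sequence is preserved, which is the defining property of an alignment stated above.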

Uses
- Homology
  - Similarities
  - “Ancestry”
- Genome annotation
  - Assigning structure and function to genes
- Database queries
  - For newly-discovered/unknown sequences

Tools
- Dot Plots
  - Diagonal lines of dots showing similarities between two sequences
- Scoring Matrices
  - Score reflects quality of each possible alignment; best possible score is identified
  - Scoring scheme is crucial
  - PAM (Point Accepted Mutation) and BLOSUM (BLOcks SUbstitution Matrix)
- Dynamic Programming
  - Algorithmic technique that reuses previous computations
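A dot plot can be sketched in a few lines: place one sequence along the rows and the other along the columns, then mark every cell where the two agree. The function name `dot_plot` and the optional `window` parameter (compare short words instead of single residues, which reduces noise) are assumptions made for this illustration.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return rows of '*' (identical words) and '.' (different words)."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


# Comparing a sequence with itself always shows the main diagonal.
for line in dot_plot("GATTACA", "GATTACA"):
    print(line)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences are similar; off-diagonal dots reflect repeated residues or chance matches.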

Scoring
- Penalties/Scores
  - Match (e.g. A – A)
  - Mismatch (e.g. A – C)
  - Gap (e.g. A – _)
- Linear Gap Penalty: uniform penalty for every gap position
- Affine Gap Penalty: separate penalties for gap existence (opening) vs. gap extension
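The difference between the two gap penalty schemes can be shown with two small functions. The penalty values are arbitrary illustrations, and the affine convention used here (the opening penalty covers the first gap position, extension applies to each further position) is one of several in use, so it is an assumption rather than a standard.

```python
def linear_gap_penalty(length, per_position=-2):
    """Every gap position costs the same amount."""
    return length * per_position


def affine_gap_penalty(length, gap_open=-5, gap_extend=-1):
    """Opening a gap is costly; each additional position costs less."""
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


for length in (1, 3, 10):
    print(length, linear_gap_penalty(length), affine_gap_penalty(length))
```

With these example values a single long gap is penalized less under the affine scheme than under the linear one, which reflects the idea that one insertion or deletion event may span many residues.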

Local vs. Global Alignments
- Global Alignment
  - Similarities between majority of two sequences
- Local Alignment
  - Similarities between specific parts of two sequences
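A local alignment score can be sketched with the Smith-Waterman recurrence, a standard local alignment algorithm that these slides do not name. The function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between any parts of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment restart anywhere,
            # which is what makes the result local.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared "ACG" core is similar; the flanks are unrelated.
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

A global alignment of the same two sequences would be forced to pair the unrelated flanking regions as well, which is why local alignment suits sequences that share only a specific part.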

Programs
- Pairwise Sequence Alignment
  - BLAST
  - VAST
  - FASTA
- Multiple Sequence Alignment
  - MAFFT

Needleman-Wunsch Algorithm
- Can be used for global alignments
- Maximum-value function
- A simple scoring scheme is assumed
- Three steps
  - Initialization
  - Matrix fill (scoring)
  - Traceback (alignment)
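The three steps can be sketched as follows. This is a minimal version assuming a simple scoring scheme (match +1, mismatch -1, linear gap -2); the values and the function name `needleman_wunsch` are illustrative choices, and ties in the traceback are broken in favor of the diagonal move.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 1: initialization. First row and column hold pure gap costs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Step 2: matrix fill. Each cell takes the maximum of three options.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Step 3: traceback from the bottom-right corner to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

The matrix fill is the dynamic programming part: each cell reuses the three neighboring cells computed earlier instead of re-evaluating every possible alignment.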