Bioinformatics & LIS A brief talk for librarians, information scientists, and computer scientists about resources and collaborative opportunities with biology.

Presentation transcript:

Bioinformatics & LIS A brief talk for librarians, information scientists, and computer scientists about resources and collaborative opportunities with biology. April 18, 2006 G. Benoit

Outline of the talk Bioinformatics defined Generation of data Tools and databases Activities for Librarianship, Computer and Information Science Examples: –Entrez, NCBI, Visualization Collaborations

Bioinformatics defined Over 70 definitions. Differences arise from the work. National Center for Biotechnology Information (NCBI): the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information.

Without getting into the science… How the data started … Four chemical bases: the purines (adenine [A], guanine [G]) and the pyrimidines (cytosine [C], thymine [T]). Their precise order and linking (each base is attached to a sugar molecule and to a phosphate molecule to create a nucleotide) …

DNA

A pairs with T; G pairs with C, to make unique and very long strings, called sequences. E.g., AATGACCAT codes for a different gene than GGGCCATAG would. Replication: RNA consists of A, G, C, and uracil (U) and has ribose instead of deoxyribose. The point is that one can predict missing data, sometimes…
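The pairing rule is what makes "predicting missing data" possible: if one strand has a gap, the base on the partner strand determines it. A minimal Python sketch of that idea follows. The function names and the `?` gap marker are illustrative assumptions, and the two strands are compared position by position, ignoring strand orientation for simplicity.

```python
# Watson-Crick pairing from the slide: A<->T, G<->C.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(dna: str) -> str:
    """Return the base that pairs with each position of the strand."""
    return dna.upper().translate(DNA_COMPLEMENT)


def to_rna(dna: str) -> str:
    """Spell the same sequence in the RNA alphabet: uracil replaces thymine."""
    return dna.upper().replace("T", "U")


def fill_gaps(strand: str, partner: str, gap: str = "?") -> str:
    """Fill unknown bases in `strand` using the paired `partner` strand.

    Both strands are assumed to be aligned position by position
    (a simplification made for this sketch).
    """
    inferred = complement(partner)
    return "".join(i if s == gap else s for s, i in zip(strand, inferred))


if __name__ == "__main__":
    print(complement("AATGACCAT"))               # partner strand
    print(to_rna("AATGACCAT"))                   # RNA spelling
    print(fill_gaps("AAT?ACCAT", "TTACTGGTA"))   # gap recovered from partner
```

The gap can only be recovered when the partner position is known; where both strands are missing the same position, nothing can be inferred, hence the slide's "sometimes".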

In short… the nucleotides are linked in a certain order or sequence through the phosphate group; their precise order and linking within the DNA determines what proteins the gene produces and the phenotype of the organism
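How the order of nucleotides determines a protein can be sketched with the standard genetic code, which maps each three-base codon to one amino acid. The Python below is an illustration only: it reads from the first base, ignores start codons, and stops at the first stop codon, all of which are simplifying assumptions. The two inputs are the short example strings from the previous slide, not real genes.

```python
# Standard genetic code, written in the compact T, C, A, G ordering:
# the amino acid for codon (b1, b2, b3) sits at index 16*i1 + 4*i2 + i3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA to one-letter amino acids, stopping at a stop codon ('*').

    Reads in frame from the first base; trailing bases that do not
    form a full codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AATGACCAT"))  # NDH
    print(translate("GGGCCATAG"))  # GP (TAG is a stop codon)
```

Two different base orders give two different amino acid chains, which is the point of the slide: the sequence is the information.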

Generation of Data Raw data from sequencing Expression data Data generated by linking other raw data in very large, multidimensional databases (e.g., OMIM) Research literature (full-text journals) Data models to describe the literature for retrieval, linking to other data, and linking to the raw data New data models to support greater flexibility in describing & manipulating data …

Generation of Data To support integrated search and retrieval To focus on single organisms or find similarities across them Feed other technology Visualization of natural phenomena and of abstract phenomena

Tools & Databases A host of tools for database searching… –BLAST (basic local alignment search tool) –FASTA (sequence strings) –ChopUp (protein analysis) –Integrated packages (Lasergene Sequence Analysis Software) –The many services offered through NCBI and NLM
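Many of these tools exchange sequences as FASTA-formatted text: a header line starting with `>` followed by one or more lines of sequence. As a rough illustration of how little structure that format imposes, here is a minimal Python reader. It is a sketch, not a substitute for an established parser; the record names `seq1` and `seq2` are made up, and the sequences are the short examples from the earlier slide.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {name: sequence} dictionary.

    The record name is taken to be the first word after '>'.
    Sequence lines that appear before any header are skipped.
    """
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


SAMPLE = """\
>seq1 hypothetical example record
AATGA
CCAT
>seq2 another hypothetical record
GGGCCATAG
"""

if __name__ == "__main__":
    for name, sequence in parse_fasta(SAMPLE.splitlines()).items():
        print(name, sequence)
```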

Take a look at handout, Table 1, publicly accessible databases

Data Categories Monographs, Journals, Announcements (text) Datasets: –Bibliographic ( –Taxonomic –Nucleic acid –Genomic (e.g., GDB, OMIM) –Protein DB (SwissProt, TrEMBL) –Protein families, domains, and functional sites –Proteomics initiative –Enzyme/metabolic pathways –Sequence Retrieval System (SRS) and NCBI Data Model

Take a look at handout, Table 2, publicly accessible databases defined and then Entrez sample, Table 3

Entrez example Notice the familiar access points (author, journal, title) as well as domain-specific ones (exon, gene, organism). Notice, too, the DNA …
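For readers who want to try fielded searching programmatically, the same kind of access points can be expressed as a query string. The sketch below only builds a URL; it does not contact any server. It assumes NCBI's present-day E-utilities `esearch` endpoint, which is not mentioned in the talk, and the field tags `ORGN` and `GENE` are used as examples that should be checked against the Entrez help for the database in question.

```python
from typing import Dict
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not part of the original talk).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, fields: Dict[str, str], retmax: int = 20) -> str:
    """Build a fielded Entrez search URL.

    `fields` maps a field tag (an access point) to the value searched
    in that field; the clauses are combined with AND.
    """
    term = " AND ".join(f"{value}[{tag}]" for tag, value in fields.items())
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    # Domain-specific access points: organism and gene.
    print(esearch_url("nucleotide", {"ORGN": "Escherichia coli", "GENE": "lacZ"}))
```

Keeping the access points in a dictionary mirrors the search form: each key is one access point, whether bibliographic or domain-specific.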

NCBI Homepage Notice the variety of tools (left menu) Site map: Alpha list

Linking across resources NCBI’s structure database is called the Molecular Modeling Database (MMDB); it is a subset of the 3D structures (non-theoretical models only) obtained from the Protein Data Bank (PDB). Data are obtained from X-ray crystallography and NMR spectroscopy. The goal is to make it easier to compare structures. Searching: a variety of access points: author, title, text terms, a PDB 4-character code, or a numerical MMDB-id. MMDB Data: PDB records are parsed (to extract sequences, citations, and structural info) and converted to ASN.1. Taxonomy: used to help end users see term relationships and databases, along with literature references. Example: Escherichia+coli&lvl=0&srchmode=1

Linking across resources XML - there are hundreds of XML schemas used in biology. Calls for mapping to ASN.1 records [see NCBI example]. Calls for mapping across schemas. Calls for exporting data for different devices…
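Mapping across schemas usually means translating each source layout into one shared record structure. The Python sketch below shows the idea with two invented XML layouts; the element names, attribute names, and accession numbers are all hypothetical and stand in for whatever schemas a real project has to reconcile.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Two hypothetical layouts describing the same kind of record.
RECORD_A = (
    "<entry><accession>X00001</accession>"
    "<organism>Escherichia coli</organism>"
    "<sequence>AATGACCAT</sequence></entry>"
)
RECORD_B = '<seq id="X00002" species="Escherichia coli">GGGCCATAG</seq>'


def from_schema_a(root: ET.Element) -> Dict[str, str]:
    """Layout A keeps each field in its own child element."""
    return {
        "id": root.findtext("accession", default=""),
        "organism": root.findtext("organism", default=""),
        "sequence": root.findtext("sequence", default=""),
    }


def from_schema_b(root: ET.Element) -> Dict[str, str]:
    """Layout B keeps metadata in attributes and the sequence as text."""
    return {
        "id": root.get("id", ""),
        "organism": root.get("species", ""),
        "sequence": (root.text or "").strip(),
    }


# One mapper per known root element; new schemas are added here.
MAPPERS: Dict[str, Callable[[ET.Element], Dict[str, str]]] = {
    "entry": from_schema_a,
    "seq": from_schema_b,
}


def to_common(xml_text: str) -> Dict[str, str]:
    """Parse a record and map it to the shared {id, organism, sequence} form."""
    root = ET.fromstring(xml_text)
    return MAPPERS[root.tag](root)


if __name__ == "__main__":
    for record in (RECORD_A, RECORD_B):
        print(to_common(record))
```

With hundreds of schemas in use, the table of mappers is where the maintenance effort concentrates, which is one reason a single interchange format is attractive.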

Visualization Cn3D - uses MMDB-Entrez’s structure database – RasMol Protein Explorer OpenRasMol MolviZ.org World Index of Molecular Visualization

Recap main points Very large data sets - “homogenized” thru ASN.1 Goal to integrate (text-text, visualization-text, text-vis) Raw data + research literature + visualization Biologists provide domain knowledge XML is a big player CS and IS provide technology Librarians provide maintenance and access to resources

Collaborative Opportunities For LIS and CS: –Domain analysis –information use, communication, theories of information; –systems analysis and design, –data modeling, –classification, –storage and retrieval, –HCI mapped onto a generalized model of a molecular biology experimental cycle [Denn & MacMullen, 2002, p. 556]

Collaborative Opportunities “Insertion Points” - development of new tools and methods for managing, integrating, and visualizing data. For local use: download selected data sets for local needs (Stapley & Benoit, 2000). XML transformations: XML - SVG - X3D. Automated retrieval. Clustering (data- and text-mining).

Collaborative Opportunities Biologists’ needs: –To go beyond mining of genomic data to investigate causal entailments in inter- and intracellular dynamics LIS’s response: –To aid understanding of the scientific processes thru visualization of literature, metadata and graphic representations in general and for disease-specific analysis

Back to you… Thanks …