Scientific publications and archives: media, content and access Lesk, Ch 3 (Lesk, 2008)

Scientific literature
Scientific publications began as interpersonal communication: lectures, seminars and discussions (oral communication).
Formal written articles and books then formed the scientific literature.
Today: journals, presentations at meetings, books, book chapters, Web material, films, radio and television programs, podcasts.
Formal academic publications must pass the test of 'peer review' (quality control).
Before the Internet, scientific literature appeared on paper (journals).
Today, journals appear electronically as well as on paper (some researchers rarely visit a library to read journals).
Result: delocalized literature delivery and computational methods of information retrieval.

Economic factors governing access to scholarly publications
Traditional economic model of scientific journals: a scientific organization or publisher produces and distributes, at regular intervals, a paper-bound 'issue' of articles.
– Costs: editorial office; preparation of manuscripts; printing and distribution.
– Support (income): sales (subscriptions), page charges to authors, donations, subsidies, advertisements, etc.
Recent changes:
– More papers are published, driving up costs.
– The larger volume of publication puts libraries under financial pressure.
– Electronic facilities reduce costs.
– Electronic distribution extends the potential formats of journal articles.
– The user community supports open access.

Open access / traditional and digital libraries
Open access redefines the author/publisher/reader relationship:
– Retains the peer-review process.
– Accepted articles are placed on the Web, with free access.
– Authors retain copyright (instead of the publisher).
– Costs of publication are transferred from readers to authors.
Traditional libraries: familiar to everyone.
Digital libraries:
– Electronic form, on-line.
– Raise economic questions.
– Large-scale digital libraries by scanning?

The information explosion / Databases
Efficient delivery can be a mixed blessing: it is impossible for anyone to read all the literature in a given field.
The Web adds a new dimension: reading is no longer linear, and there are new media, new ways of searching, bibliography management, and new ways of organizing and sharing the harvest.
Databases: contents, ontology, logical structure, format of the data, routes for retrieval of data, links to other resources.
Literature as a database: e.g. Medline (Medical Literature Analysis and Retrieval System Online), now part of PubMed, a bibliographic database.
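Bibliographic databases such as PubMed can also be queried from a program rather than a browser. A minimal sketch, assuming NCBI's E-utilities `esearch` endpoint as the access route (the slides do not name it): the code only builds the request URL, so it runs offline; actually sending the request needs network access and should follow NCBI's usage guidelines. The query string is illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumed access route, not taken from the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request against the PubMed database."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"

# Logical combinations of terms are written directly into the query string.
url = pubmed_search_url("BRCA1 AND breast cancer", retmax=5)
print(url)
```

The response to such a request is a list of PubMed identifiers, which can then be fed into a follow-up request to fetch the records themselves.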

Databases
Database organization / design: e.g. the design of a relational database of amino acids.
Annotation: besides the primary data (e.g. gene sequences), a typical entry in a molecular biology database contains other information:
– Reference information (citations of publications).
– Interpretative information.
– Links to other information.
Database quality control (errors?):
– "Get it right the first time": database curation and annotation, a new profession.
– Identifying errors: external curators / users.
– Tracking database changes.
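The amino-acid example can be made concrete with one relational table. A minimal sketch using Python's built-in `sqlite3`; the table name `amino_acid`, its columns and the coarse side-chain classes are illustrative choices rather than a schema from the source, and only six of the twenty amino acids are loaded.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical design for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE amino_acid (
           one_letter   TEXT PRIMARY KEY,
           three_letter TEXT NOT NULL UNIQUE,
           name         TEXT NOT NULL,
           side_chain   TEXT NOT NULL  -- coarse class: nonpolar, polar, acidic, basic
       )"""
)
rows = [
    ("G", "Gly", "glycine", "nonpolar"),
    ("A", "Ala", "alanine", "nonpolar"),
    ("S", "Ser", "serine", "polar"),
    ("D", "Asp", "aspartic acid", "acidic"),
    ("E", "Glu", "glutamic acid", "acidic"),
    ("K", "Lys", "lysine", "basic"),
]
conn.executemany("INSERT INTO amino_acid VALUES (?, ?, ?, ?)", rows)

# A declarative query: which residues have acidic side chains?
acidic = conn.execute(
    "SELECT three_letter FROM amino_acid WHERE side_chain = ? ORDER BY one_letter",
    ("acidic",),
).fetchall()
print(acidic)  # [('Asp',), ('Glu',)]
```

The `PRIMARY KEY` and `UNIQUE` constraints are a small instance of "get it right the first time": the database itself rejects a duplicate one-letter or three-letter code.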

Databases
Database access: an issue to consider.
Links (utility of a database): internal links and external links.
Database interoperability: how to answer questions that require appeal to multiple databases at once?
– Merge several databases?
– Methods for intercommunication between databases?
Data mining:
– Knowledge discovery: description / explanation.
– Successful forecasting / predictive modeling.
– Statistical techniques.
– Artificial neural networks.
– Support vector machines.
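The simplest form of interoperability is merging records from two sources that share an identifier. A minimal sketch, assuming both toy "databases" use the same accession keys; the accessions, sequences and function labels are hypothetical placeholders.

```python
# Two hypothetical sources keyed by a shared accession.
sequence_db = {
    "ACC1": {"sequence": "MKTAYIAK"},
    "ACC2": {"sequence": "MSLLTEV"},
}
function_db = {
    "ACC1": {"function": "hypothetical kinase"},
    "ACC3": {"function": "hypothetical transporter"},
}

def merge_on_accession(left: dict, right: dict) -> dict:
    """Inner join: keep accessions present in both sources and combine their fields."""
    shared = left.keys() & right.keys()
    return {acc: {**left[acc], **right[acc]} for acc in sorted(shared)}

merged = merge_on_accession(sequence_db, function_db)
print(merged)  # {'ACC1': {'sequence': 'MKTAYIAK', 'function': 'hypothetical kinase'}}
```

In practice the harder part is that different databases use different identifiers for the same molecule, so a join like this usually needs a cross-reference table mapping one accession scheme onto the other.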

Programming languages and tools
Traditional programming languages: FORTRAN, C, C++.
Scripting languages: Perl, Python, Ruby, …
Program libraries specialized for molecular biology: standard libraries (numerical analysis and text processing), libraries for molecular biology (e.g. bioperl.org).
Java: the Java Virtual Machine; computing over the Web?
Markup languages: implement data structures, e.g. XML.
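As an example of markup implementing a data structure, a database entry can be stored as XML and read back into a program. A minimal sketch with Python's standard `xml.etree.ElementTree`; the element and attribute names (`entry`, `organism`, `sequence`, `feature`) are hypothetical and do not follow any real database's schema.

```python
import xml.etree.ElementTree as ET

# A toy entry; tag and attribute names are illustrative, not a real schema.
ENTRY_XML = """
<entry accession="ACC1">
  <organism>Homo sapiens</organism>
  <sequence length="8">MKTAYIAK</sequence>
  <feature type="domain" start="2" end="6"/>
</entry>
"""

root = ET.fromstring(ENTRY_XML.strip())
seq_el = root.find("sequence")

# Convert the tree into a plain Python record.
record = {
    "accession": root.get("accession"),
    "organism": root.findtext("organism"),
    "sequence": seq_el.text.strip(),
    "length": int(seq_el.get("length")),
    "features": [
        (f.get("type"), int(f.get("start")), int(f.get("end")))
        for f in root.findall("feature")
    ],
}

# The explicit structure makes simple consistency checks easy.
assert record["length"] == len(record["sequence"])
print(record)
```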

Natural language processing
Natural language: verbal (oral) and/or textual forms of human-human communication.
Natural language processing has long been a goal of computing.
Difficulty: ambiguity of words and phrases.
Identifying keywords and combinations of keywords: e.g. names of genes and names of diseases.
Knowledge extraction: protein-protein interactions (automatic text-mining software).
Text mining:
– Identification of references to genes and proteins.
– Identification of interactions.
– Interaction networks and diseases.
– Hypothesis generation (unsuspected relationships between genes and diseases).
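A very simple text-mining strategy combines dictionary lookup of gene and disease names with sentence-level co-occurrence. A minimal sketch; the two term dictionaries are hypothetical stand-ins for curated lexicons, and the regex sentence splitter is a crude assumption. Co-occurrence only suggests a relationship, which is why it suits hypothesis generation rather than proof.

```python
import re
from itertools import product

# Hypothetical dictionaries; real systems use curated lexicons with synonyms.
GENES = {"BRCA1", "TP53"}
DISEASES = {"breast cancer", "Li-Fraumeni syndrome"}

def find_terms(sentence: str, terms: set[str]) -> set[str]:
    """Return dictionary terms occurring in the sentence as whole words (case-insensitive)."""
    found = set()
    for term in terms:
        if re.search(rf"\b{re.escape(term)}\b", sentence, flags=re.IGNORECASE):
            found.add(term)
    return found

def cooccurrences(text: str) -> set[tuple[str, str]]:
    """Pair every gene with every disease mentioned in the same sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = find_terms(sentence, GENES)
        diseases = find_terms(sentence, DISEASES)
        pairs.update(product(sorted(genes), sorted(diseases)))
    return pairs

text = "Mutations in BRCA1 raise the risk of breast cancer. TP53 was also sequenced."
print(cooccurrences(text))  # {('BRCA1', 'breast cancer')}
```

Ambiguity shows up immediately with this approach: a gene symbol that is also an ordinary word would produce false matches, which is one reason real systems add context rules or statistical models.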

Archives and information retrieval Lesk, Ch 4 (Lesk, 2008)

Database indexing and specification of search terms
An index: a set of pointers to information in a database.
Information retrieval programs accept multiple query terms and keywords.
It is possible to ask for logical combinations of indexing terms; many database search engines allow complex logical expressions.
Follow-up questions: modifying the query, cumulative searches, links between entries in different databases.
Analysis and processing of retrieved data: using results retrieved in one search as input for another (some information retrieval systems provide such facilities).
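An index and logical combinations of terms fit together naturally as an inverted index plus set operations. A minimal sketch over a toy collection (document IDs and texts are hypothetical): each word points to the set of documents containing it, AND/OR/NOT become intersection/union/difference, and a cumulative search simply narrows an earlier result set.

```python
from collections import defaultdict

# Hypothetical document collection.
DOCS = {
    1: "protein structure prediction",
    2: "gene expression in yeast",
    3: "protein expression and structure",
}

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each lowercase word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

def postings(index: dict[str, set[int]], word: str) -> set[int]:
    """Documents containing the word (empty set if it is not indexed)."""
    return index.get(word.lower(), set())

index = build_index(DOCS)
all_ids = set(DOCS)

protein_and_expression = postings(index, "protein") & postings(index, "expression")  # AND
structure_or_yeast = postings(index, "structure") | postings(index, "yeast")         # OR
protein_not_expression = postings(index, "protein") & (all_ids - postings(index, "expression"))  # NOT

# Cumulative search: refine a previous result with an extra term.
refined = protein_and_expression & postings(index, "structure")
print(protein_and_expression, structure_or_yeast, protein_not_expression, refined)
```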

Nucleic acid sequence databases
Archiving of bioinformatics data was originally carried out by individual research groups.
As requirements grew, projects became very large-scale.
Primary data collections related to biological macromolecules:
– Nucleic acid sequences, including whole-genome projects.
– Amino acid sequences of proteins.
– Protein and nucleic acid structures.
– Small-molecule crystal structures.
– Protein functions.
– Expression patterns of genes.
– Networks: metabolic pathways, gene and protein interactions, and control cascades.
– Publications.

Nucleic acid sequence databases
A triple partnership of the National Center for Biotechnology Information (USA), EMBL-Bank (European Bioinformatics Institute, UK) and the DNA Data Bank of Japan (National Institute of Genetics, Japan) curates, archives and distributes DNA and RNA sequences.
Entries have a life history:
– Unannotated -> Preliminary -> Unreviewed -> Standard.
A sample entry includes properties of specific regions (e.g. coding sequences, and regions that perform or affect function, interact with other molecules, or affect replication).
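Region properties in these entries are attached to locations in the sequence. In the shared feature-table format, a location such as `3..11` is 1-based and inclusive at both ends, and `complement(3..11)` marks a region on the opposite strand. A minimal sketch of extracting such a region; the sequence is hypothetical, and only simple single-span locations are handled (joins and fuzzy ends are ignored).

```python
# Map each base to its complement (upper- and lower-case DNA).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_feature(seq: str, start: int, end: int, complement: bool = False) -> str:
    """Return the subsequence for a 1-based inclusive location, reverse-complemented if needed."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("location outside sequence")
    region = seq[start - 1:end]  # convert to Python's 0-based, end-exclusive slice
    return region.translate(COMPLEMENT)[::-1] if complement else region

seq = "GGATGAAATAGCC"  # hypothetical sequence
print(extract_feature(seq, 3, 11))                   # ATGAAATAG
print(extract_feature(seq, 3, 11, complement=True))  # CTATTTCAT
```

The off-by-one conversion in the slice is the usual pitfall: the biological coordinate 3 is index 2 in Python, while the inclusive end 11 becomes the exclusive slice bound 11.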

Genome databases and genome browsers
Genome browsers (full-genome sequences): databases bringing together all molecular information available about a particular species.
E.g. ensembl.org, intended to be the universal information source for the human and other genomes.

Protein sequence databases
In 2002, three protein sequence databases, the Protein Information Resource (PIR, USA), SWISS-PROT (Switzerland) and TrEMBL (Europe), formed the UniProt consortium.
They share the database but continue to offer separate information-retrieval tools for access.
Databases associated with SWISS-PROT:
– ENZYME DB and PROSITE.
PIR and associated databases:
– PIRSF: protein family classification system.
– iProClass: protein knowledge, with access to over 90 biological databases.
– iProLINK: gateway to the protein literature.

Databases of protein families
Evolutionary relationships / homology detection.
Two full-length protein sequences (>= 100 residues) that have >= 25% identical residues in an optimal alignment are likely to be related.
This requires sequence alignment algorithms.
A group of related proteins is referred to as a family.
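The rule of thumb above needs two ingredients: an optimal alignment and a percent-identity count. A minimal sketch, assuming Needleman-Wunsch global alignment with a simple scoring scheme (match +1, mismatch -1, linear gap -2) and percent identity measured against the shorter sequence; real homology searches use substitution matrices such as BLOSUM62 and affine gap penalties, so this shows the idea, not a production test.

```python
from typing import Optional

def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a match/mismatch/linear-gap score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the shorter sequence's length."""
    aligned_a, aligned_b = global_alignment(a, b)
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / min(len(a), len(b))

def likely_homologous(a: str, b: str) -> Optional[bool]:
    """Apply the >=100-residue, >=25%-identity rule; None when the rule does not apply."""
    if min(len(a), len(b)) < 100:
        return None
    return percent_identity(a, b) >= 25.0
```

Returning `None` for short sequences keeps "too short to judge" distinct from "judged unrelated", since the rule of thumb says nothing about sequences under 100 residues.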

Databases of structures
Structure databases archive, annotate and distribute sets of atomic coordinates.
Worldwide Protein Data Bank (wwPDB.org):
– Joint effort of the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank in Europe and the Protein Data Bank Japan.
– Contains the structures of proteins (and nucleic acids).
– Overlaps several other databases.
Several websites offer hierarchical classifications of all proteins of known structure: SCOP, CATH, DALI, CE.
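The atomic coordinates themselves have traditionally been distributed in the fixed-column PDB format, where each `ATOM`/`HETATM` line stores x, y and z in columns 31-38, 39-46 and 47-54 (the wwPDB also distributes the newer mmCIF format, not handled here). A minimal sketch of reading those records and measuring an interatomic distance; the two atom lines and their coordinates are hypothetical.

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by its fixed PDB column positions."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )

def read_atoms(lines: list[str]) -> list[Atom]:
    """Keep only coordinate records; other record types are skipped."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]

def distance(a: Atom, b: Atom) -> float:
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))

# Hypothetical records; the second atom is shifted 1.0 along x.
PDB_LINES = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      12.104   6.134  -6.504",
    "END",
]
atoms = read_atoms(PDB_LINES)
print(atoms[1].name, distance(atoms[0], atoms[1]))
```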

Other databases
Classification and assignment of protein function:
– The Enzyme Commission.
– The Gene Ontology Consortium protein function classification.
Specialized, or 'boutique', databases.
Expression (mRNA levels) and proteomics databases (interpretation in terms of protein patterns).
Databases of metabolic pathways (flow of molecules and energy through pathways of chemical reactions).
Bibliographic databases.
Only a few of the many databases…
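Enzyme Commission numbers illustrate a hierarchical function classification: each number has four levels (class.subclass.sub-subclass.serial), and two enzymes sharing more leading levels have more similar catalytic activities. A minimal sketch; the helper names are hypothetical, and only the seven top-level class names are encoded.

```python
# Top-level Enzyme Commission classes.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}

def parse_ec(ec: str) -> tuple[str, ...]:
    """Split an EC number such as '3.4.21.4' into its four levels ('-' = unspecified)."""
    parts = tuple(ec.split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return parts

def ec_class(ec: str) -> str:
    """Name of the top-level class of an EC number."""
    return EC_CLASSES[int(parse_ec(ec)[0])]

def shared_levels(ec1: str, ec2: str) -> int:
    """Count the leading levels two EC numbers have in common."""
    count = 0
    for x, y in zip(parse_ec(ec1), parse_ec(ec2)):
        if x != y or x == "-":
            break
        count += 1
    return count

# e.g. trypsin (3.4.21.4) and thrombin (3.4.21.5), both serine endopeptidases
print(ec_class("3.4.21.4"))                   # hydrolases
print(shared_levels("3.4.21.4", "3.4.21.5"))  # 3
```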