Topics in 2nd Part: Biological Information and Tools. Molecular Modeling Technology and Applications. Computer-aided drug design. SMA5422: Special Topics in Biotechnology.

Topics in 2nd Part: Biological Information and Tools. Molecular Modeling Technology and Applications. Computer-aided drug design.
SMA5422: Special Topics in Biotechnology
Chen Yu Zong, Department of Computational Science, NUS
Office: Blk SOC1 Room Tel.:

Schedule
  Lecture 6 (Feb 14): Biological information database and data mining.
  Lecture 7 (Feb 21): Gene and protein sequence alignment methods.
  Lecture 8 (Feb 26): Machine learning techniques in sequence analysis.
  Lecture 9 (Mar 5): Computer modeling of biomolecules: Structure, motion, and binding.
  Lecture 10 (Mar 7): Computer aided drug design: structure-based approach.
  Lecture 11 (Mar 12): Computer aided drug design: QSAR approach.

Lecture 6: Biological information database and data mining
  Biology as an information intensive science
  Typical databases
  Introduction to data mining
  Data mining in biology

Biology as an information intensive science
Organization of living systems: Ecosystems => Communities => Populations => Organisms => Organ systems => Organs => Tissues => Cells => Molecules.
  Ecosystem: All living things in a particular area (such as an island) and all non-living, physical components of the environment that affect living things (such as air, soil, water, sunlight).
  Community: All living things in an ecosystem (such as all animals, plants, bacteria, fungi, viruses, etc. in a rain forest).
  Population: A group of interbreeding individuals of one species (such as all flying squirrels in a rain forest).
  Organism: An individual living thing (such as one flying squirrel).
  Organ system: A group of related body components that perform a specific type of function (such as the CNS).
  Organ: A functional unit of an organ system (such as the brain).

Biology as an information intensive science
Fundamental Theory:
  Evolution: Simple molecules => Organic molecules => RNA-based life systems => Single cells => Multicellular organisms => Higher organisms
  Molecular Basis of Life: DNA (Genes) => RNAs => Proteins:
    Structural organization
    Chemical reaction, synthesis and destruction of molecules
    Signal transduction
    Transportation of molecules
    Regulation
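The DNA (Genes) => RNAs => Proteins flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is treated as the coding strand, translation starts at the first base, the standard genetic code is used, and the toy gene and the function names `transcribe` and `translate` are hypothetical rather than taken from the lecture.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA from its first base, stopping at the first stop codon."""
    protein = []
    mrna = mrna.upper()
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA letters
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy_gene = "ATGGCCAAGTGA"  # hypothetical toy sequence, not a real gene
toy_mrna = transcribe(toy_gene)
print(toy_mrna)             # AUGGCCAAGUGA
print(translate(toy_mrna))  # MAK
```

Real genes add complications that this sketch ignores, such as introns, alternative start sites, and organism-specific genetic codes.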

Biology as an information intensive science
Cell Organization and Function:
  Structural organization
  Chemical reaction, synthesis and destruction of molecules
  Signal transduction
  Transportation of molecules
  Regulation

Biology as an information intensive science
Information (Molecular Level):
  DNA: 30,000 ~ 100,000 genes for human (many with unknown functions); 3 x 10^9 base pairs for human DNA (< 10% coding region).
  Protein: 60,000 ~ 100,000 proteins for human.
  Individual level: sequence, 3D structure, molecular function.
  Group level: pathways, cellular location, collective function.
Classification:
  Family: superfamily, family, subfamily (based on evolution and function).
  Type: receptor, ion channel, enzyme, carrier, regulator, structure.
  Function: physiological function, diseases, therapeutics, toxicity, pharmacokinetics, agriculture, plant, environmentally relevant.

Typical Databases
Category:
  General
  Sequence
  3D structure
  Protein function, proteomics, and pathways
  Pharmainformatics
  Medical informatics and disease information
Reference: Nucleic Acids Res., 30, 1-12 (2002).
Internet links:

Typical Databases
General:
  The National Center for Biotechnology Information (NCBI). Integrated ENTREZ retrieval software and databases for genetics, gene and protein sequences, 3D structures, and the on-line PubMed library. CAM (Complementary and Alternative Medicine) on PubMed.
  Pedro's BioMolecular Research Tools. A collection of WWW links to information and services useful to molecular biologists. Other mirror sites in Germany and Switzerland.
  The CMS Molecular Biology Resource. A compendium of electronic and Internet-accessible tools and resources for Molecular Biology, Biotechnology, Molecular Evolution, Biochemistry, and Biomolecular Modeling. Other mirror sites in Japan, Canada, France, Germany, Italy, and UK.

Typical Databases
Sequence:
  The Genome Data Base (GDB). Database for genes of human and other species. Located at Johns Hopkins University School of Medicine. Mirror site in Japan.
  Genome Sequence DataBase. Located at the National Center for Genome Resources (NCGR) in Santa Fe. Site has info on the Human Genome Project, genetics and public issues, education, and references.
  SWISS-PROT. Annotated protein sequence database.
  Online Mendelian Inheritance in Man. Database that catalogs human genes and genetic disorders. Located at NCBI.
  Pfam: Protein families database of alignments and HMMs. A large collection of multiple sequence alignments and hidden Markov models covering many common protein domains.

Typical Databases
Structure:
  Protein Data Bank (PDB). 3D crystal and NMR structures of proteins, DNA, RNA, and ligand-bound complexes. Official mirror site in Singapore, and other mirror sites in China, Japan, Taiwan, and several places in the USA (Boston, North Carolina).
  Nucleic Acids Database (NDB). 3D crystal structures of DNA and RNA. Mirror sites in UK, Japan, and other sites in the USA (San Diego).
  SCOP. Structural classification of proteins. Mirror sites in Singapore, China, the U.S., and Japan.
  CATH. Protein Structure Classification. A hierarchical domain classification of protein structures in PDB.
  MODBASE. A database of Comparative Protein Structure Models. Models were generated by PSI-BLAST and MODELLER. As of Aug 2000, there are 3,379 reliable models for domains in 2,220 proteins, and 5,433 reliable fold assignments for domains in 3,083 proteins.

Typical Databases
Function and pathways:
  GeneCards. A database of human genes, their products, and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others [gene listing].
  PROSITE. Protein families and domains. It consists of biologically significant sites, patterns, and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs. Mirror sites in Australia, Canada, China, Taiwan.
  PRINTS. Protein fingerprint database. A fingerprint is a group of conserved motifs used to characterise a protein family.
  PROCAT. A database of 3D enzyme active site templates. It can be thought of as the 3D equivalent of the 1D templates found in sequence motif databases such as PROSITE and PRINTS.
  KEGG: Kyoto Encyclopedia of Genes and Genomes. Site contains Pathway Info, Disease Catalogs, Cell Catalogs, Molecule Catalog, and Genomic Info. It also provides Links to Pathway and Other Databases.
  SPAD: Signaling Pathway Database. An integrated database for genetic information and signal transduction systems. Divided into four categories based on the extracellular signal molecules (Growth factor, Cytokine, and Hormone) and stress that initiate the intracellular signaling pathway.
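To make the idea of a PROSITE-style sequence pattern concrete, the sketch below converts a pattern written in PROSITE syntax into a Python regular expression and scans a sequence with it. It is a simplified illustration, not PROSITE's own software: the function name `prosite_to_regex` and the test sequence are hypothetical, only the basic syntax elements are handled (`x`, `[..]`, `{..}`, repeat counts in parentheses, and `<`/`>` anchors outside brackets), and profiles are not covered at all.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' to a regex string."""
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}: any residue except P
        element = element.replace("(", "{").replace(")", "}")   # x(2,4): repeat 2 to 4 times
        element = element.replace("x", ".")                     # x: any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence start / end
        regex_parts.append(element)
    return "".join(regex_parts)


# N-glycosylation site pattern, written in PROSITE syntax.
glyco = re.compile(prosite_to_regex("N-{P}-[ST]-{P}"))

toy_sequence = "MKNGSAANPTQ"  # hypothetical sequence for illustration
hits = [(m.start(), m.group()) for m in glyco.finditer(toy_sequence)]
print(hits)  # [(2, 'NGSA')]
```

The second candidate site in the toy sequence, NPTQ, is rejected because proline follows the asparagine. Note that `finditer` reports non-overlapping matches only, so a production scanner would need a lookahead or a manual scan to report overlapping sites.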

Typical Databases
Pharmainformatics:
  TTD: Therapeutic Target Database. A database that provides information about the known and newly proposed therapeutic protein and nucleic acid targets, the targeted disease, pathway information, and the corresponding drugs/ligands directed at each of these targets. Links to relevant databases are also provided.
  MedChem/Biobyte QSAR Database. A collection of 10,000 QSAR datasets that covers both biological and physical-organic chemistry.
  The NCI Drug Information System 3D Database. A collection of 3D structures for over 400,000 drugs, built and maintained by the Developmental Therapeutics Program, Division of Cancer Treatment, National Cancer Institute. The database is an extension of the NCI Drug Information System.
  Drug Discovery Databases Compiled by The Biophysical Pharmacology Group at NCI. Site has links to several therapeutics program databases and tools, and a 2D-Gel protein expression database.
  Pharmaceutical Information Network. A comprehensive information database about drugs and diseases.
  U. S. Food and Drug Administration Center for Drug Evaluation and Research.

Introduction to Data Mining
Main Objective: Pattern identification, classification, extraction of related data (character) set.
Tasks:
  Generation of association rules.
  Classification and clustering.
  Pre-processing and post-processing of relevant dataset.
General Procedure:
  1. Understanding of application domain.
  2. Data source identification and data selection.
  3. Pre-processing: feature selection, discretization, data cleaning.
  4. Data mining: pattern extraction and model building.
  5. Post-processing: identification of interesting/useful/novel patterns/rules.
  6. Incorporation of patterns in real world tasks.

Introduction to Data Mining
Example: Generation of association rules
Record of customer purchases:
  John: Jacket, Boots
  Alfred: Milk, Cheese, Bread, Shoes
  Green: Milk, Bread
  Brown: Milk, Bread, Shoes, Greeting Cards, Pork
  Eric: Cheese, Milk, Shoes, Beef
  Bob: Jacket, Boots, Ski Pants
Form of association rules: Item A => Item B [sup, conf]
  sup = support = % of records containing both item A and item B
  conf = confidence = sup / (% of records containing item A)
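The purchase records above are small enough to mine by brute force. The sketch below computes support and confidence for every ordered pair of items, using the standard definition of confidence for a rule A => B: the fraction of records containing A that also contain B. The function names and the thresholds `min_sup` and `min_conf` are illustrative assumptions, not values from the lecture.

```python
from itertools import permutations

purchases = {
    "John": {"Jacket", "Boots"},
    "Alfred": {"Milk", "Cheese", "Bread", "Shoes"},
    "Green": {"Milk", "Bread"},
    "Brown": {"Milk", "Bread", "Shoes", "Greeting Cards", "Pork"},
    "Eric": {"Cheese", "Milk", "Shoes", "Beef"},
    "Bob": {"Jacket", "Boots", "Ski Pants"},
}


def support(items, records):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in records.values()) / len(records)


def confidence(a, b, records):
    """Fraction of the records containing `a` that also contain `b`."""
    return support({a, b}, records) / support({a}, records)


def mine_rules(records, min_sup=0.3, min_conf=0.7):
    """Return (a, b, sup, conf) for every single-item rule a => b above both thresholds."""
    all_items = sorted(set().union(*records.values()))
    rules = []
    for a, b in permutations(all_items, 2):
        sup = support({a, b}, records)
        if sup < min_sup:
            continue
        conf = confidence(a, b, records)
        if conf >= min_conf:
            rules.append((a, b, sup, conf))
    return rules


for a, b, sup, conf in mine_rules(purchases):
    print(f"{a} => {b} [sup={sup:.0%}, conf={conf:.0%}]")
```

For example, Milk and Bread appear together in 3 of the 6 records (support 50%), and Milk appears in 4 records, so Milk => Bread has confidence 75% while Bread => Milk has confidence 100%. Checking every pair is only practical for tiny datasets; algorithms such as Apriori prune the search for larger ones.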

Data Mining in Biology
Types of Tasks:
  Search for a similar pattern in a subsection of each member of a dataset (e.g. protein sequence motifs).
  Classification of datasets into groups (e.g. proteins into families).
  Search for a dataset matching given characteristics (e.g. alignment of a protein sequence against all entries in a protein sequence database).
  Extraction of particular information from literature (e.g. drugs that bind to a particular protein).
References: Proc. Natl. Acad. Sci. USA 95, (1998); Structure 7, (1999); Bioinformatics 17, (2001); Bioinformatics 17, (2001); 17, (2001)
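The first task type, finding a similar pattern in a subsection of each member of a dataset, can be illustrated with a deliberately naive motif search: collect every substring of length k from each sequence and keep those present in all of them. This is a sketch under strong assumptions (exact matches only, fixed motif length); the function name `shared_kmers` and the toy sequences are hypothetical, and practical motif finders tolerate mismatches and use statistical scoring.

```python
def shared_kmers(sequences, k):
    """Return the set of length-k substrings that occur in every sequence."""
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in sequences
    ]
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Hypothetical toy sequences sharing the segment GAGKST.
toy_proteins = ["MKTGAGKSTLLA", "AAGSGAGKSTQE", "PLGAGKSTVVR"]

print(sorted(shared_kmers(toy_proteins, 5)))  # ['AGKST', 'GAGKS']
print(sorted(shared_kmers(toy_proteins, 6)))  # ['GAGKST']
```

Increasing k until the result becomes empty gives the longest exactly conserved segment, which here is the 6-residue stretch GAGKST.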

Homework
  1. Write a very short report about a database assigned to you.
  2. Can you give at least two more examples of each type of task in biological data mining?
  3. Read the reference about typical biological databases and get a broad picture of the current status of publicly accessible bioinformatics databases.
  4. Read at least one of the references about data mining in biology and be prepared to give a brief description of the paper.