Application of Bioinformatics in Genetics Research Instructors: Dr. Henry Baker Dr. Luciano Brocchieri Drs. Michele Tennant / & Rolando Milian Dr. Lei.

Presentation transcript:

Application of Bioinformatics in Genetics Research Instructors: Dr. Henry Baker, Dr. Luciano Brocchieri, Drs. Michele Tennant & Rolando Milian, Dr. Lei Zhou. Course web page: Sakai/UFL for lecture notes and homework & for classroom practice.

Application of Bioinformatics in Genetic Research Time and location: M. W. F. : 1:00-2:00 in CGRC. Except: 1/14 & 1/16 in HSCL C2-3.

Evaluation: 50% classroom participation, 50% homework

History of bioinformatics – sequence analysis: Sequence comparison, Similarity search, Phylogenetic analysis, Structure prediction, Gene prediction

Bioinformatics in the post genome era Information Representation - many new types of data, such as Function, Location, Interaction, Regulatory pathway, Expression profile, etc., need to be recorded. Data Management - Infrastructure for inputting, managing, accessing, and retrieving relevant information in a “sea of databases”. Cloud computing. Systematics. The opportunity provided by genome sequence and genomic / proteomic technology is matched by the challenge to bioinformatics / computational biology.

Bioinformatics in the post genome era Whole genome sequencing - SNP and genome-wide association studies. Genomic/proteomic expression profiling (RNA and protein levels). Epigenomics, Comparative genomics, … Regulatory pathway simulation – systems biology. $1,000 genome and … $500,000 analysis?

Objectives of GMS6014 Basic skills for retrieving and storing data, using web-based applications. Ability to install and run standalone local applications. Understanding the basis of bioinformatics applications using sequence similarity search as the example. A brief survey of available bioinformatics tools for HTS analysis and introduction to functional genomics and systems biology.

Sequence Representation - nucleotide N G R C W T G Y C Y A G A C A T G C C C C G T T T G T For complete list, see Table 2.1, Mount 2nd Ed. Or
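
The single-letter symbols on this slide include IUPAC ambiguity codes (N, R, W, Y), each standing for a set of possible bases. The sketch below shows one way to expand such codes and to test whether a concrete DNA string is compatible with an ambiguous pattern. Reading the slide's first ten symbols as a pattern and the next ten as a concrete sequence is an assumption made for illustration; the function names are hypothetical and not part of the course material.

```python
# Standard IUPAC nucleotide ambiguity codes: symbol -> bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(sequence):
    """Return, for each position, the bases that the symbol allows."""
    return [IUPAC[symbol] for symbol in sequence.upper()]


def matches(pattern, dna):
    """True if a concrete DNA string is compatible with an ambiguous pattern."""
    if len(pattern) != len(dna):
        return False
    return all(base in IUPAC[sym]
               for sym, base in zip(pattern.upper(), dna.upper()))


# Assumption: first ten slide symbols as pattern, next ten as a concrete sequence.
pattern = "NGRCWTGYCY"
candidate = "AGACATGCCC"
print(expand(pattern))
print(matches(pattern, candidate))
```

Storing the allowed bases as plain strings keeps the table readable and makes the membership test a one-liner.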

Sequence Representation - amino acids Q: What’s the common property of these amino acids? 1. D, E 2. I, L, V, M, F 3. A, S, P

Sequence Representation - amino acids Example: Coloring based on aa property. WDLLAQILCYALRIY WRFLATVVLETLRQY WKFLAITMCKVLKQF RCLLCNKLYYLLRKV LNRLLAELYEVLCHI LRLLQQQQMVLQRQY
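
Since the colors of the slide do not survive in plain text, the same idea can be sketched by mapping each residue to a one-letter property class. The grouping below is only one possible scheme: the labels "acidic", "hydrophobic", and "small" for the three groups in the preceding question are assumed, and the "basic" group is added purely for illustration.

```python
# Illustrative property classes; other grouping schemes are equally valid.
GROUPS = {
    "a": "DE",     # acidic (assumed label for group 1 of the question slide)
    "h": "ILVMF",  # hydrophobic (assumed label for group 2)
    "s": "ASP",    # small (assumed label for group 3)
    "b": "KRH",    # basic; not on the slide, added for illustration
}

# Invert the table: residue -> class letter.
CLASS_OF = {aa: label for label, residues in GROUPS.items() for aa in residues}


def property_string(seq):
    """Replace each residue by its class letter; '.' marks unclassified residues."""
    return "".join(CLASS_OF.get(aa, ".") for aa in seq.upper())


sequences = [
    "WDLLAQILCYALRIY",
    "WRFLATVVLETLRQY",
    "WKFLAITMCKVLKQF",
    "RCLLCNKLYYLLRKV",
    "LNRLLAELYEVLCHI",
    "LRLLQQQQMVLQRQY",
]

for seq in sequences:
    print(seq, property_string(seq))
```

Printing the class strings under one another makes conserved columns (for example, a run of hydrophobic positions) visible without color.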

Representation of sequence – sequence file format 1.) FASTA – simple and clean > gene_name, (other info) MASASASKJHKLJLKJLDSDFSF SSDSASFSFD… Practice / DIY: retrieve a sequence in FASTA format and save the file on the local computer.
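
To make the FASTA layout concrete, here is a minimal parser sketch: a line starting with ">" opens a record, and all following lines up to the next ">" are joined into one sequence. The function name is hypothetical, and the example text reuses the placeholder record from the slide with the trailing "…" left out.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>gene_name, (other info)
MASASASKJHKLJLKJLDSDFSF
SSDSASFSFD
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

For real projects an established library parser is usually preferable; the sketch only illustrates why the format is called "simple and clean".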

How to store sequence files: .txt format is clean and allows downstream sequence analysis. .doc or .rtf allows formatting during annotation; however, extra information is inserted, so these formats are NOT suitable for computational analysis.

Practice – file types Using Windows Explorer (with your own computer) or IE with “C:\” in the address window. Change the “Tools → Folder Options” so that the file extensions (.xxx) are revealed. Edit the downloaded sequence file in MS Word, highlight a section of the sequence with Bold font or color and save as .doc. Open the .doc file in NotePad – observe the inserted characters.

Practice – file types (Cont.) Load the “Mysequence.doc” file to Webcutter using “Choose file” and then “Upload sequence file”. - Notice that the “sequence” in the sequence box is nonsense characters. Clear input; Browse and then load the .txt file. Run an analysis. Always keep your sequences in .txt files for downstream analysis.
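
The effect seen in this exercise can also be checked programmatically before a file is handed to an analysis tool. The following is a rough heuristic sketch, not a validator used by Webcutter or any other tool: it accepts a file only if its bytes decode as ASCII and every non-header line consists of letters, "*" or "-". The function name and the allowed character set are assumptions.

```python
import string

# Characters accepted on sequence lines (assumption: letters plus '*' and '-').
ALLOWED = set(string.ascii_letters) | set("*-")


def looks_like_plain_sequence(data: bytes) -> bool:
    """Heuristic check that raw file bytes are a plain-text sequence file."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False  # binary content, e.g. a legacy .doc file
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    for ln in lines:
        if ln.startswith(">"):
            continue  # FASTA header lines may contain any text
        if not set(ln) <= ALLOWED:
            return False  # formatting markup or other stray characters
    return True


plain = b">my_gene\nMASASAS\n"
rtf_like = b"{\\rtf1\\ansi MASASAS}"
# Leading bytes of the OLE2 container used by legacy .doc files.
doc_like = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"MASASAS"

for label, data in [("txt", plain), ("rtf", rtf_like), ("doc", doc_like)]:
    print(label, looks_like_plain_sequence(data))
```

To test a saved file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.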

Representation of sequence The need to include annotations and functional information with each sequence. Structured data entry: GenBank, EMBL / SwissProt. Observe: The difference in data structure between SwissProt, NCBI protein, and NCBI Genes.

Representation of sequence The need to represent associated info with sequence Structured data entry Specialized databases • 3-D Structure • Mutation / Diseases • Protein family / Protein domain • Interaction • Pathway • ….

Representation of sequence The need to represent associated info with sequence Structured data entry Specialized databases Complex / customized data structure - Object-oriented data representation (Mount, p44-45)

Public Resources for Bioinformatics Databases Analysis Tools Observe: List of databases and services at NCBI, EBI, KEGG, and Ensembl.

Pet Project: TNF, or your favorite gene. What can we know about this gene? • Search for “curated” databases. • To prepare for future analysis, save annotated sequence files as genename.html (in a target folder). • For downstream sequence analysis, save pure sequence as a FASTA format file.

Where and how much information is available for my gene? Observe: The information content and presentation format for the same gene in SwissProt, NCBI protein, NCBI Genes, etc.

Public Resources (I) – Databases and data sources Over 1,000 in the sea of databases. Content-specific, such as DNA, Protein, Structure, etc. Species-specific, such as FlyBase, WormBase, OMIM, etc. System-specific, such as MetaCyc, AFCS, etc.

Database concept: Database - efficiently store, update, and retrieve information (data). Database management systems – Access, Sybase, MySQL, Oracle, etc. Types of Databases – Relational DB, Object DB, native XML DB.

Database concept – tables in relational databases
Protein table:
Accession | Organ. | Ref.     | Name | Keywords | Features
…         | …      | medline1 | TNF  | …        | …
…         | …      | medline2 | P53  | …        | …
Query: “TNF” = TNF[All Fields]; compare with TNF[Name]

Database concept – relationship between tables
Protein table:
Accession | Organ. | Ref.     | Name | Keywords | Features
…         | …      | medline1 | P27  | …        | …
…         | …      | medline2 | P53  | …        | …
Reference table:
ID       | title | year | author | abstract
medline1 | …     | 1970 | …      | …
medline2 | …     | 1980 | …      | …
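
The relationship between the two tables can be tried out with a small in-memory SQLite database. This is a sketch under stated assumptions: the column names are simplified from the slide, the accessions, titles, and keywords are invented placeholders, and NCBI's actual databases are not implemented this way. The last two queries mimic the difference between searching one field and searching all fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE refs (id TEXT PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    name      TEXT,
    keywords  TEXT,
    ref_id    TEXT REFERENCES refs(id)  -- link to the reference table
);
""")

# Placeholder rows; only the names, reference IDs, and years come from the slide.
conn.executemany("INSERT INTO refs VALUES (?, ?, ?)", [
    ("medline1", "placeholder title 1", 1970),
    ("medline2", "placeholder title 2", 1980),
])
conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", [
    ("ACC0001", "P27", "placeholder keywords", "medline1"),
    ("ACC0002", "P53", "placeholder keywords mentioning TNF", "medline2"),
])

# Join: follow the reference ID from each protein row to its reference row.
joined = conn.execute("""
    SELECT p.name, r.year
    FROM protein AS p JOIN refs AS r ON p.ref_id = r.id
    ORDER BY p.name
""").fetchall()

# Field-restricted search versus a search across several fields.
name_only = conn.execute(
    "SELECT name FROM protein WHERE name = ?", ("TNF",)).fetchall()
all_fields = conn.execute(
    "SELECT name FROM protein WHERE name LIKE ? OR keywords LIKE ?",
    ("%TNF%", "%TNF%")).fetchall()

print(joined)
print(name_only)
print(all_fields)
```

Keeping references in their own table means each citation is stored once, however many protein entries point to it.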

Representation of sequence The need to represent associated info with sequence Structured data entry Specialized databases Complex / customized data structure - Object-oriented data representation (Mount, p44-45)

Observe/Practice Search for TNF in the Gene database and the Nucleotide and Protein databases. Search for TNF in “All Text” vs. “gene name” in the Gene database. Compare results. Download the human TNF nucleotide sequence. Download three protein sequences in FASTA format from the RefSeq search result and save them as 3TNF.txt.
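
The exercise above uses the NCBI web pages. As an optional alternative, the same kinds of request can be expressed as NCBI E-utilities URLs. The sketch below only builds the URLs and does not contact NCBI; the helper names are hypothetical, the accession IDs are placeholders to be replaced with IDs from your own search result, and the field tags shown are assumed to correspond to the "All Text" and "gene name" choices in the exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL for a query term in the given Entrez database."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_fasta_url(db, ids):
    """Build an EFetch URL that requests the given records as FASTA text."""
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


# Compare an all-fields search with a search restricted to the gene name field.
print(esearch_url("gene", "TNF[All Fields]"))
print(esearch_url("gene", "TNF[Gene Name]"))

# Placeholder IDs: substitute three RefSeq protein accessions from your search.
print(efetch_fasta_url("protein", ["ID_1", "ID_2", "ID_3"]))
```

Opening the EFetch URL in a browser (or fetching it with a script) and saving the response as plain text would yield a multi-record file comparable to 3TNF.txt.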

Public Resources (II) – Analysis tools • Web-based analysis tools – easy to use, but often with fewer customization options. • Stand-alone analysis tools – require installation and configuration, but provide more customization options. • Commercial analysis tools • Scripting for bioinformatics projects

Practice: navigating the related resources through links Using the “PubMed” link, search annotated references on TNF. Using the “GEO Profiles” link, search gene expression information on TNF. Using the “Map Viewer” link, observe the chromosome location and gene structure of the TNF locus – change the option of “Map Viewer” to include prediction of CpG islands.

Bioinformatics / Computational biology Bioinformatics - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology - The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. (NIH / BISTI Working Definition of Bioinformatics and Computational Biology, July 17, 2000.)