Login: BITseminar Pass: BITseminar2011


BIOINFORMATICS

Bioinformatics
Combination of:
– Theory and methods (algorithms, statistical methods, machine learning, …)
– Applications (sequence analysis, genome assemblies, databases, …)
– Different kinds of datasets (sequence data, microarray, next-gen data, …)

Biology Core Concepts Molecular biology Systems biology Evolutionary theory Common lab techniques Sequence comparison Phylogenetic analysis

Computer science Programming Database querying Data mining Visualization Machine learning Modeling …

Data exceeds analysis Bioinformatician data

How to survive?
– Knowledge of Linux/Unix
– Scripting: Perl/Python
– Network-based data storage
– Knowledge of biology and genomics
– Database structures
– Try to keep up with all new tools!

Benefit of using (Bio)Perl: an example
You have 1000 sequences to BLAST and analyse…
– You can do this manually
– Or… use a Perl script to do this for you and present you with the final results!
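A minimal sketch of what such a script could look like. The slide names Perl; this example uses Python instead, and the function names (`read_fasta`, `blast_command`, `best_hits`) are made up for illustration. It assumes a local BLAST+ installation whose `blastn` program writes tabular output (`-outfmt 6`), and it keeps only the highest-scoring hit per query.

```python
import csv
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def blast_command(query_path, database, out_path):
    """Build a blastn command line; pass the list to subprocess.run to execute it."""
    return ["blastn", "-query", query_path, "-db", database,
            "-outfmt", "6", "-out", out_path]


def best_hits(tabular_handle):
    """Map each query to (subject, percent identity, bit score) of its top-scoring hit."""
    best = {}
    for row in csv.reader(tabular_handle, delimiter="\t"):
        if len(row) < 12:
            continue  # not a standard 12-column tabular line
        query, subject = row[0], row[1]
        identity, bitscore = float(row[2]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best
```

With all 1000 sequences in one FASTA file, a single `blastn` run produces one results table, and `best_hits` reduces it to one line per query, which is the "final results" step the slide refers to.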

Good journals to keep up the pace
– Bioinformatics
– BMC Bioinformatics
– PLoS Computational Biology
– …

DATABASES

Types of databases
– DNA databases
– Protein databases
– Genome databases
– Microarray databases
– Next-gen sequencing databases

What to find in databases?
– Sequences
– Motifs
– Mutations, SNPs
– Gene interaction profiles
– Interactions (protein-protein interactions)
– Transcription factor binding sites
– Etc…

Databases? Good Reference annual edition

NCBI: lots of options… feed the need

Amino acid databases
UniProt
– Swiss-Prot
– TrEMBL
– PIR

UniProt
– Good quality, curated
– Minimal redundancy
– Extensive cross-linking to useful databases

Structural databases
Structure leads to function!
– Protein Data Bank (PDB)
– SCOP & CATH databases (structural classification): lmb.cam.ac.uk/scop/

Structure prediction (modeling)
– SWISS-MODEL & Repository (swissmodel.expasy.org/)
– MODELLER & MODBASE
– Study of interactions (docking) & drug design

SNPs and pharma To collect, encode, and disseminate knowledge about the impact of human genetic variations on drug response.

DNA microarray databases
Standard: MIAME = Minimum Information About a Microarray Experiment
Databases:
– ArrayExpress (EBI)
– GEO (NCBI)
Check the database before planning an experiment!

Next-gen data database e.html

GENOME BROWSERS

Human reference sequences
– Celera
– HuRef
– GRCh37
Three reference genomes. Keep this in mind when browsing databases!

Useful genome browsers
– Ensembl:
– NCBI Map Viewer: _search.cgi?
– UCSC:

Genome browser: Ensembl

EMBL
Problems:
– Lots of redundancy
– Wrong or old annotations
– Vector contamination
– Errors in sequences

RefSeq
A better option: the NCBI reference collection
– Curated
– Annotations are controlled
– No redundancy

NCBI: GenBank vs RefSeq
Sequence records are created by scientists who submit sequence data to GenBank. As an archival database, GenBank may contain hundreds of records for the same gene. In addition, because there is no independent review system, the types of information may vary from record to record, and GenBank sequence data may contain errors and contaminant vector DNA. To address some of the problems associated with GenBank sequence records, NCBI developed its RefSeq database.
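Either kind of record can also be retrieved by script rather than through the web pages. The sketch below only builds a request URL for the `efetch` service of NCBI's E-utilities and does not contact the server. It is an illustration that is not part of the original slides: the function name `efetch_fasta_url` is made up, and the accession in the usage note is just an example value.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, database="nuccore"):
    """Return the efetch URL that requests one record as plain-text FASTA."""
    params = {
        "db": database,       # nucleotide records by default
        "id": accession,      # a GenBank or RefSeq accession
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)
```

Calling `efetch_fasta_url("NM_000546")` returns a URL that can be opened in a browser or passed to `urllib.request.urlopen`. For more than a handful of requests, check NCBI's usage guidelines first.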

RefSeq accession numbers
NM_  mRNA (provisional, predicted, reviewed)
NP_  protein (provisional, predicted, reviewed)
NR_  non-coding RNA (provisional, reviewed)
NG_  human genes (provisional, reviewed)
NC_  chromosomes, complete genomes (provisional, reviewed)

RefSeq accession numbers (2)
XM_  predicted mRNA (model)
XP_  predicted protein (model)
XR_  predicted non-coding RNA (model)
NT_  human and mouse genomic contigs (model)
NW_  mouse supercontigs (model)
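The prefix tables on these two slides translate directly into a lookup, which is handy when a results file mixes several record types. This is a small sketch only: the descriptions are copied from the slides, the names `REFSEQ_PREFIXES` and `classify_accession` are made up for illustration, and the accessions in the usage note are example strings.

```python
# Prefix-to-record-type table, as listed on the two slides above.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein",
    "NR_": "non-coding RNA",
    "NG_": "human genes",
    "NC_": "chromosomes, complete genomes",
    "XM_": "predicted mRNA (model)",
    "XP_": "predicted protein (model)",
    "XR_": "predicted non-coding RNA (model)",
    "NT_": "human and mouse genomic contigs (model)",
    "NW_": "mouse supercontigs (model)",
}


def classify_accession(accession):
    """Return the record type implied by the first three characters of an accession."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")
```

For example, `classify_accession("NM_000546")` gives `"mRNA"`, while an accession without one of these prefixes gives `"unknown prefix"`.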

Genome browser: NCBI

Genome browser: UCSC
Example: UCSC
Good tutorial: com/downloads/ucsc/ ucsc_home.shtml

SNPS AND DISEASE RESEARCH

SNPs and disease research Association analysis, disease related (?), mapping genome variation… Reference = dbSNP database

Example NCBI SNP database, SNP rs

Other useful SNP databases Genome variation center HapMap (Ensembl) List of all:

Clinical Bioinformatics Microarrays, omics data (genomics, proteomics, interactomics, metabolomics, …) Combination of bioinformatics and medical informatics

ALGORITHMS AND TOOLS

Algorithms
Foundations of bioinformatics tools
– Implemented in 'front-end tools' (websites, Java applications)
   Can be slow
   Good for smaller analyses and quick mining
– Scripts and programs used on the command line (e.g. local BLAST)
   Usually installed locally on a server
   Faster; suited to large queries and analyses that need a long run time
   Knowledge of Linux/Unix is essential

Hall of Fame
– Linux operating system, MySQL database
– (Bio)Perl: programming language → making your life easier!
– BLAST/BLAT: comparing sequences
– PHYLIP: phylogenetic analysis, tree building
– ClustalW: multiple alignment
– MEGA5: multiple alignment and sequence editing
– HMMER: profile hidden Markov model searches for comparative genomics
– EMBOSS: combining several tools for sequence analysis
Open source → free to use and develop

Tools? Good Reference - annual edition

Analysing next gen sequencing data Different tools for different formats – Roche – Applied Biosystems – Illumina

Next-gen tools
FastQC: quality assessment of FASTQ files
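To give a feel for what a FASTQ quality check looks at, here is a small sketch that reads FASTQ records and computes the mean Phred score of each read. It is not how FastQC works internally and covers only one of the many things FastQC reports. It assumes four-line records and the Phred+33 quality encoding, and the function names are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality string) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def mean_quality(quality, offset=33):
    """Average Phred score of one read; offset 33 assumes Phred+33 encoding."""
    scores = [ord(char) - offset for char in quality]
    return sum(scores) / len(scores) if scores else 0.0


def low_quality_reads(handle, threshold=20.0):
    """Return identifiers of reads whose mean quality falls below the threshold."""
    return [name for name, _, quality in read_fastq(handle)
            if mean_quality(quality) < threshold]
```

The threshold of 20 is an arbitrary example value. For real data, run FastQC itself and read its per-base quality report.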

Assembly tools next gen A number of specialized tools exist: ABySS, gap4, Geneious, Mira, Newbler, SSAKE, SOAPdenovo, Velvet, …

Galaxy!
– Galaxy provides a web-based application for the analysis of sequence data
– Includes many tools, including tools for NGS data
– Makes your life easier: less Linux knowledge needed

On the cloud

Structure Galaxy

So this is why you need a bioinformatician in the lab!