Introduction to Bioinformatics
BI420, Fall Semester 2012
Boston College
BI420 – Course information
Instructor: Prof. Gabor Marth
Teaching assistant: Jiantao Wu
Wiki site with the updated syllabus and PowerPoint slides: http://bioinformatics.bc.edu/wikis/BI420
The wiki will be updated regularly and supersedes the printed syllabus. You are expected to keep up with any changes made there.
BI420 – Material
Introduction to Computational Genomics (Cristianini and Hahn)
Additional materials: www.computational-genomics.net
BI420 – Material (cont’d)
Recommended books
Primary literature, both mandatory and optional (see the syllabus)
Reference materials (for Practical Bioinformatics)
Lecture PPTs will be posted online
BI420 – Grading
Homework (4 assignments): 40%
Midterm exams (3): 45%
In-class presence & participation: 15%
BI420 – Software
MATLAB: http://www.bc.edu/software/applications/research/matlab.html
Install MATLAB and view the demo before the next class. To do this, type “demo” at the MATLAB command line.
UNIX/PERL: we will introduce these when we reach the Practical Bioinformatics section.
Genomes and Genes
The animal cell
DNA – the carrier of the genetic code
DNA organization – chromosomes
DNA organization – mitochondria
Translation of genetic information
Gene organization
Protein structure
RNA structure
Gene prediction, genome annotation
The informatics of DNA sequencing
Steps in the production and analysis of sequencing data
Genetic variation discovery & analysis
Look at multiple sequences from the same genome region.
Use base quality values to decide whether mismatches are true polymorphisms or sequencing errors.
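The idea of weighing mismatches by their base quality can be sketched as a simple genotype-likelihood comparison. This is a simplified model chosen for illustration, not necessarily the method taught in the course: base qualities are assumed to be Phred-scaled (error probability P = 10^(-Q/10)), a sequencing error is assumed to produce each of the three other bases with equal probability, and genotype priors are ignored. The function names `phred_to_error`, `base_prob` and `call_site` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def phred_to_error(q: int) -> float:
    """Convert a Phred-scaled base quality Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_prob(observed: str, true_base: str, err: float) -> float:
    """P(observed base | true base): correct with prob 1-err, else one of 3 other bases."""
    return 1 - err if observed == true_base else err / 3


def call_site(ref: str, pileup: List[Tuple[str, int]]) -> str:
    """Return the most likely genotype ('RR', 'RA' or 'AA') at one genome position.

    pileup: (base, phred_quality) pairs from all reads covering the position.
    """
    alts = Counter(base for base, _ in pileup if base != ref)
    if not alts:
        return "RR"  # no mismatches at all
    alt = alts.most_common(1)[0][0]  # most frequent non-reference base
    loglik = {"RR": 0.0, "RA": 0.0, "AA": 0.0}
    for base, q in pileup:
        # Cap at 0.75, where a base carries no information (1 - e == e / 3)
        e = min(phred_to_error(q), 0.75)
        p_ref = base_prob(base, ref, e)
        p_alt = base_prob(base, alt, e)
        loglik["RR"] += math.log(p_ref)
        loglik["RA"] += math.log(0.5 * p_ref + 0.5 * p_alt)  # heterozygote: either allele
        loglik["AA"] += math.log(p_alt)
    return max(loglik, key=loglik.get)
```

For example, one low-quality (Q5) mismatch among ten Q30 reference bases is called `'RR'` (most likely a sequencing error), whereas five Q30 reads of each base are called `'RA'`.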
Gene expression
Expression profiling technologies and analysis approaches
Classical Bioinformatics Methods
Sequence alignment and similarity search
Given a sequence, how can one tell which species it comes from?
How can one identify evolutionarily related sequences?
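Similarity between two sequences is usually quantified by an alignment score computed with dynamic programming. As one illustration, here is a minimal sketch of global (Needleman–Wunsch) alignment with a linear gap penalty. The scoring values (+1 match, -1 mismatch, -2 per gap) are arbitrary assumptions; practical similarity searches such as BLAST use substitution matrices, affine gap penalties and heuristics instead.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # (mis)match
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, `needleman_wunsch("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap).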
Biologically significant alignment
Storage/retrieval of biological data
As of August 2010:
GenBank whole-genome sequences: 169 billion bp
GenBank other sequences: 117 billion bp
Gene Expression Omnibus: 480,000 samples (microarray, RNA-seq, ChIP-chip, ChIP-seq, RIP-seq, etc.)
The Tree of Life
Evolution of chromosome organization
Phylogenetics
How can evolutionary relationships be reconstructed?
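One of the simplest ways to reconstruct a tree from pairwise evolutionary distances is UPGMA (average-linkage clustering), which assumes a constant rate of evolution (a molecular clock). The sketch below is illustrative only and is not presented as the course's method: it takes a precomputed distance matrix, and it returns only the tree topology as a Newick string, without branch lengths.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa by UPGMA and return the tree topology in Newick format."""
    # Active clusters: id -> (Newick subtree, number of taxa it contains)
    clusters: Dict[int, Tuple[str, int]] = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        i, j = min(d, key=d.get)
        label_i, size_i = clusters.pop(i)
        label_j, size_j = clusters.pop(j)
        # Distance to the merged cluster = size-weighted average of the old distances
        new_d = {
            k: (size_i * d[(min(i, k), max(i, k))] + size_j * d[(min(j, k), max(j, k))])
            / (size_i + size_j)
            for k in clusters
        }
        # Forget distances involving the two merged clusters
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # next_id exceeds every existing id, so keys stay ordered
        clusters[next_id] = (f"({label_i},{label_j})", size_i + size_j)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"
```

With made-up distances in which A and B are closest (2), C is at distance 6 from both, and D is at distance 10 from everything, `upgma(["A", "B", "C", "D"], ...)` returns `(D,(C,(A,B)));`.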
Population and Personal Genomics
Population genetics
Medical Genomics
Medical applications of sequencing individuals
Personal genome sequencing
How soon will everyone be sequenced?
Ethical issues
Practical Bioinformatics
MATLAB
Performing bioinformatics analyses using MATLAB
UNIX scripting
Using a command-line operating system
PERL programming
Programming in PERL within a UNIX operating system
Managing a Database
CLONE
id  name       received  masked
1   NH0260K08  12-25-99  12-26-99
2   NH0407F02  12-28-99  01-03-00
HIT
id  cloneID  hspID  start  end
1   1        1      1      17957
2   2        1      96912  114891
ALLELE
id  hitID  nucleotide
1   1      C
2   2      T
MySQL
Accessing bioinformatics databases, processing data, and building webpages from PERL
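The three example tables (CLONE, HIT, ALLELE) can be reproduced in a few lines of SQL. As a self-contained stand-in for the MySQL/PERL setup named on the slide, this sketch uses Python's built-in `sqlite3` module with an in-memory database. The column types and the links between tables (HIT.cloneID referring to CLONE.id, ALLELE.hitID referring to HIT.id) are assumptions inferred from the column names, and the helper names `build_demo_db` and `alleles_by_clone` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create the CLONE, HIT and ALLELE example tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CLONE (
            id       INTEGER PRIMARY KEY,
            name     TEXT,
            received TEXT,
            masked   TEXT
        );
        CREATE TABLE HIT (
            id      INTEGER PRIMARY KEY,
            cloneID INTEGER REFERENCES CLONE(id),
            hspID   INTEGER,
            "start" INTEGER,
            "end"   INTEGER  -- quoted because END is an SQL keyword
        );
        CREATE TABLE ALLELE (
            id         INTEGER PRIMARY KEY,
            hitID      INTEGER REFERENCES HIT(id),
            nucleotide TEXT
        );
    """)
    conn.executemany("INSERT INTO CLONE VALUES (?, ?, ?, ?)", [
        (1, "NH0260K08", "12-25-99", "12-26-99"),
        (2, "NH0407F02", "12-28-99", "01-03-00"),
    ])
    conn.executemany("INSERT INTO HIT VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 17957),
        (2, 2, 1, 96912, 114891),
    ])
    conn.executemany("INSERT INTO ALLELE VALUES (?, ?, ?)", [
        (1, 1, "C"),
        (2, 2, "T"),
    ])
    conn.commit()
    return conn


def alleles_by_clone(conn: sqlite3.Connection):
    """Join the three tables: which nucleotide was observed in which clone, and where."""
    return conn.execute("""
        SELECT CLONE.name, HIT."start", HIT."end", ALLELE.nucleotide
        FROM ALLELE
        JOIN HIT   ON ALLELE.hitID = HIT.id
        JOIN CLONE ON HIT.cloneID = CLONE.id
        ORDER BY CLONE.id
    """).fetchall()
```

`alleles_by_clone(build_demo_db())` returns `[('NH0260K08', 1, 17957, 'C'), ('NH0407F02', 96912, 114891, 'T')]`. Keeping each kind of record in its own table and linking the tables through ID columns avoids repeating clone details for every hit and allele.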