Introduction to BioInformatics GCB/CIS535
Course Overview
- Sequence alignment
  - Dynamic programming
  - BLAST and its variants
  - Statistical significance
- Motif and promoter prediction
- Gene prediction
  - Homology and HMMs
- Gene expression
  - Experiment design
  - Interpretation: clustering
- Proteomics
  - Use of mass spectrometry
Sequence Alignment
- Choices
  - Nucleotide vs. amino acid
  - Global vs. local
  - Repeat masking
- Motif finding
  - Position weight matrices
  - PAM, BLOSUM
  - CONSENSUS
  - EM and Gibbs sampling methods
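The global vs. local choice can be made concrete with a small dynamic-programming sketch. This is only an illustration, not code from the course: the function name `align_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example. With `local=False` it computes a Needleman-Wunsch style global score; with `local=True` it computes a Smith-Waterman style local score.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of strings a and b.

    Scoring values are illustrative assumptions, not course settings.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # First row: global alignment pays for leading gaps, local does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0  # best cell seen so far (used only for local alignment)
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]
```

For example, `align_score("ACGT", "AGT")` aligns the full sequences and must pay for one gap, whereas `align_score(x, y, local=True)` scores only the best-matching region and ignores unrelated flanking sequence.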
Promoter Finding
- CpG islands
- Transcription factor binding sites
  - TATA, GC, and CAAT boxes
  - TRANSFAC and JASPAR libraries
- FirstExon
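As a rough illustration of how CpG islands can be flagged, the sketch below computes GC content and the observed/expected CpG ratio for a sequence window. The function names are hypothetical, and the default thresholds (length at least 200 bp, GC content above 50%, observed/expected above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria, used here as an assumption rather than as the course's definition.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a single window of sequence."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_ratio
```

In practice such a test would be applied to sliding windows along a genome; the single-window form keeps the example short.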
Gene Finding
- Homology
  - Conservation between species
- Hidden Markov Models (HMMs)
  - Acceptors & donors
  - Coding & non-coding
  - Frame shifts
- Regression
  - Linear regression
  - Artificial neural networks
- Future
  - Conditional Random Fields (CRFs)
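To give a feel for how an HMM separates coding from non-coding sequence, here is a minimal Viterbi decoding sketch. Everything specific in it is an assumption made for illustration: the two states (`"C"` for coding, `"N"` for non-coding), the made-up probabilities (coding emits mostly G/C, non-coding mostly A/T), and the helper name `label_sequence`. Real gene finders use far richer models with splice-site and reading-frame states.

```python
import math

# Toy parameters (assumed for illustration only).
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty observation string."""
    # Work in log space to avoid underflow on long sequences.
    col = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in obs[1:]:
        new_col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: col[r] + math.log(trans_p[r][s]))
            new_col[s] = col[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][x])
            ptr[s] = prev
        col = new_col
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: col[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def label_sequence(seq):
    """Label each base of seq as 'N' or 'C' using the toy model above."""
    return "".join(viterbi(seq, STATES, START, TRANS, EMIT))
```

With these toy numbers, an AT-rich stretch followed by a GC-rich stretch is decoded as non-coding followed by coding, with a single state switch at the boundary.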
Gene Expression
Gene Expression
- Uses
  - Finding differentially expressed genes
  - Gene list annotation
- Technology
  - Spotted array (two color) and Affymetrix (one color)
  - Experiment execution (process control)
- Experimental design
  - Replicates
  - Matched experiments
  - Controls / reference samples
- Analysis
  - Probes to genes
  - Normalization
  - Sample quality control
  - Statistical significance of over-representation
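One common way to put a p-value on over-representation in a gene list is a one-sided hypergeometric test. The slide does not say which test the course uses, so treat the choice of test, the function name `enrichment_p_value`, and the parameter names as assumptions of this sketch.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background set
    K: background genes carrying the annotation
    n: genes in the selected list (e.g. differentially expressed)
    k: selected genes carrying the annotation
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For instance, `enrichment_p_value(10, 5, 5, 5)` is the chance that all five selected genes fall in a five-gene category out of ten, which is 1/252. When many categories are tested, the resulting p-values would still need a multiple-testing correction.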
Clustering
- Clustering methods
  - Hierarchical
  - K-means
- Key decisions
  - Standardize data?
  - How many clusters?
- Dimension reduction
  - PCA - Principal Components Analysis
  - SOM - Self Organizing Maps
- Assessment
  - Cluster purity
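The sketch below illustrates K-means (Lloyd's algorithm) and cluster purity in plain Python. It is a teaching sketch under stated assumptions: the function names are hypothetical, the initial centers are simply the first `k` points (real implementations use random or smarter seeding), and the data are assumed to be already standardized if that is desired.

```python
from collections import Counter


def kmeans(points, k, iters=100):
    """Cluster points (sequences of numbers) into k groups.

    Returns (labels, centers). Initial centers are the first k points,
    an assumption made to keep the example deterministic.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def purity(labels, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)
```

Purity needs known class labels, so it is only usable when some ground truth (for example, known tissue types) is available; it also rises trivially as the number of clusters grows.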
Proteomics: Methods for protein identification
Proteomics
- Uses
  - Toxicology
  - Compare diseased vs. normal cells
  - Alternative splicing
  - Post-translational modifications
  - Together with genomics
- Mass spectrometry
  - Mass fingerprinting
  - Sequence tags
  - Cross correlation with simulated mass spectra
    - E.g. SEQUEST and Mascot
  - Problem with introns
  - Y-ions and b-ions
  - Tandem mass spec
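Simulated spectra of the kind used in database search start from predicted fragment masses. The sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The function name `fragment_ions` is hypothetical, and the sketch assumes charge +1 and no modifications; it is not how any particular search engine is implemented.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is the b-ion holding the first i + 1 residues;
    y_ions[i] is the y-ion holding the last i + 1 residues.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b-ion: N-terminal fragment plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y-ion: C-terminal fragment plus water plus one proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

Note that leucine and isoleucine have identical masses, so this kind of calculation cannot tell them apart.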
Future Directions
- Regulatory mechanisms
  - Transcription ("gene expression")
  - Translation ("protein production")
  - Acetylation (of lysine)
  - Phosphorylation
  - Other protein, RNA, and DNA modification
- Binding between
  - DNA, RNA, protein
- Comparison across species
- Systems biology
  - Metabolic modeling
  - Combining data
Gene Regulatory Network: Sea urchin development
Metabolic Networks