CISC 841 Bioinformatics (Fall 2008) Review Session
Basics of Molecular Biology
Central dogma
  - Transcription
  - Translation
Genetic code
DNA
  - Double helix
  - Watson-Crick binding
RNA
  - Secondary structures
Genes
  - Compositional structure
  - Reading frames
Proteins
  - Secondary structure: alpha helices, beta sheets, and coils
  - 3-D structure
Gene regulation (operons)
DNA microarray
Cloning
PCR
Gel electrophoresis
2D gel + MS
Yeast two-hybrid system
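As a quick refresher on transcription and reading frames, here is a minimal sketch. It assumes the input is the coding (sense) strand written 5' to 3', so transcription reduces to replacing T with U. The function names `transcribe` and `reading_frames` and the example sequence are illustrative, not course material.

```python
def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (assumed 5'->3'): T becomes U."""
    return coding_strand.upper().replace("T", "U")


def reading_frames(seq):
    """Split a sequence into codons for the three forward reading frames.

    Only complete codons (three bases) are kept; trailing bases are dropped.
    """
    frames = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


# Illustrative sequence only: a start codon, one codon, and a stop codon.
dna = "ATGGCCTAA"
mrna = transcribe(dna)
frames = reading_frames(dna)
```

A double-stranded sequence has six reading frames in total; the other three come from applying the same split to the reverse complement.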
Computational methods
Kernel based methods
- Linear SVM
  - Rosenblatt algorithm (primal and dual forms)
  - Novikoff's theorem (you do not need to memorize the proof)
  - Maximum margin (primal and dual forms)
  - Lagrange multipliers, KKT conditions
  - Gradient descent algorithm for the dual form
  - Support vectors
- Nonlinear SVM
  - Mapping to feature space (high dimension)
  - Kernel functions: generic kernels
  - Mercer’s theorem
- Soft margin (slack variables)
- Principal component analysis (PCA)
  - Dimension reduction (projection onto a few most differentiating directions)
  - Kernel based PCA (capable of nonlinear projection)
- Binary versus multiclass classification
- Applications: classifying genes based on expression profiles
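To make the primal and dual forms of the Rosenblatt algorithm concrete, here is a minimal sketch in plain Python. It is a simplified variant: the bias is updated by `eta * y` rather than with the `R^2` scaling some textbooks use, labels are assumed to be in {-1, +1}, and the toy data set is made up for illustration. The dual form touches the data only through a kernel function, which is what allows a nonlinear kernel to be swapped in later.

```python
def dot(u, v):
    """Linear kernel: the ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))


def perceptron_primal(X, y, eta=1.0, max_epochs=100):
    """Primal form: maintain the weight vector w and bias b explicitly."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) <= 0:          # misclassified (or on the boundary)
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
                mistakes += 1
        if mistakes == 0:                            # converged on separable data
            break
    return w, b


def perceptron_dual(X, y, kernel=dot, max_epochs=100):
    """Dual form: alpha[i] counts how often example i triggered an update."""
    n = len(X)
    alpha = [0] * n
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n)) + b
            if y[i] * s <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b


def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1


# Toy, linearly separable data (illustrative only).
X = [[2, 2], [1, 3], [-1, -2], [-2, -1]]
y = [1, 1, -1, -1]
w, b = perceptron_primal(X, y)
alpha, b_dual = perceptron_dual(X, y)
```

With the linear kernel and `eta = 1`, the primal weights can be recovered from the dual solution as the sum of `alpha[i] * y[i] * X[i]`. Examples with nonzero `alpha` are the only ones that contribute to the decision function.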
Computational methods (cont’d)
Bayesian networks
- Joint probability, factorization based on the chain rule
- Bayes’ rule
- Bayesian networks
- Conditional independence, d-separation, Markov condition
- Model construction (score-based, maximum posterior probability)
- Parameter estimation (maximum likelihood)
- Model averaging
- Bootstrap
- Applications: inferring regulatory networks from gene expression data
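The chain-rule factorization and Bayes' rule can be checked on the smallest possible network, a regulator R with a single target T (R -> T). The probabilities below are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
# Hypothetical conditional probability tables for a two-node network R -> T.
p_r = {True: 0.3, False: 0.7}                      # P(R)
p_t_given_r = {                                    # P(T | R)
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}


def joint(r, t):
    """Chain-rule factorization: P(R=r, T=t) = P(R=r) * P(T=t | R=r)."""
    return p_r[r] * p_t_given_r[r][t]


def posterior_r_given_t(r, t):
    """Bayes' rule: P(R=r | T=t) = P(R=r, T=t) / sum over r' of P(R=r', T=t)."""
    evidence = sum(joint(rv, t) for rv in (True, False))
    return joint(r, t) / evidence


# Probability that the regulator is on, given that the target is observed on.
p_on = posterior_r_given_t(True, True)             # 0.27 / (0.27 + 0.14)
```

In a larger network the same idea applies: the joint probability is the product, over all nodes, of each node's probability given its parents.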
Computational methods (cont’d)
Hidden Markov models
- Three major problems
  - Decoding
  - Likelihood
  - Training: parameter estimation
- Model structure
  - Incorporating domain knowledge
  - Genetic algorithm
  - Model equivalence
  - Mutual entropy
  - NP-hard
  - Heuristics: quasi-consensus based
- Applications: predicting transmembrane topology and classifying protein families
Gradient descent algorithm
Genetic algorithm
Evaluation metrics
- Sensitivity
- Specificity
- ROC
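For the evaluation metrics, a minimal sketch using the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). An ROC curve plots sensitivity against 1 - specificity as the decision threshold is swept. Labels are assumed to be 1 (positive) and 0 (negative); the function names and the toy scores are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true positives that are detected (assumes tp + fn > 0)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that are rejected (assumes tn + fp > 0)."""
    return tn / (tn + fp)


def roc_points(y_true, scores):
    """ROC points (1 - specificity, sensitivity), one per distinct threshold."""
    points = [(0.0, 0.0)]                           # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, tn, fp, fn = confusion_counts(y_true, y_pred)
        points.append((1 - specificity(tn, fp), sensitivity(tp, fn)))
    return points


# Toy example (illustrative only): two positives, two negatives.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
curve = roc_points(y_true, scores)
```

A classifier that ranks every positive above every negative produces a curve that passes through the point (0, 1); a random ranking stays near the diagonal.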
About the Exam
- Time and place: 3:30 PM-4:45 PM, Thursday, November A Smith Hall
- Closed-book
- Four parts
  - Basics of Molecular Biology [10 points]
  - Kernel Based Methods [40 points]
  - Bayesian Networks [35 points]
  - Hidden Markov Models [15 points]