Discussion Section Week 9

Slides:



Advertisements
Similar presentations
Downloading a multiple alignment for your region of interest from the UCSC Genome Browser ( that can be uploaded in ConTra for.
Advertisements

GS 540 week 5. What discussion topics would you like? Past topics: General programming tips C/C++ tips and standard library BLAST Frequentist vs. Bayesian.
Hidden Markov Model in Biological Sequence Analysis – Part 2
Ulf Schmitz, Statistical methods for aiding alignment1 Bioinformatics Statistical methods for pattern searching Ulf Schmitz
HMM II: Parameter Estimation. Reminder: Hidden Markov Model Markov Chain transition probabilities: p(S i+1 = t|S i = s) = a st Emission probabilities:
Learning HMM parameters
Hidden Markov Models.
 CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.  CpG islands are particular short subsequences in.
Profiles for Sequences
Foundations of Statistical NLP Chapter 9. Markov Models 한 기 덕한 기 덕.
Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.
GS 540 week 6. HMM basics Given a sequence, and state parameters: – Each possible path through the states has a certain probability of emitting the sequence.
Lecture 6, Thursday April 17, 2003
Hidden Markov Model 11/28/07. Bayes Rule The posterior distribution Select k with the largest posterior distribution. Minimizes the average misclassification.
Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.
Hidden Markov Models Pairwise Alignments. Hidden Markov Models Finite state automata with multiple states as a convenient description of complex dynamic.
CS273a Lecture 14, Fall 08, Batzoglou CS273a Lecture 14, Fall 2008 Finding Conserved Elements (1) Binomial method  25-bp window in the human genome 
Hidden Markov Models Sasha Tkachev and Ed Anderson Presenter: Sasha Tkachev.
CS273a Lecture 10, Aut 08, Batzoglou CS273a Lecture 10, Fall 2008 Neutral Substitution Rates.
Hidden Markov Model Special case of Dynamic Bayesian network Single (hidden) state variable Single (observed) observation variable Transition probability.
Finding approximate palindromes in genomic sequences.
Lecture 5: Learning models using EM
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Forward-backward algorithm LING 572 Fei Xia 02/23/06.
Hidden Markov Models.
Sequence labeling and beam search LING 572 Fei Xia 2/15/07.
Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.
Hidden Markov models Sushmita Roy BMI/CS 576 Oct 16 th, 2014.
Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.
Hidden Markov Model Continues …. Finite State Markov Chain A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and 1.An m.
CS107 Introduction to Computer Science Lecture 7, 8 An Introduction to Algorithms: Efficiency of algorithms.
BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG
Sequencing a genome and Basic Sequence Alignment
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Hidden Markov Models for Sequence Analysis 4
Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.
What does it take to make the most of Your Math Homework ?
Huei-Hun E Tseng1 and Martin Tompa BMC Bioinformatics 2009 Presenter : Seyed Ali Rokni Algorithms for locating extremely conserved elements in multiple.
CS CM124/224 & HG CM124/224 DISCUSSION SECTION (JUN 6, 2013) TA: Farhad Hormozdiari.
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven.
Hidden Markov Models Usman Roshan CS 675 Machine Learning.
Hidden Markov Models BMI/CS 776 Mark Craven March 2002.
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
S. Salzberg CMSC 828N 1 Three classic HMM problems 2.Decoding: given a model and an output sequence, what is the most likely state sequence through the.
Comp. Genomics Recitation 9 11/3/06 Gene finding using HMMs & Conservation.
Risk Analysis Simulate a scenario of possible input values that could occur and observe key impacts Pick many input scenarios according to their likelihood.
GVS: Genome Variation Server Materials prepared by: Warren C. Lathe, PhD Updated: Q Version 2.
California Pacific Medical Center
Multiple alignment using hidden Markove models November 21, 2001 Kim Hye Jin Intelligent Multimedia Lab
Risk Analysis Simulate a scenario of possible input values that could occur and observe key financial impacts Pick many different input scenarios according.
HW4: sites that look like transcription start sites Nucleotide histogram Background frequency Count matrix for translation start sites (-10 to 10) Frequency.
Week 8. Homework 7 2 state HMM – State 1: neutral – State 2: conserved Emissions: alignment columns – Alignment of human, dog, mouse sequences AATAAT.
Hidden Markov Model Parameter Estimation BMI/CS 576 Colin Dewey Fall 2015.
Hidden Markov Models. A Hidden Markov Model consists of 1.A sequence of states {X t |t  T } = {X 1, X 2,..., X T }, and 2.A sequence of observations.
HW7: Evolutionarily conserved segments ENCODE region 009 (beta-globin locus) Multiple alignment of human, dog, and mouse 2 states: neutral (fast-evolving),
Indel rates and probabilistic alignments Gerton Lunter Budapest, June 2008.
Hidden Markov Models BMI/CS 576
Week 10.
Measures of Central Tendency
OptiSystem applications: BER analysis of BPSK with RS encoding
Discussion Section 3 HW1 comments HW2 questions
Eukaryotic Gene Finding
CS 188: Artificial Intelligence Spring 2007
Three classic HMM problems
N-Gram Model Formulas Word sequences Chain rule of probability
Week 4 Genome 540: Discussion
Week 5 Discussion Section
Genome 540: Discussion Section Week 3
Presentation transcript:

Discussion Section Week 9 Eliah Overbey March 7, 2019

Agenda HW6: Questions? HW7 was due last night HW8: Due Wednesday, March 13, 11:59pm HW9: Due Wednesday, March 20, 11:59pm

HW8: Evolutionarily conserved segments ENCODE region 010 (chromosome 7) Multiple alignment of human, dog, and mouse 2 states: neutral (fast-evolving), conserved (slow-evolving) Emitted symbols are multiple alignment columns (e.g. ‘AAT’) Viterbi parse (no iteration) - Don’t need to worry about updating transition probabilities and performing multiple iterations; just do a single pass to determine the Viterbi path

Input Original maf format Homework file format Sequences broken into alignment blocks based on the species included Official file format specs Homework file format Only 3 species Gaps in human sequence and ambiguous bases replaced with ‘A’ for simplicity

HMM Diagram start 0.95 0.05 0.05 0.95 0.90 0.10 A-- TTT Neutral Conserved 0.05 0.95 0.90 0.10 A-- TTT … all tuple possibilities …

Setting parameters Emission probabilities Transition probabilities Neutral state: observed frequencies in neutral data set Conserved state: observed frequencies in functional data set Transition probabilities Given in the assignment; more likely to go from conserved to neutral Initiation probabilities Given in the assignment; more likely to start in the neutral state

Calculating Emission Probabilities Neutral State: Ancient Repeat Sequences Conserved State: Putative Functional Sites 1st base: human 2nd base: dog 3rd base: mouse etc … etc …

Output Parameter values State and segment histograms Emission probabilities you calculated from neutral and conserved data sets Initiation/transition probabilities you were given in the assignment State and segment histograms Coordinates of 10 longest conserved segments (report positions relative to the start of the chromosome) Brief annotations for the 5 longest conserved segments (look at UCSC genome browser, like in HW3) - When looking up annotations in the UCSC genome browser, make sure you’ve selected the correct reference genome (hg18)

HW8: Questions?

HW9: D-segments Revisited Same input data as for HW3 (file of read-start counts for chromosome 18) New scoring scheme for the read-start bins (0, 1, 2, and >=3) Same format, different numbers AND different values for S and D cutoffs

HW9: D-segments Revisited Assignment: Create randomized data sets with the same average read-start distribution as the original data N = number of sites in original sequence counts[r] = number of sites with r read starts in original sequence for each site 1..N x = random number between 0 and 1 (uniform distribution) if x < counts[0] / N randomized_counts[site] = 0 else if x < (counts[0] + counts[1]) / N randomized_counts[site] = 1 else if x < (counts[0] + counts[1] + counts[2]) / N randomized_counts[site] = 2 else randomized_counts[site] = 3

HW9: D-segments Revisited Assignment: Create randomized data sets with the same average read-start distribution as the original data Run maximal D-segment algorithm (from HW3) on 10 different randomizations of the read start sequence Scoring scheme different from HW3 for the read-start bins D = -5 S = {5, 10, 15, 20, 25}  5 different S, D combinations

HW9: D-segments Revisited Assignment: Create randomized data sets with the same average read-start distribution as the original data Run maximal D-segment algorithm (from HW3) on 10 different randomizations of the read start sequence Scoring scheme different from HW3 for the read-start bins D = -5 S = {5, 10, 15, 20, 25}  5 different S, D combinations Run maximal D-segment algorithm on the original data, using the same parameters

HW9: D-segments Revisited Output: Two tables, one for the original ‘real’ data, and another for the combined results across the 10 sets of simulated data. Each table row should report: S-value Number of D-segments found (mean for simulated data) Minimum D-segment score found Maximum D-segment score found Ratio of #D-seg(Si)/#D-seg(Si-1) Brief written answers to the questions posed in the assignment text (to be posted)

HW9: Questions?

Next Week: HW Office Hours