Published by David Hensley, modified over 9 years ago
1
Smoothing, Sampling, and Simulation
Vasileios Hatzivassiloglou
University of Texas at Dallas
2
Back to motif finding
Apply MLE to the profile data
Note that we already used MLE when calculating each cell of the profile
Now θ is the set of letter choices, one for each position
Because each choice is independent of the others, the MLE is
– choose at each position j the letter $\arg\max_i A_{ij}$, i.e., the most probable letter in column j
The algorithm takes O(kn) time
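A minimal Python sketch of the two MLE steps, assuming a DNA alphabet and a profile stored as a list of columns (one `{letter: probability}` dict per position j, so `profile[j][i]` plays the role of A_ij). The names `mle_profile` and `consensus` are hypothetical, not taken from the slides. Each cell is estimated as a relative frequency, and the consensus picks the highest-probability letter in each column independently; both steps visit every profile cell once.

```python
from collections import Counter

def mle_profile(motifs: list[str], alphabet: str = "ACGT") -> list[dict[str, float]]:
    """MLE profile: probability of letter i at position j = count of i in column j / number of motifs."""
    n = len(motifs)
    profile = []
    for j in range(len(motifs[0])):
        counts = Counter(m[j] for m in motifs)  # letters observed in column j
        profile.append({a: counts[a] / n for a in alphabet})
    return profile

def consensus(profile: list[dict[str, float]]) -> str:
    """MLE choice: at each position j, independently pick the most probable letter.
    Ties go to the letter that comes first in the alphabet order."""
    return "".join(max(col, key=col.get) for col in profile)

motifs = ["ACGT", "ACGA", "TCGA"]      # toy, made-up aligned motif instances
print(consensus(mle_profile(motifs)))  # ACGA
```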
3
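Representing profiles
Usually stored as $\log A_{ij}$ values
– historically, for ease of calculation (products of probabilities become sums)
– with computers, for numerical accuracy (products of many small probabilities underflow)
Smoothing
– estimated values can be 0 (a letter never observed at a position)
– zeros affect calculations, sometimes leading to serious problems (e.g., $\log 0$ is undefined, or there is no solution)
– smoothing raises the 0 probabilities
– to keep each column summing to 1, it has to lower the other estimated probabilities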
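A short sketch, assuming the same list-of-columns profile layout (helper names are hypothetical), of why log values are stored: a sequence's score becomes a sum of logs, a direct product of many small probabilities underflows to 0.0 in double precision, and a zero estimate has no logarithm at all.

```python
import math

def to_log_profile(profile: list[dict[str, float]]) -> list[dict[str, float]]:
    """Store log A_ij instead of A_ij. math.log raises ValueError on a 0 estimate."""
    return [{a: math.log(p) for a, p in col.items()} for col in profile]

def log_score(seq: str, log_prof: list[dict[str, float]]) -> float:
    """log P(seq | profile) = sum over positions j of log A_{seq[j], j} (a sum, not a product)."""
    return sum(log_prof[j][c] for j, c in enumerate(seq))

# Why logs help numerically: 200 positions, each with probability 0.001.
probs = [0.001] * 200
direct = math.prod(probs)                  # true value 1e-600 underflows to 0.0
in_logs = sum(math.log(p) for p in probs)  # about -1381.55, represented fine
print(direct, in_logs)

# Why zeros are a problem: an unsmoothed column with a 0 entry cannot be logged.
try:
    to_log_profile([{"A": 1.0, "C": 0.0}])
except ValueError as err:
    print("log of a zero probability:", err)
```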
4
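Additive smoothing
Replace each probability $A_{ij}$ with $\frac{A_{ij} + \epsilon}{1 + K\epsilon}$, where K is the number of letters in the alphabet and $\epsilon$ is a small number (such as 0.001)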
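A minimal sketch of additive smoothing for a single profile column. It assumes the normalized form (p + eps) / (1 + K*eps), with K the alphabet size; this is one common way to write the rule and is an assumption here, as is the hypothetical helper name `additive_smooth`. Every zero becomes small but positive, and the nonzero entries shrink slightly so the column still sums to 1.

```python
def additive_smooth(column: dict[str, float], eps: float = 0.001) -> dict[str, float]:
    """Add eps to every letter's probability, then renormalize so the column sums to 1.
    Assumed form: (p + eps) / (1 + K * eps), K = number of letters. Dividing by the
    actual column total (normally 1) keeps the result a distribution despite rounding."""
    k = len(column)
    total = sum(column.values())
    return {a: (p + eps) / (total + k * eps) for a, p in column.items()}

column = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}  # MLE column where only A was seen
smoothed = additive_smooth(column)
print(smoothed)  # A drops slightly below 1; C, G, T become small but non-zero
```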
5
Student presentations
Scheduled for December 2 and December 4
Each student gets 10 minutes (7 minutes for the presentation, 3 minutes for questions)
Select a project or topic, and papers, in consultation with the instructor by November 13
6
Potential presentation topics
– Similarity
– Statistical, predictive, and generative models
– Simulation
– Estimation
– Classification
– Clustering
– Text mining and knowledge discovery
7
Statistical sampling
A very general method for solving difficult problems with many variables that cannot be solved directly, but where partial solutions can be “guessed” and then improved
Commonly known as “Monte Carlo” methods (after the casino in Monaco) because one of the pioneers of the technique liked gambling
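The simplest form of statistical sampling estimates a quantity by averaging over random samples. The sketch below (a hypothetical `monte_carlo_mean` helper with a fixed seed for reproducibility) estimates the integral of x² over [0, 1], whose exact value is 1/3. More elaborate guess-and-improve sampling schemes build on this same idea of drawing random samples.

```python
import random

def monte_carlo_mean(f, n_samples: int, seed: int = 0) -> float:
    """Estimate E[f(U)] for U ~ Uniform(0, 1), i.e. the integral of f over [0, 1],
    by averaging f at randomly sampled points."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples

estimate = monte_carlo_mean(lambda x: x * x, 100_000)  # exact value is 1/3
print(estimate)
```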
8
Famous MC applications
Buffon’s needle (18th century)
Enrico Fermi’s study of the neutron (1930)
The Manhattan Project (1944)
Currently used in
– aerodynamics
– video games and computer-generated films
– share pricing
– bioinformatics
9
Buffon’s needle
How to calculate π?
Throw a needle of length l at random onto a floor of parallel boards of width w (w > l). It can be shown that the probability p of the needle crossing a line between boards is $p = \frac{2l}{\pi w}$
By estimating p experimentally through MLE (the fraction of throws in which the needle crosses a line), one can then calculate $\pi = \frac{2l}{p w}$
Using this method, the estimate 355/113 was obtained (accurate to 6 decimal places)
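A simulation sketch of the experiment, assuming the crossing probability 2l/(πw) for a needle no longer than the board width. The helper name and defaults (l = 1, w = 2, a fixed seed) are illustrative choices. The needle's angle is drawn by rejection sampling from the unit quarter-disc, so the simulation does not need π as an input; p is then estimated by its MLE, the fraction of crossing throws, and the formula is inverted.

```python
import math
import random

def buffon_estimate_pi(n_throws: int, l: float = 1.0, w: float = 2.0, seed: int = 0) -> float:
    """Estimate pi by simulating Buffon's needle (requires 0 < l <= w)."""
    if not 0 < l <= w:
        raise ValueError("need 0 < l <= w")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest line: uniform on [0, w/2].
        x = rng.uniform(0.0, w / 2)
        # Uniform angle in [0, pi/2] without using pi: take a uniform point in the
        # unit quarter-disc (rejection sampling) and read sin(theta) off it.
        while True:
            u, v = rng.random(), rng.random()
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = v / math.sqrt(r2)
        # The needle crosses a line if its half-length projection reaches the line.
        if x <= (l / 2) * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; increase n_throws")
    p_hat = crossings / n_throws   # MLE of the crossing probability
    return 2 * l / (p_hat * w)     # invert p = 2l / (pi w)

print(buffon_estimate_pi(200_000))
```

The statistical error of such an estimate shrinks only like 1/√n, so each additional correct digit of π costs roughly 100 times more throws.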
10
The classification problem
Given examples from two or more different classes of objects and a description of a new object, which class does the new object come from?
Approaches vary a lot depending on what kind of description is available
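To make the problem concrete, here is a sketch of one very simple classifier, nearest centroid (a swapped-in example technique, not one the slides prescribe), for objects described by numeric feature vectors. The data, features, and helper names are made up for illustration: each class is summarized by the average of its examples, and a new object is assigned to the class with the closest average.

```python
import math

def train_centroids(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the feature vectors of the examples in each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + xi for s, xi in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(x: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Toy, made-up data: two hypothetical numeric features per email message.
examples = [([0.9, 0.8], "spam"), ([0.8, 0.9], "spam"),
            ([0.1, 0.2], "non-spam"), ([0.2, 0.1], "non-spam")]
centroids = train_centroids(examples)
print(classify([0.85, 0.7], centroids))  # spam
```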
11
Example classification problems
Given samples of spam and non-spam email messages, classify an incoming message as spam or non-spam
Given samples of paying and non-paying credit card holders, accept or reject a credit card application
Given samples of patients who entered a hospital, predict whether a given patient will leave the hospital alive