Smoothing, Sampling, and Simulation Vasileios Hatzivassiloglou University of Texas at Dallas.

Presentation transcript:

Smoothing, Sampling, and Simulation Vasileios Hatzivassiloglou University of Texas at Dallas

2 Back to motif finding: Apply MLE to the profile data. Note that we already used MLE when calculating each cell. Now θ is the set of choices for each letter. Because each choice is independent of the others, the MLE is: choose at each position j the letter $\arg\max_i A_{ij}$, that is, the letter with the highest profile probability in that column. The algorithm takes O(kn) time.
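The per-position choice can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slide: the profile is stored as a list of rows, one per letter, with `profile[i][j]` holding the estimated probability of letter `i` at position `j`; the function name `most_likely_motif`, the DNA alphabet, and the sample numbers are hypothetical.

```python
def most_likely_motif(profile, alphabet="ACGT"):
    """Pick, at each position j, the letter with the largest profile value.

    profile[i][j] is assumed to be the estimated probability of
    alphabet[i] at position j (a layout chosen for this sketch).
    """
    num_positions = len(profile[0])
    motif = []
    for j in range(num_positions):
        # Each column is handled independently of the others.
        best_i = max(range(len(alphabet)), key=lambda i: profile[i][j])
        motif.append(alphabet[best_i])
    return "".join(motif)


# Hypothetical 4-letter, 3-position profile (each column sums to 1).
example_profile = [
    [0.7, 0.1, 0.2],  # A
    [0.1, 0.6, 0.2],  # C
    [0.1, 0.2, 0.5],  # G
    [0.1, 0.1, 0.1],  # T
]
print(most_likely_motif(example_profile))
```

Every cell of the profile is inspected exactly once, so the running time is proportional to the number of letters times the number of positions.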

3 Representing profiles: Usually stored as $\log A_{ij}$ values, historically for ease of calculation and, with computers, for maintaining accuracy. Smoothing: estimated values can be 0, and this will affect calculations, sometimes leading to serious problems (e.g., no solution). Smoothing increases 0 probabilities, and it has to reduce other estimated probabilities to account for this.
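A small Python sketch of why log values help with accuracy, and why zero estimates cause trouble. The helper name `log_likelihood` and the sample probabilities are assumptions made for illustration only.

```python
import math


def log_likelihood(probabilities):
    """Sum of log-probabilities; returns -inf if any probability is 0."""
    total = 0.0
    for p in probabilities:
        if p == 0.0:
            # math.log(0.0) would raise ValueError, so report -inf instead.
            return float("-inf")
        total += math.log(p)
    return total


# One hundred hypothetical probabilities of 1e-5 each.
probs = [1e-5] * 100

# The direct product (1e-500) is below the smallest positive double
# and underflows to exactly 0.0.
direct_product = math.prod(probs)

# The same quantity in log space is an ordinary, well-scaled number.
log_score = log_likelihood(probs)

# A single zero estimate makes the whole score -inf.
score_with_zero = log_likelihood([0.5, 0.0, 0.5])
```

The last line shows the problem that smoothing addresses: one zero cell rules out every sequence that uses it, however well the rest of the sequence matches.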

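4 Additive smoothing: Replace each probability $A_{ij}$ with $\frac{A_{ij} + \epsilon}{1 + |\Sigma|\,\epsilon}$, where $|\Sigma|$ is the number of possible letters and $\epsilon$ is a small number (such as 0.001).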
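A minimal sketch of additive smoothing in Python, assuming the common form in which a small constant is added to every probability and the result is divided by one plus the constant times the number of outcomes, so that the values still sum to 1. The function name `additive_smoothing` and the example column are hypothetical.

```python
def additive_smoothing(probabilities, epsilon=0.001):
    """Add epsilon to every probability and renormalize.

    Assumes the inputs sum to 1; the outputs then also sum to 1,
    and no output is 0.
    """
    num_outcomes = len(probabilities)
    normalizer = 1.0 + num_outcomes * epsilon
    return [(p + epsilon) / normalizer for p in probabilities]


# Hypothetical profile column over A, C, G, T with two zero estimates.
column = [0.5, 0.5, 0.0, 0.0]
smoothed = additive_smoothing(column)
```

The zero entries become small positive values, while the two entries of 0.5 shrink slightly to make room for them.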

5 Student presentations: Scheduled for December 2 and December 4. Each student gets 10 minutes (7 minutes for presentation, 3 minutes for questions). Select project or topic and papers in consultation with the instructor by November 13.

6 Potential presentation topics: Similarity; statistical, predictive, and generative models; simulation; estimation; classification; clustering; text mining and knowledge discovery.

7 Statistical sampling: A very general method for solving difficult problems with many variables that cannot be solved directly, but where partial solutions can be “guessed” and improved. Commonly known as “Monte Carlo” methods (from the Monaco casino) because one of the pioneers of the technique liked gambling.

8 Famous MC applications: Buffon’s needle (18th century); Enrico Fermi’s study of the neutron (1930); the Manhattan project (1944). Currently used in aerodynamics, video games and computer-generated films, share pricing, and bioinformatics.

9 Buffon’s needle: How to calculate π? Consider a random throwing of a needle of length l on a floor with parallel boards of width w (w>l). Then it can be shown that the probability p of the needle crossing a line between boards is $p = \frac{2l}{\pi w}$. By estimating p (experimentally through MLE) one can then calculate π as $\pi \approx \frac{2l}{pw}$. Using this, the estimate 355/113 was obtained (accurate to 6 decimal places).
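The experiment can be simulated in Python. This is a sketch under assumptions: the function name `estimate_pi_buffon`, the default lengths, and the fixed seed are illustrative choices, and the simulation itself uses `math.pi` to draw the needle angle, so it demonstrates the method rather than computing π from scratch.

```python
import math
import random


def estimate_pi_buffon(num_throws, needle_length=1.0, board_width=2.0, seed=0):
    """Estimate pi from simulated needle throws (needle_length < board_width)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(num_throws):
        # Distance from the needle's center to the nearest line.
        distance = rng.uniform(0.0, board_width / 2)
        # Acute angle between the needle and the lines.
        angle = rng.uniform(0.0, math.pi / 2)
        # The needle crosses a line if its half-length, projected
        # perpendicular to the lines, reaches the nearest line.
        if distance <= (needle_length / 2) * math.sin(angle):
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; use more throws")
    # The fraction of crossings is the maximum likelihood estimate of p.
    p_hat = crossings / num_throws
    # Invert p = 2l / (pi * w) to solve for pi.
    return 2 * needle_length / (p_hat * board_width)


pi_estimate = estimate_pi_buffon(200_000)
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of throws, which is typical of Monte Carlo methods.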

10 The classification problem: Given examples from two or more different classes of objects, and a description of a new object, which class does the new object come from? There is a lot of variation depending on what kind of description we have available.

11 Example classification problems: Given samples of spam and non-spam messages, classify an incoming message as spam or non-spam. Given samples of paying and non-paying credit card holders, accept or reject a credit card application. Given samples of patients who entered a hospital, predict whether a given patient will exit the hospital alive.