Mathematical Derivation of Probability

Slides:



Advertisements
Similar presentations
Quality control tools
Advertisements

Chapter 8 – Normal Probability Distribution A probability distribution in which the random variable is continuous is a continuous probability distribution.
Ka-fu Wong © 2003 Chap 7- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.
CHAPTER 6 Statistical Analysis of Experimental Data
Test for a Mean. Example A city needs $32,000 in annual revenue from parking fees. Parking is free on weekends and holidays; there are 250 days in which.
Chapter 9 Numerical Integration Numerical Integration Application: Normal Distributions Copyright © The McGraw-Hill Companies, Inc. Permission required.
Statistical Process Control
Section 9.3 The Normal Distribution
In this chapter we will consider two very specific random variables where the random event that produces them will be selecting a random sample and analyzing.
Chapter 6 Lecture 3 Sections: 6.4 – 6.5.
Discrete Distributions The values generated for a random variable must be from a finite distinct set of individual values. For example, based on past observations,
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
Chapter 7 Probability and Samples: The Distribution of Sample Means.
1 Chapter 7 Sampling Distributions. 2 Chapter Outline  Selecting A Sample  Point Estimation  Introduction to Sampling Distributions  Sampling Distribution.
Learning Theory Reza Shadmehr LMS with Newton-Raphson, weighted least squares, choice of loss function.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
Chapter 6 Lecture 3 Sections: 6.4 – 6.5. Sampling Distributions and Estimators What we want to do is find out the sampling distribution of a statistic.
1 7.5 CONTINUOUS RANDOM VARIABLES Continuous data occur when the variable of interest can take on anyone of an infinite number of values over some interval.
Chapter 8: Probability: The Mathematics of Chance Probability Models and Rules 1 Probability Theory  The mathematical description of randomness.  Companies.
Introduction A probability distribution is obtained when probability values are assigned to all possible numerical values of a random variable. It may.
Laboratory Investigations Each lab group will submit a single input. All members of the group will get THE SAME grade UNLESS... You are observed goofing.
Simulations and Normal Distribution Week 4. Simulations Probability Exploration Tool.
Chapter 6 – Continuous Probability Distribution Introduction A probability distribution is obtained when probability values are assigned to all possible.
The Electromagnetic Spectrum High Harmonic Generation
Q1.1 Find the wavelength of light used in this 2- slits interference.
Jonathan Donkers1, Lin Zhang1+
Madhu Karra1, Jason Koglin1,2
Modeling Auger processes DUE to high Intensity, ultrafast x-ray pulses
Sentiment Analysis and Data visualization Techniques of Experiments
Image Processing For Soft X-Ray Self-Seeding
Characterizing Optics with White-Light Interferometry
Matter in Extreme Conditions at LCLS:
X-Ray Shielding Introduction Application Research Conclusions
Crystallography images
Hutch 4 (XCS) Laser System
Anomaly Detection LCLS data: Introduction Research Conclusion
Normal Distribution.
MAKING WAY FOR LCLS II HERRIOTT CELL IGNIS THE NEED FOR LCLS-II
Analysis of Economic Data
Managing Diffraction Beam
Effects of Wrapping Vacuum
LCLS CXI Sample Chambers and Engineering Solutions
Values to Normalize Data
PCB 3043L - General Ecology Data Analysis.
G6PD Complex Structural Study Crystallization Setup
Your Name, Elias Catalano, Robert Sublett, Robin Curtis
Determining a Detector’s Saturation Threshold
HXRSD Helium Enclosure
Crystallography Labeling Using Machine Learning
IT Inventory and Optimization
Electro Optic Sampling of Ultrashort Mid-IR/THz Pulses
Margaret Koulikova1, William Colocho+
Analytical Solutions for Heat Distribution in KB Mirror Systems
Repeatability Test of Attocube Piezo Stages
Analytical Solutions for Temperature
Barron Montgomery1,2, Christopher O’Grady2, Chuck Yoon2, Mark Hunter2+
Elizabeth Van Campen, Barrack Obama2, Stephen Curry1,2, Steve Kerr2+
Advanced Methods of Klystron Phasing
Ryan Dudschus, Alex Wallace, Zachary Lentz
Jennifer Hoover1, William Colocho2
X-Ray Spectrometry Using Cauchois Geometry For Temperature Diagnostics
Redesigning of The SED Calendars
Title of Your Poster Introduction Research Conclusions Acknowledgments
Reconstruction algorithms Challenges & Future work
COS 126 – Atomic Theory of Matter
CONTINUOUS RANDOM VARIABLES AND THE NORMAL DISTRIBUTION
Learning Theory Reza Shadmehr
Copyright © Cengage Learning. All rights reserved.
Mathematical Foundations of BME
Presumptions Subgroups (samples) of data are formed.
Presentation transcript:

Mathematical Derivation of Probability Data Analysis at LCLS Cristina E. Garcia STAR Intern 2Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025, USA. +Contact: yoon82@stanford.edu Introduction Python Stimulation The mathematical theory was verified through a python stimulation. First the program generates a random array of values. Each value represents how dark an image is. In order to control the randomness of the values we used the Standard Normal Distribution function which was also a fundamental part of our mathematical derivation. To find a peak, we must find joints of high values like nineteen and sixteen. So first the program locates the values that are greater than or equal to nineteen, which are values that are three sigma away from the mean. It prints out location, actual value, and counts the total found. Then it analyzes the values around it in search of values that are greater than or equal to sixteen, these are the values that are two sigma’s away from the mean. Once the program finds these joints, it prints out the total number of peaks and calculates its percentage from the total. Table 2 shows the results of each trial done on the stimulating program. The computer stimulation also allowed us to solve for the sigma level needed to only find five peaks in an automated peak finding threshold, which was 2.25sigma. Mathematical Derivation of Probability Based on a random Standard Normal distribution function with a fixed mean of 10 and a standard deviation of 3 we can calculate the probability of finding values that are equal to (𝜇+3𝜎) and (𝜇+2𝜎) right next to each other. This would indicate we have found a peak in a random array of images. Picture on the left illustrates a lysozyme protein crystal in real time. Picture on the right presents the same protein crystal under a microscope. Table 1 A protein’s structure determines its function. In order to better understand a protein’s function one must analyze in depth its atomic structure. Serial femtosecond crystallography is the process for which an ultrafast free-electron x-ray laser, like such at LCLS, diffracts a protein crystal into its atomic structural basis and takes pictures of protein crystals in motion. From these atomic structures, scientists can better understand the functions of the diffracted proteins. This is important because the protein structures obtained from this process can also provide valuable information that can help redesign medicinal drugs that will target diseases caused by certain proteins. Table 1 presents the Standard Normal Distribution graph used as reference for the mathematical derivation of probability theory. First, using the Normal Standard Distribution graph above we can find the probability of finding a value that is equal to 𝜇+3𝜎 . 𝑃 𝜇+3𝜎 = 1− .997 = .003   Since we are only looking for the top values about 3𝜎, we have to divide this probability by 2 to get the actual probability of finding that value. So, 𝑃 𝜇+3𝜎 = .003 2 = .0015 This means that in an array of 1000x1000 we would expect around 1,500. Working off this probability we can calculate the probability of finding at least one 2𝜎 pixel next to the 3𝜎 pixel. The probability of finding “at least one” 2𝜎 pixel is equal to one minus the complement of probability of the event not occurring.  𝑃 𝜇+2𝜎 = 1− 1−.0227 4 =1− .9773 4 1−.912=.088 Therefore, 𝑃 𝜇+3𝜎 × 𝑃 𝜇+2𝜎 = .0015×.088=.000132 This means that in an array of 1,000x1,000 we would expect around 132 2 sigma pixels. Table 2 Research Objectives Objective 1: This research aims to analyze protein crystal diffraction data using programs from the crystallography toolbox (CCTBX) at LCLS to solve the protein structure. Objective 2: A protein crystal diffraction pattern contains numerous strong peaks on top of Gaussian noise. We calculated the probability of finding a peak on random Gaussian images using mathematical theory and testing our theory with a computer stimulation. Conclusions The probability obtained from the mathematical theory was coherent with the values presented in the python script stimulation. This verified the accuracy of our theory and also allowed us to determine the sigma level necessary in an automated peak finding threshold. In order to determine the efficiency of the data being presented by the images collected we must look at the amount of false peaks it presents. Through our python stimulation we found that using a 2.25 sigma we could locate an average of 5 false peaks. Methods For data analysis we manually ran python scripts from the CCTBX for indexing and integration. The data collected was inputted into an excel worksheet for analysis. The data was used to analyze the difference in data sets we collect by using different parameters for each run. Also, we check for completeness, number of observations, and the averaging of dark images. Using the standard normal distribution function, we will calculate the probability of finding a peak in a set of images. We will use a python script to verify the derivation and accuracy of the mathematical theory. Also, we will investigate the sigma level needed to obtain a fixed number of peaks. Figure 1 Acknowledgments This project was made possible by the funding provided by the STAR Teacher Research Program. Figure 1 illustrates a portion of the matrix of images around a 3sigma pixel found. Date: 08/05/2017