Data Analysis Examples Anthony E. Butterfield CH EN
#1: The Normal PDF
Your coworker tells you that the outlet temperature of a certain coal gasifier averages 1304 K and stays within 12 K of that mean for 95% of her measurements, over months of operation. If we assume the temperature measurements are normally distributed, what is the standard deviation, and what is the probability that a temperature measurement would be above 1310 K?
T = 1304 ± 12 K (95% confidence level)
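Normal Distribution
Probability density function (PDF):
f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2/(2*sigma^2))
where mu is the mean and sigma is the standard deviation.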
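One way to work problem #1 numerically is sketched below. It assumes the Statistics Toolbox functions norminv and normcdf are available; the variable names are illustrative and not from the original slides.

```matlab
% Sketch for problem #1, assuming the Statistics Toolbox functions
% norminv and normcdf are available.
mu    = 1304;   % mean outlet temperature, K (from the problem)
halfW = 12;     % half-width of the 95% interval, K (from the problem)
CL    = 0.95;   % confidence level

% A two-sided 95% interval spans +/- z standard deviations, z ~ 1.96.
z     = norminv(1 - (1 - CL)/2);
sigma = halfW / z;                       % standard deviation, K

% Probability of a reading above 1310 K is the upper-tail area.
pAbove = 1 - normcdf(1310, mu, sigma);

fprintf('sigma = %.2f K\n', sigma);
fprintf('P(T > 1310 K) = %.1f%%\n', 100*pAbove);
```

With these inputs the standard deviation comes out near 6.1 K and the probability of exceeding 1310 K near 16%.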
#2: Error Propagation
In a falling bead viscometer, the viscosity may be found by the following equation:
mu = 2*r^2*g*(rho_B - rho_F)/(9*V)
where r is the bead radius, g is gravitational acceleration, V is the terminal velocity, rho_B is the bead density, and rho_F is the fluid density. Suppose we find, within a 95% confidence level, that the bead density is 2 ± 0.1 g/cm^3, the radius is 3 ± 0.1 mm, the fluid density is 1.1 ± 0.2 g/cm^3, and, after terminal velocity is achieved, the bead falls 10 ± 0.2 cm in 12 ± 0.5 seconds. What is the calculated viscosity and the uncertainty in its value? Which measurement is the greatest source of error?
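#2: Error Propagation
A couple of options exist. The table below uses numerical perturbation: convert every input to consistent (cgs) units, compute the nominal viscosity f0, shift each input i by its CI to get a recomputed viscosity fi, and combine the changes in quadrature.

Quantity | Value | CI  | Units  | Value (cgs) | CI (cgs) | Units (cgs) | fi (g/cm/s)
g        |       |     | m/s^2  |             |          | cm/s^2      |
rho_B    | 2     | 0.1 | g/cm^3 | 2           | 0.1      | g/cm^3      |
rho_F    | 1.1   | 0.2 | g/cm^3 | 1.1         | 0.2      | g/cm^3      |
r        | 3     | 0.1 | mm     | 0.3         | 0.01     | cm          |
d        | 10    | 0.2 | cm     | 10          | 0.2      | cm          |
t        | 12    | 0.5 | s      | 12          | 0.5      | s           |

Uncertainty = (sum over i of (f0 - fi)^2)^0.5
Viscosity = f0 ± uncertainty, in g/cm/s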
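The perturbation approach can be sketched in a few lines of MATLAB. Several things here are assumptions rather than content from the slides: the Stokes' law form of the viscometer equation, g = 981 cm/s^2 treated as exact, the terminal velocity taken as V = d/t, and all variable names.

```matlab
% Sketch for problem #2: numerical error propagation by perturbation.
% Assumptions (not from the source): Stokes' law form of the viscometer
% equation, g = 981 cm/s^2 treated as exact, and V = d/t.
names = {'rho_B','rho_F','r','d','t'};
x0 = [2.0, 1.1, 0.3, 10, 12];      % g/cm^3, g/cm^3, cm, cm, s
ci = [0.1, 0.2, 0.01, 0.2, 0.5];   % 95% CIs in the same units
g  = 981;                          % cm/s^2

% Viscosity in g/cm/s (poise); x = [rho_B, rho_F, r, d, t].
visc = @(x) 2*x(3)^2*g*(x(1) - x(2)) / (9*(x(4)/x(5)));

f0 = visc(x0);                     % nominal viscosity
sq = zeros(size(x0));
for i = 1:numel(x0)
    xi = x0;
    xi(i) = xi(i) + ci(i);         % perturb one input by its CI
    sq(i) = (f0 - visc(xi))^2;     % squared change in viscosity
end
df = sqrt(sum(sq));                % combined uncertainty
[~, worst] = max(sq);              % index of the largest contributor

fprintf('Viscosity = %.1f +/- %.1f g/cm/s\n', f0, df);
fprintf('Largest error source: %s\n', names{worst});
```

Under these assumptions the fluid density dominates, because its CI is a large fraction of the density difference rho_B - rho_F.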
#3: Log Normal
You find the following particle size distribution from a spray dryer experiment:
[Table of data]
If we assume this distribution of particle sizes is log-normal, what are the mean and standard deviation of the log-normal PDF?
This is a nonlinear fitting problem, like #6.
Range Max (um) | Count | Percentage
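A minimal sketch of such a fit follows, treating it as a nonlinear least-squares fit of the log-normal CDF to the cumulative fraction at each bin edge. The bin edges and counts are hypothetical because the original table is not available, and the starting guess is an assumption. It relies on the Statistics Toolbox functions logncdf, nlinfit, and nlparci.

```matlab
% Sketch for problem #3: fit a log-normal CDF to binned size data.
% The bins and counts are HYPOTHETICAL; the source's table is lost.
rangeMax = [5 10 20 40 80]';        % upper edge of each size bin, um
count    = [4 21 50 21 4]';         % particles counted in each bin

cumFrac = cumsum(count)/sum(count); % fraction of particles below each edge

% b(1) = mean and b(2) = standard deviation of ln(size).
model = @(b,x) logncdf(x, b(1), b(2));
b0    = [log(15) 0.5];              % assumed starting guess
[bFit,r,J] = nlinfit(rangeMax, cumFrac, model, b0);
bci = nlparci(bFit, r, 'Jacobian', J);   % 95% CIs on both parameters

fprintf('mu    = %.3f, 95%% CI [%.3f, %.3f]\n', bFit(1), bci(1,:));
fprintf('sigma = %.3f, 95%% CI [%.3f, %.3f]\n', bFit(2), bci(2,:));
```

Cumulative fractions are correlated from one bin to the next, so the confidence intervals from this kind of fit are best read as rough guides.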
#4: Hypothesis Testing
On a certain stage of a distillation column, theory predicts the ethanol concentration should be 27%. You take the following measurements over several runs:
Percent Ethanol
What is the likelihood that your measurements match theory?
Student's t-test.
Mean = , StDev =
Degrees of freedom: v = n_a - 1 = 10 - 1 = 9
t-statistic: t = (Mean - 27) / (StDev / sqrt(n_a))
Use the t-statistic in the t CDF to find the probability.
Answer = 9.6%
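The steps of a one-sample t-test can be sketched as follows. The ten measurements are hypothetical stand-ins, since the original data table is not available, so this sketch is not expected to reproduce the 9.6% answer. The slides also do not say whether that answer is a one-tail or two-tail value, so both are printed. tcdf comes from the Statistics Toolbox.

```matlab
% Sketch of a one-sample Student's t-test for problem #4.
% The ten measurements are HYPOTHETICAL; the source's data table is lost.
x   = [26.1 27.9 25.4 26.8 27.2 25.9 26.5 27.4 26.0 26.3];  % percent ethanol
mu0 = 27;                          % theoretical concentration, percent

n = numel(x);
v = n - 1;                         % degrees of freedom
t = (mean(x) - mu0) / (std(x)/sqrt(n));

% tcdf gives the area below its argument; using -abs(t) always returns
% the small tail area, whatever the sign of t.
pOneTail = tcdf(-abs(t), v);
pTwoTail = 2*pOneTail;

fprintf('t = %.3f, v = %d\n', t, v);
fprintf('one-tail P = %.1f%%, two-tail P = %.1f%%\n', ...
        100*pOneTail, 100*pTwoTail);
```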
#5: Hypothesis Testing 2
You are measuring the effectiveness of a new catalyst on a reaction with a great deal of normally distributed variability. You measure the time to 99% conversion of your reactants with both your new and old catalyst for several experimental runs and find the following data:
Old (min) | New (min)
Given these data, what is the probability that the new catalyst is more effective than the old? What is the probability that they are equally effective?
Mean A = 10.25, Mean B = 9.50
StDev A = 1.071, StDev B =
Number A = 22, Number B = 20
Degrees of freedom: v = n_a + n_b - 2 = 40
t-statistic (pooled variance): t = (Mean A - Mean B) / (s_p * sqrt(1/n_a + 1/n_b)), where s_p^2 = ((n_a - 1)*(StDev A)^2 + (n_b - 1)*(StDev B)^2) / v
Simple rule:
– Greater-than or less-than tests use one tail (two unequal areas), and the means tell you which percentage to use.
– An equality test uses two equal tails.
For the t CDF with v = 40 at the computed t-statistic, P = 2.7%. The probability that the new catalyst is more effective is a one-tail test.
More effective (one tail) = 100% - 2.7% = 97%
Equal (two tail) = 2*2.7% = 5%
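The two-sample calculation can be sketched from the summary statistics alone. The value used for StDev B is a hypothetical placeholder, since that number is missing from the slide, so the printed probabilities are illustrative and will not match the 2.7% figure exactly. tcdf comes from the Statistics Toolbox.

```matlab
% Sketch of a pooled two-sample t-test from summary statistics (#5).
% sB is a HYPOTHETICAL placeholder: the source's StDev B value is lost,
% so the printed numbers are illustrative only.
meanA = 10.25;  sA = 1.071;  nA = 22;   % old catalyst (from the source)
meanB = 9.50;   sB = 1.0;    nB = 20;   % new catalyst; sB assumed

v  = nA + nB - 2;                               % degrees of freedom
sp = sqrt(((nA-1)*sA^2 + (nB-1)*sB^2) / v);     % pooled std. deviation
t  = (meanA - meanB) / (sp*sqrt(1/nA + 1/nB));  % t-statistic

pTail   = tcdf(-abs(t), v);   % area in one tail beyond |t|
pBetter = 1 - pTail;          % one tail; valid here because meanB < meanA
pEqual  = 2*pTail;            % two equal tails

fprintf('t = %.3f, v = %d\n', t, v);
fprintf('P(new more effective) = %.1f%%\n', 100*pBetter);
fprintf('P(equal)              = %.1f%%\n', 100*pEqual);
```

The last three assignments mirror the simple rule: the one-tail result is 100% minus the tail area, and the two-tail result is twice the tail area.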
#6: Non-Linear Fit
The rates of population growth in a bacterial culture are found to be:
Time (hr) | Rate (SRU)
It is thought that these data could be fit to the equation:
Rate=b1*sin(b2*t)
where b1 and b2 are constants to be determined and t is time. Determine the least squares estimates of b1 and b2 and give an appropriate confidence interval for a confidence level of 90%. Also, what would you anticipate the rate to be at 24 hr? What would the confidence interval for a 95% confidence level be at 24 hr?
%Anthony Butterfield 2009
%Example of nonlinear fit with CIs
clear
close all
b(1)=1/3;
b(2)=1;
re=0.1; %random noise strength
x=linspace(0,6,20)'; %x data for fitting
x2=linspace(0,6,100)'; %x data for plotting
n=length(x);
model=@(b,x) b(1)*sin(b(2)*x); %fit equation: y = b1 sin(b2 * x)
y=model(b,x)+re*randn(n,1); %y data for fitting, note the random error added in to make it realistic
yt=model(b,x2); %theoretical y data for plotting
%The original nlinfit call is partly lost; the model handle and the
%initial guess [1 1] below are assumed.
[beta,r,J]=nlinfit(x,y,model,[1 1]); %numerically performs a nonlinear fit
bci=nlparci(beta,r,'Jacobian',J); %returns the c.i. for the parameters, beta
[ypred,delta]=nlpredci(model,x2,beta,r,'Jacobian',J); %returns a predicted y and the c.i. for each y
disp('Fit to equation: y = b1 sin(b2 * x)')
disp(' x data y data')
for i=1:n
    txt=sprintf(' %5.3f %5.3f',x(i),y(i));
    disp(txt)
end
%'%%' prints a literal percent sign inside sprintf
txt=sprintf('b1 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(1),beta(1),abs(beta(1)-bci(1,1)));
disp(txt)
txt=sprintf('b2 was %3.1f, and is estimated to be: %f ± %f (95%% CL)',b(2),beta(2),abs(beta(2)-bci(2,1)));
disp(txt)
figure(1)
hold on
grid on
scatter(x,y,10,'r')
%The RGB triple is lost in the source; gray [0.5 0.5 0.5] is a stand-in.
plot(x2,yt,'Color',[0.5 0.5 0.5]) %just wanted to give you an example of how to change the line color to something not preset
plot(x2,ypred,'b',x2,ypred+delta,'b:',x2,ypred-delta,'b:')
hold off
nlparci:
In “theory” b1 = 0.3; estimated b1 = 0.35 ± 0.05 (90% CL)
In “theory” b2 = 1.0; estimated b2 = 1.04 ± 0.04 (90% CL)
nlpredci:
At 24 hr “theory” predicts: Rate =
Fit predicts: Rate = ± (95% CL)
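Both nlparci and nlpredci default to a 95% confidence level. The sketch below shows one way to request the 90% and 95% levels the problem asks for, using the 'Alpha' option. The data are synthetic because the original Time/Rate table is not available; the true parameters, noise level, and starting guess are all assumptions.

```matlab
% Sketch for problem #6: confidence levels other than the 95% default.
% The data below are SYNTHETIC (the source's Time/Rate table is lost);
% bTrue, the noise level, and the starting guess are assumptions.
rng(0);                                   % repeatable noise
model = @(b,t) b(1)*sin(b(2)*t);          % Rate = b1*sin(b2*t)
bTrue = [0.3 0.1];
t     = (0:2:20)';                        % time, hr
rate  = model(bTrue,t) + 0.02*randn(size(t));

[beta,r,J] = nlinfit(t, rate, model, [0.5 0.12]);

% 'Alpha' = 1 - confidence level: 0.10 gives 90% CIs on b1 and b2.
bci90 = nlparci(beta, r, 'Jacobian', J, 'Alpha', 0.10);

% Predicted rate at 24 hr with a 95% half-width (Alpha = 0.05).
[rate24, delta24] = nlpredci(model, 24, beta, r, 'Jacobian', J, 'Alpha', 0.05);

fprintf('b1 = %.3f +/- %.3f (90%% CL)\n', beta(1), beta(1) - bci90(1,1));
fprintf('b2 = %.3f +/- %.3f (90%% CL)\n', beta(2), beta(2) - bci90(2,1));
fprintf('Rate(24 hr) = %.3f +/- %.3f (95%% CL)\n', rate24, delta24);
```

By default nlpredci reports the interval for the fitted curve rather than for a new observation. Because 24 hr lies outside the fitted time range in this sketch, the prediction is an extrapolation and its interval is correspondingly wide.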