Physics-Informatics Looking for Higgs Particle Counting Errors (Continued) January 23 2013 Geoffrey Fox


Physics-Informatics Looking for Higgs Particle Counting Errors (Continued) January 23 2013 Geoffrey Fox, Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

Top Topics
Medical/Health (medical records): 11
Commerce (Target, Amazon): 7
Internet/Search engine data: 7
Business: 6
Bio-informatics (metagenomics, forestry, disease): 6
Sensors (music, videogames, 911 calls, traffic, automotive): 5
Social informatics (social networks, communities): 5
Environmental science: 4
Music Informatics: 4
Astronomy: 4
Science (LHC, modeling): 3

The LHC produces some 15 petabytes of data per year across all varieties, with the exact amount depending on the duty factor of the accelerator (which is reduced both to cut electricity costs and because of malfunctions in one or more of its many complex systems) and of the experiments. The raw data produced by the experiments is processed on the LHC Computing Grid, which has some 200,000 cores arranged in a three-level structure: Tier-0 is CERN itself, Tier-1 sites are national facilities, and Tier-2 sites are regional systems. For example, one LHC experiment (CMS) has 7 Tier-1 and 50 Tier-2 facilities. The analysis chain raw data → reconstructed data → AOD and TAGS → physics is performed on this multi-tier LHC Computing Grid. Note that every event can be analyzed independently, so many events can be processed in parallel, with some concentration operations such as those that gather entries into a histogram. This implies that both Grid and Cloud solutions work for this type of data, although Grids are the only implementation in production today.
[Figures: a Higgs event; the ATLAS experiment. Note the LHC lies in a tunnel 27 kilometres (17 mi) in circumference.]
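Since every event is independent, the processing pattern described above can be sketched in a few lines of Python. This is only an illustration of the map-then-combine idea using made-up data, not the actual LHC Computing Grid software:

import numpy as np
from multiprocessing import Pool

def histogram_chunk(events):
    # each worker histograms its own chunk of events independently
    counts, _ = np.histogram(events, bins=15, range=(110, 140))
    return counts

if __name__ == "__main__":
    events = 110 + 30 * np.random.rand(42000)      # stand-in for real event masses in GeV
    chunks = np.array_split(events, 8)             # events can be split arbitrarily because they are independent
    with Pool(8) as pool:
        partial = pool.map(histogram_chunk, chunks)
    total_counts = np.sum(partial, axis=0)         # the "concentration" step: gather entries into one histogram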

Model

These “Feynman diagrams” define a model. The model cannot be calculated exactly, but approximate calculations are possible.

Event Counting
A lot of data analysis consists of setting up a process that gives “events” as a result:
– Take a survey of what people feel; an event is the vote of an individual
– Build a software system and categorize each of your data items; an event is this categorization
– Sensor nets; an event is the result of a sensor measurement
Often the results are yes/no:
– Did the person vote for Candidate X or not?
– Did the event fall into a certain bin of the histogram or not?
– Note an event might have a result (the cost of a hotel stay or the mass of the Higgs) that is histogrammed; the decision as to which bin it goes into is a yes/no decision
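As a small illustration of the last point (the mass value and bin edges here are assumptions for the example, not class data), deciding which bin a measured value goes into is a yes/no question asked of every bin:

import numpy as np

mass = 126.3                                  # one event's result, e.g. a candidate mass in GeV
binedges = np.arange(110, 142, 2)             # 2 GeV bins spanning 110-140 GeV
in_bin = (mass >= binedges[:-1]) & (mass < binedges[1:])   # a yes/no answer for each of the 15 bins
print(np.argmax(in_bin))                      # index of the single bin that answered "yes"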

Generate a Physics Experiment with Python
import numpy as np
import matplotlib.pyplot as plt
testrand = np.random.rand(42000)
Base = 110 + 30 * np.random.rand(42000)    # the constants were lost in the transcript; the 110-140 GeV range used in the plots below is assumed
index = (1 - (Base - 110)/30) > testrand   # sloping acceptance; the original slope coefficient was also lost, a linear fall-off from 1 at 110 GeV to 0 at 140 GeV is assumed
gauss = 2 * np.random.randn(300) + 126     # 300 Higgs events at mass 126 GeV with width 2 GeV
Sloping = Base[index]                      # sloping background sample
NarrowGauss = 0.5 * np.random.randn(300) + 126
total = np.concatenate((Sloping, gauss))
NarrowTotal = np.concatenate((Sloping, NarrowGauss))
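A possible follow-up, assuming the arrays defined above, is to plot the two simulated experiments as 2 GeV histograms like those on the following slides:

plt.figure("Wide and Narrow Higgs plus Sloping Background")
plt.hist(total, bins=15, range=(110, 140), alpha=0.5, color="green", label="Wide Higgs + background")
plt.hist(NarrowTotal, bins=15, range=(110, 140), alpha=0.5, color="blue", label="Narrow Higgs + background")
plt.legend()
plt.show()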

Python Resources
Python distribution including NumPy and SciPy with the plot package matplotlib
– Also pandas and IPython
Python for Data Analysis: Agile Tools for Real World Data, by Wes McKinney. Publisher: O'Reilly Media. Released: October 2012. Pages: 472.

Sloping

Sloping 1% Data

Higgs on its own

Actual Wide Higgs plus Sloping Background

Actual Wide Higgs plus Sloping Background with errors

Actual Wide Higgs plus Sloping Background Data Size divided by 10

Narrow Higgs plus Sloping Background 2 GeV Bins

Narrow Higgs plus Sloping Background 1 GeV Bins

Narrow Higgs plus Sloping Background 0.5 GeV Bins
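The 2, 1, and 0.5 GeV binnings shown above differ only in the bins argument over the fixed 110-140 GeV range; a minimal sketch, assuming the NarrowTotal array from the earlier snippet:

for nbins, label in [(15, "2 GeV"), (30, "1 GeV"), (60, "0.5 GeV")]:
    plt.figure("Narrow Higgs plus Sloping Background " + label + " Bins")
    plt.hist(NarrowTotal, bins=nbins, range=(110, 140), alpha=0.5, color="green")
plt.show()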

What Happens if more Higgs produced
gaussbig = 2 * np.random.randn(30000) + 126
gaussnarrowbig = 0.5 * np.random.randn(30000) + 126
totalbig = np.concatenate((Sloping, gaussbig))
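A possible way to produce the plots on the next two slides, assuming Sloping and the arrays above are defined:

plt.figure("30000 Higgs (Real Width) plus Sloping Background")
plt.hist(totalbig, bins=15, range=(110, 140), alpha=0.5, color="green")
plt.hist(gaussbig, bins=15, range=(110, 140), alpha=0.5, color="red")
plt.show()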

30,000 Higgs (Real Width and Narrow)

30,000 Higgs (Real Width) + Background

Simplification with flat background
import numpy as np
import matplotlib.pyplot as plt
Base = 110 + 30 * np.random.rand(42000)
# Base is the set of observations, with an expected 2800 background events per bin
# (the constants 110 and 30 were lost in the transcript and are reconstructed from the 110-140 GeV range and 2800 = 42000/15)
# Note we assume a flat background here, but in class a "sloping" curve that represented the experiment better was used
gauss = 2 * np.random.randn(300) + 126
# gauss is the number of Higgs particles (300 events at mass 126 GeV with width 2 GeV)
simpletotal = np.concatenate((Base, gauss))
# simpletotal is Higgs + Background
plt.figure("Total Wide Higgs Bin 2 GeV")
values, binedges, junk = plt.hist(simpletotal, bins=15, range=(110, 140), alpha=0.5, color="green")
centers = 0.5 * (binedges[1:] + binedges[:-1])
# centers is the center of each bin; values is the number of events in each bin
# :-1 is the same as :LargestIndex-1, so binedges[:-1] gives the lower limit of each bin
# 1: gives the array starting at the second entry (labelled 1, as the first index is 0), so binedges[1:] gives the upper limit of each bin
# Note binedges has Number of Bins + 1 entries; centers has Number of Bins entries
errors = np.sqrt(values)
# errors is the expected (Poisson) error for each bin
plt.hist(Base, bins=15, range=(110, 140), alpha=0.5, color="blue")
plt.hist(gauss, bins=15, range=(110, 140), alpha=1.0, color="red")
plt.errorbar(centers, values, yerr=errors, ls='None', marker='x', color='black', markersize=6.0)
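One natural extension of this snippet (my addition, not on the original slide) is to ask how large the Higgs excess is in each bin compared with the expected statistical fluctuation sqrt(N) of the background:

background, _ = np.histogram(Base, bins=15, range=(110, 140))   # background-only counts per bin
excess = values - background                                    # events above background in each bin
significance = excess / np.sqrt(background)                     # excess in units of the expected fluctuation
print(np.round(significance, 2))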

Uniform Background