CHEP 2016 10-14 October 2016, San Francisco, CA, USA

Presentation transcript:

CHEP 2016, 10-14 October 2016, San Francisco, CA, USA
Application of econometric and ecology analysis methods in physics software
Maria Grazia Pia, INFN Genova, Italy
M. C. Han, G. Hoff, C. H. Kim, S. H. Kim, E. Ronchieri, P. Saracco
Hanyang University, Seoul, Korea - INFN CNAF, Bologna, Italy - CAPES, Brasilia, Brazil
Foreword: Due to limited time allocation, there is room only to highlight some basic concepts and to illustrate them with a few examples of application.

Multiple perspectives
Treat a software system as a sociosystem/ecosystem:
Software development environment
Observables produced by the software
Apply [adapt] data analysis concepts, methods and techniques developed in economics/ecology
Quantitative analysis: inference, measures

Trend: government expenditure for tertiary education (figure)

Trend analysis
Statistical techniques to identify patterns in a series of data
Ability to deal with noise
Used to forecast the future (although a forecast does not predict it), but also to analyze past events
Tests for statistical inference: parametric and non-parametric
Test for randomness: H0 = random, H1 = monotonic trend (upward/downward)
Mann-Kendall test, Cox-Stuart test, Bartels test etc.
Related: change point detection
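
As an illustration of the test for randomness named on this slide, the following is a minimal sketch of a one-sided Mann-Kendall test in plain Python. It is not the code used for the results in this talk; the function name `mann_kendall` and the sample series are hypothetical, and the p-values rely on the usual normal approximation, which is generally considered adequate only for series of roughly ten or more points.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test (illustrative sketch, normal approximation).

    Returns (s, z, p_upward, p_downward), where p_upward is the one-sided
    p-value for H1 = upward trend and p_downward for H1 = downward trend.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S statistic: number of increasing pairs minus number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under H0 (randomness), with the standard correction for ties.
    counts = {}
    for value in series:
        counts[value] = counts.get(value, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    for t in counts.values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_upward = 0.5 * math.erfc(z / math.sqrt(2))     # P(Z >= z)
    p_downward = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return s, z, p_upward, p_downward


# Hypothetical metric values over ten successive software versions.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s, z, p_up, p_down = mann_kendall(values)
print(f"S = {s}, z = {z:.2f}, p(upward) = {p_up:.2g}")
```

With a significance level of 0.01, as quoted on the following slides, H0 (randomness) would be rejected in favour of an upward trend whenever `p_upward` falls below 0.01.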

Lehman laws
M. M. Lehman, Programs, Life Cycles, and Laws of Software Evolution, Proc. IEEE, vol. 68, no. 9, pp. 1060-1076, 1980
Continuing Change: A program that is used and that as an implementation of its specification reflects some other reality, undergoes continual change or becomes progressively less useful. The change or decay process continues until it is judged more cost effective to replace the system with a recreated version.
Increasing Complexity: As an evolving program is continually changed, its complexity, reflecting deteriorating structure, increases unless work is done to maintain or reduce it.

Chidamber and Kemerer CBO: Coupling between classes
Excessive coupling between object classes is detrimental to modular design and prevents reuse
High coupling has been found to indicate fault-proneness
High CBO is undesirable
How high is too high? CBO > 14
H. Sahraoui et al., “Can Metrics Help to Bridge the Gap Between the Improvement of OO Design Quality and Its Automation?” Proc. Int. Conf. Software Maintenance, pp. 154-262, 2000
Abstract classes: H0: random, H1: upward; Mann-Kendall test p-value < 0.01
Leaf classes: H0: random, H1: downward; Mann-Kendall test p-value < 0.01

Do I really need a statistical test to see a trend? I can see a trend just by looking at the plot! What about seeing trends in 26581 plots? How to objectively quantify what different eyes see? How to aggregate the trends observed in various plots?

Chidamber and Kemerer OO metrics, abstract classes: H0: random, H1: upward; p-value < 0.01

H0: random – H1: upward p-value < 0.01

Trends in software functionality
Electron backscattering simulation with Geant4
Trend of compatibility with experiment as a function of Geant4 version for different physics configurations
H0: randomness, H1: upward trend; Mann-Kendall test p-value = 0.003
H0: randomness, H1: downward trend; Mann-Kendall test p-value = 0.002
Helpful guidance in algorithm development, optimization, regression testing, software maintenance…

Income inequality measures: Gini index
The 62 richest people in the world are worth more than the poorest 50%
Lorenz curve L: cumulative fraction of wealth as a function of cumulative fraction of population
Lpe: line of perfect equality
Gini = (area between Lpe and L) / ½, 0 ≤ Gini ≤ 1; values closer to 1 indicate a more unequal society
C. Gini, Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche, 1912
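
The construction on this slide can be sketched in a few lines of Python: build the Lorenz curve from sorted values, integrate it with the trapezoidal rule, and take the area between the line of perfect equality and the Lorenz curve relative to ½. This is only an illustrative sketch; the names `lorenz_curve` and `gini_index` and the sample data are hypothetical and are not taken from the analysis presented here.

```python
def lorenz_curve(values):
    """Return the Lorenz curve as a list of (population fraction, wealth fraction)."""
    data = sorted(values)
    n = len(data)
    total = float(sum(data))
    if n == 0 or total <= 0 or data[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, value in enumerate(data, start=1):
        cumulative += value
        points.append((i / n, cumulative / total))
    return points


def gini_index(values):
    """Gini = (area between perfect equality and Lorenz curve) / (1/2)."""
    points = lorenz_curve(values)
    area_under_lorenz = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area_under_lorenz += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return (0.5 - area_under_lorenz) / 0.5


# Hypothetical example: a metric evenly spread over four classes,
# and the same total concentrated in a single class.
print(gini_index([5, 5, 5, 5]))   # evenly distributed
print(gini_index([0, 0, 0, 20]))  # concentrated in one class
```

For a finite sample of n items the largest value this estimator can reach is (n - 1)/n, so it approaches 1 only for large samples.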

Pietra index
AKA Ricci-Schutz index, Hoover index
P = max(Lpe(x) – L(x)), 0 ≤ P ≤ 1
L: Lorenz curve (cumulative fraction of wealth vs. cumulative fraction of population); Lpe: line of perfect equality
Used in derivative markets as a benchmark measure of statistical heterogeneity
Counterpart of the Kolmogorov-Smirnov statistic
It can be interpreted as the proportion of income that has to be transferred from those above the mean to those below the mean in order to achieve an equal distribution
Emphasis on individual-mean interaction
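
A minimal Python sketch of the definition P = max(Lpe(x) – L(x)) follows. It also computes the transfer interpretation quoted on the slide (half the summed absolute deviation from the mean, relative to the total), which gives the same number. The function names and the sample data are hypothetical; this is an illustration under those assumptions, not the implementation behind the results shown in this talk.

```python
def pietra_index(values):
    """Pietra index as the largest vertical gap between Lpe and the Lorenz curve."""
    data = sorted(values)
    n = len(data)
    total = float(sum(data))
    if n == 0 or total <= 0 or data[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    largest_gap = 0.0
    cumulative = 0.0
    for i, value in enumerate(data, start=1):
        cumulative += value
        # Lpe(x) = x on the line of perfect equality; L(x) is the Lorenz curve.
        gap = i / n - cumulative / total
        largest_gap = max(largest_gap, gap)
    return largest_gap


def pietra_from_transfers(values):
    """Share of the total that must move from above the mean to below it."""
    n = len(values)
    total = float(sum(values))
    mean = total / n
    return sum(abs(v - mean) for v in values) / (2.0 * total)


sample = [1, 1, 1, 5]  # hypothetical data
print(pietra_index(sample), pietra_from_transfers(sample))
```

Because the Lorenz curve of a finite sample is piecewise linear, the maximum gap is always attained at one of its vertices, so scanning the vertices is sufficient.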

Other inequality measures
Theil index: si = share of the ith group in total income, n = total number of income groups
(figure annotations: ∞; more equal society)
The same as redundancy in information theory: the maximum possible entropy of the data minus the observed entropy
Atkinson index: e = sensitivity parameter, 0 ≤ I ≤ 1
Used to calculate the proportion of total income that would be required to achieve an equal level of social welfare as at present, if incomes were perfectly distributed
Theil I, Theil II, Kolm index, coefficient of variation, generalized entropy and more…
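
The formulas for these two indices did not survive in this transcript, so the sketch below uses their standard textbook forms, which is an assumption about what the slide showed. The Theil index is written as the maximum entropy ln(n) minus the observed entropy of the shares si, matching the redundancy interpretation above; the Atkinson index uses the sensitivity parameter e. Function names and sample data are hypothetical.

```python
import math


def theil_index(values):
    """Theil index as redundancy: ln(n) minus the Shannon entropy of the shares.

    Equivalent to sum(s_i * ln(n * s_i)) with s_i = share of group i in the total.
    """
    n = len(values)
    total = float(sum(values))
    if n == 0 or total <= 0 or min(values) < 0:
        raise ValueError("need non-negative values with a positive sum")
    shares = [v / total for v in values]
    observed_entropy = -sum(s * math.log(s) for s in shares if s > 0)
    return math.log(n) - observed_entropy


def atkinson_index(values, e):
    """Atkinson index (standard form, assumed) with sensitivity parameter e >= 0."""
    n = len(values)
    if n == 0 or min(values) <= 0:
        raise ValueError("this sketch requires strictly positive values")
    mean = sum(values) / n
    if e == 1:
        # Limit case: ratio of geometric mean to arithmetic mean.
        geometric_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geometric_mean / mean
    power_mean = (sum(v ** (1.0 - e) for v in values) / n) ** (1.0 / (1.0 - e))
    return 1.0 - power_mean / mean


sample = [1, 1, 1, 5]  # hypothetical data
print(theil_index(sample))
print(atkinson_index(sample, e=0.5), atkinson_index(sample, e=1))
```

In this form the Theil index is 0 for a perfectly even distribution and reaches ln(n) when the whole total sits in one group, while the Atkinson index stays between 0 and 1.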

Halstead mental effort: measure of the number of elemental mental discriminations necessary to create or understand a class
(figure panels: Gini, Pietra, Atkinson, Theil II)

Concentrated software complexity: Gini = 0.90
Distributed software complexity: Gini = 0.37
Evolution of concentration: Gini = 0.87, Gini = 0.25

Gini and galaxies
Aggregate the capabilities of Geant4 PhysicsLists to reproduce experimental observables

Other econometric analysis methods: concentration, change point
Relation with methods used in ecology (e.g. analysis of diversity)
Information theory background
Comparative evaluation of measures and tests
Decomposition of inequality measures by subgroups
Methods, applications to physics software and results will be documented in forthcoming papers

Conclusion
Statistical methods commonly used in other disciplines can be valuable in software and physics analysis
Rich variety of econometric/ecology concepts and techniques: trend, inequality, concentration, diversity, change point…
Ongoing R&D to explore applications in physics software: to characterize software properties, to evaluate the behaviour of physics models
A few highlights, no time for extensive presentation