Presentation is loading. Please wait.

Presentation is loading. Please wait.

Statistics Introduction to Data.

Similar presentations


Presentation on theme: "Statistics Introduction to Data."— Presentation transcript:

1 Statistics Introduction to Data

2 Contents Applications in Business and Economics Data Data Sources
Descriptive Statistics Statistical Inference Computers and Statistical Analysis

3 Applications in Business and Economics
What areas need to use statistics?

4 Applications in Business and Economics
Accounting Finance Marketing Production Economics Information Systems …………

5 Data Data and Data Sets Elements, Variables, and Observations
Scales of Measurement Qualitative and Quantitative Data Cross-Sectional and Time Series Data

6 Data —Data and data set Data are the facts and figures collected, summarized, analyzed, and interpreted. The data collected in a particular study are referred to as the data set. Examples?

7 Data -- Elements, Variables, and Observations
The are the entities on which data are collected. A is a characteristic of interest for the elements. The set of measurements collected for a particular element is called an The total number of data values in a data set is the number of elements multiplied by the number of variables.

8 Data -- Elements, Variables, and Observations
A data set with n elements contains n observations. The total number of data values in a data set is the number of elements multiplied by the number of variables.

9 Data -- Elements, Variables, and Observations
the data set contains 8 elements. five variables: Exchange, Ticker Symbol, Market Cap, Price/Earnings Ratio, Gross Profit Margin. observations: the first observation (DeWolfe Companies) is AMEX, DWL, 36.4, 8.4, and 36.7.

10 Data -- Elements, Variables, and Observations
Names Stock Annual Earn/ Exchange Sales($M) Share($) Company Dataram EnergySouth Keystone LandCare Psychemedics AMEX OTC NYSE NYSE AMEX Data Set

11 Data-- Scales of Measurement
Nominal scale When the data for a variable consist of labels or names used to identify an attribute of the element. For example, gender, ID number, “exchange variable” in Table 1.1 nominal data can be recorded using a numeric code. We could use “0” for female, and “1” for male.

12 Data-- Scales of Measurement
Nominal scale example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

13 Data-- Scales of Measurement
Ordinal scale If the data exhibit the properties of nominal data and the order or rank of the data is meaningful. For example, questionnaire: a repair service rating of excellent, good, or poor. Ordinal data can be recorded using a numeric code. We could use 1 for excellent, 2 for good, and 3 for poor.

14 Data-- Scales of Measurement
Ordinal scale example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).

15 Data-- Scales of Measurement
Interval scale The data show the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Example: SAT scores, temperature. Interval data are always numeric.

16 Data-- Scales of Measurement
Interval data example: Three students with SAT scores of 1120, 1050, and 970 can be ranked or ordered in terms of best performance to poorest performance. In addition, the differences between the scores are meaningful. For instance, student 1 scored 1120 – =70 points more than student 2, while student 2 scored 1050 – 970 = 80 points more than student 3.

17 Data-- Scales of Measurement
Ratio scale The data have all the properties of interval data and the ratio of two values is meaningful. Ratio scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point. For example, distance, height, weight, and time use the ratio scale of measurement.

18 Data-- Scales of Measurement
Ratio scale example: Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

19 Data --Qualitative and Quantitative Data
Data can be further classified as either qualitative or quantitative. The statistical analysis appropriate for a particular variable depends upon whether the variable is qualitative or quantitative.

20 Data --Qualitative and Quantitative Data
If the variable is qualitative, the statistical analysis is rather limited. In general, there are more alternatives for statistical analysis when the data are quantitative.

21 Data –Qualitative Data
Labels or names used to identify an attribute of each element Qualitative data are often referred to as categorical data Use either the nominal or ordinal scale of measurement Can be either numeric or nonnumeric Appropriate statistical analyses are rather limited

22 Data --Quantitative Data
Quantitative data indicate how many or how much: discrete, if measuring how many continuous, if measuring how much Quantitative data are always numeric. Ordinary arithmetic operations are meaningful for quantitative data.

23 Data-- Scales of Measurement
Qualitative Quantitative Numerical Nonnumerical Numerical Nominal Ordinal Nominal Ordinal Interval Ratio

24 Data-- Cross-Sectional Data
Cross-sectional data are collected at the same or approximately the same point in time. Example: data detailing the number of building permits issued in July 2016 in each of the districts of Tainan City

25 Data– Time series Data Time series data are collected over several time periods. Example: data detailing the number of building permits issued in Tainan City in each of the last 36 months

26 Time Series Data Graph of Time Series Data

27 Data Sources Existing Sources Statistical Studies
Data Acquisition Errors

28 Data Sources Existing Sources Within a firm – almost any department
Business database services – Dow Jones & Co. Government agencies - U.S. Department of Labor Industry associations – Travel Industry Association of America Special-interest organizations – Graduate Management Admission Council Internet – more and more firms

29 Data Sources

30 Data Sources Statistical Studies
In experimental studies the variables of interest are first identified. Then one or more factors are controlled so that data can be obtained about how the factors influence the variables. In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. a survey is a good example

31 Data Sources Time requirement
Searching for information can be time consuming. Information may no longer be useful by the time it is available Cost of Acquisition Organizations often charge for information even when it is not their primary business activity.

32 Data Sources Data Errors
Using any data that happens to be available or that were acquired with little care can lead to poor and misleading information

33 Descriptive Statistics – After collecting data
Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand. Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data.

34 Descriptive Statistics – Example
Next table is the data for different mini-systems. Brand & Model Price ($) Sound Quality CD Capacity FM Tuning Tape Decks Aiwa NSX-AJ Good 3 Fair 2 JVC FS-SD Good 1 Very Good 0 JVC MX-G Very Good 3 Excellent 2 Panasonic SC-PM Fair 5 Very Good 1 RCA RS Good 3 Poor 0 Sharp CD-BA Good 3 Good 2 Sony CHC-CL1 300 Very Good 3 Very Good 1 Sony MHC-NX1 500 Good 5 Excellent 2 Yamaha GX Very Good 3 Excellent 1 Yamaha MCR-E Very Good 1 Excellent 0

35 Descriptive Statistics – Example
Parts Cost ($) Parts Frequency Percent Frequency 2 13 16 7 5 50 4 26 32 14 10 100

36 Descriptive Statistics – Example
Parts Cost ($) Parts Frequency Percent Frequency 2 13 16 7 5 50 4 26 32 14 10 100

37 Numerical Descriptive Statistics
The most common numerical descriptive statistic is the average (or mean). The average price is ? Descriptive Statistics: Price ($) Total Sum of Variable Count Percent CumPct Mean StDev Sum Squares Minimum Price ($) N for Variable Median Maximum Mode Mode Price ($)

38 Statistical inference
Population - the set of all elements of interest in a particular study Sample - a subset of the population Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population Census - collecting data for a population Sample survey - collecting data for a sample

39 Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown. 2. A sample of 50 engine tune-ups is examined. 3. The sample data provide a sample average parts cost of $79 per tune-up. 4. The sample average is used to estimate the population average.

40 Computers and Statistical Analysis
Statistical analysis often involves working with large amounts of data. Computer software is typically used to conduct the analysis. Statistical software packages such as Microsoft Excel and Minitab are capable of data management, analysis, and presentation.

41 Analytics Analytics is the scientific process of transforming data into insight for making better decisions. Techniques: Descriptive analytics: This describes what has happened in the past. Predictive analytics: Use models constructed from past data to predict the future or to assess the impact of one variable on another.

42 Analytics Techniques: Descriptive analytics Predictive analytics
Prescriptive analytics: The set of analytical techniques that yield a best course of action.

43 Big data and Data Mining:
Big data: Large and complex data set. Three V’s of Big data: Volume : Amount of available data Velocity: Speed at which data is collected and processed Variety: Different data types

44 Data warehousing Data warehousing is the process of capturing, storing, and maintaining the data. Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point of sale terminals, and touch screen monitors.

45 Data warehousing Wal-Mart captures data on million transactions per day. Visa credit card processes 6,800 payment transactions per second.

46 Data Mining Methods for developing useful decision-making information from large databases. Using a combination of procedures from statistics, mathematics, and computer science, analysts “mine the data” to convert it into useful information.

47 Data Mining The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes prompted by general and even vague queries by the user.

48 Data Mining Applications
The major applications of data mining have been made by companies with a strong consumer focus such as retail, financial, and communication firms. Data mining is used to identify related products that customers who have already purchased a specific product are also likely to purchase (and then pop-ups are used to draw attention to those related products).

49 Data Mining Applications
Data mining is also used to identify customers who should receive special discount offers based on their past purchasing volumes.

50 Data Mining Requirements
Statistical methodology such as multiple regression, logistic regression, and correlation are heavily used. Also needed are computer science technologies involving artificial intelligence and machine learning. A significant investment in time and money is required as well.

51 Data Mining Model Reliability
Finding a statistical model that works well for a particular sample of data does not necessarily mean that it can be reliably applied to other data. With the enormous amount of data available, the data set can be partitioned into a training set (for model development) and a test set (for validating the model).

52 Data Mining Model Reliability
There is, however, a danger of overfitting the model to the point that misleading associations and conclusions appear to exist. Careful interpretation of results and extensive testing is important.

53 Ethical Guidelines for Statistical Practice
In a statistical study, unethical behavior can take a variety of forms including: Improper sampling Inappropriate analysis of the data Development of misleading graphs Use of inappropriate summary statistics Biased interpretation of the statistical results

54 Ethical Guidelines for Statistical Practice
One should strive to be fair, thorough, objective, and neutral as you collect, analyze, and present data. As a consumer of statistics, one should also be aware of the possibility of unethical behavior by others.

55 Ethical Guidelines for Statistical Practice
The American Statistical Association developed the report “Ethical Guidelines for Statistical Practice”.

56 Ethical Guidelines for Statistical Practice
It contains 67 guidelines organized into 8 topic areas: Professionalism Responsibilities to Funders, Clients, Employers Responsibilities in Publications and Testimony Responsibilities to Research Subjects

57 Ethical Guidelines for Statistical Practice
It contains 67 guidelines organized into 8 topic areas: Responsibilities to Research Team Colleagues Responsibilities to Other Statisticians/Practitioners Responsibilities Regarding Allegations of Misconduct Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients


Download ppt "Statistics Introduction to Data."

Similar presentations


Ads by Google