The frequency distribution

The most basic analysis that we can perform on a variable of any type is the frequency distribution. Statistics from the frequency distribution are often used in research articles to describe the sample. To create a frequency distribution, we list all of the values for the variable and tally the number of times each value occurs. Once we have counted the number of cases for each of the possible values, we can also compute the percentage of total cases associated with each value by dividing the count for each value by the total number of cases and converting the decimal fraction to a percentage (multiplying by 100). Percentage comparisons are often more meaningful than frequency count comparisons, especially when we are comparing samples of different sizes.
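The tally-and-divide procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation; the function name frequency_distribution and the response codes are hypothetical, and every case is assumed to be valid (missing data is handled separately below).

```python
from collections import Counter


def frequency_distribution(values):
    """Tally each distinct value and convert its count to a percentage of all cases."""
    counts = Counter(values)          # value -> number of cases with that value
    total = len(values)
    table = []
    for value in sorted(counts):      # list the values in code order
        count = counts[value]
        percent = 100 * count / total  # decimal fraction converted to a percentage
        table.append((value, count, round(percent, 1)))
    return table


# Hypothetical responses coded 1-3 (illustration only, not real survey data)
responses = [1, 2, 2, 3, 3, 3, 2, 1, 3, 3]
for value, count, percent in frequency_distribution(responses):
    print(value, count, percent)
```

Each row pairs a value with its count and its percentage of the 10 cases, so the percentages sum to 100.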

The percentage is also called the relative frequency or the proportion
The percentage is also called the relative frequency or, when it is expressed as a decimal fraction, the proportion. We will use it as the basis for determining the probability that a case in our sample has a particular characteristic. The information from a frequency distribution can be presented as tables, as charts, and as individual statistics cited in a narrative discussion. We will focus on frequency distribution tables, though you should become familiar with the different graphing methods described in the text, since well-designed graphs can communicate summary information effectively and efficiently in a presentation or a written report. This advantage of graphs is offset by a loss of precision, i.e., the exact counts and percentages for specific values can be lost in a graph. The next slide shows an example of a frequency distribution table and narrative description from a research article.

Data analyses were based on data of participants of 16 groups that were conducted between October 1996 and February There were a total of 127 participants in these groups. All participants attended at least seven out of eight group sessions. However, data for 39 participants were not included in this analysis due to missing information in some of the assessments. Respondents entered in this analysis consisted of 88 program participants: 70 men (79.5%) and 18 women (21.5%). The age of the program participants ranged from 19 to 74 years (M = 37.5, SD = 9.8). Program participants were predominantly Caucasian (87.5%), with 6.8% African Americans, 2.3% Native Americans, and 3.4% Hispanic Americans. Participants had attained an average of 12.6 years of education (SD = 1.76; range = 8-19). Regarding the marital status of program participants, 50% were currently married or lived with a partner, 38.6% were divorced or separated, and 11.4% had never married. Among the participants 85.1% were gainfully employed, with approximately half of the participants self-identified as laborers (49.4%), 9.2% professionals, 11.5% service workers, 6.9% students, 2.3% on welfare or disability, 3.4% business owner or self-employed, 2.3% homemakers, and 14.9% unemployed (see Table 1).
Source: Mo Yee Lee, Adriana Uken, and John Sebold, "Role of Self-Determined Goals in Predicting Recidivism in Domestic Violence Offenders," Research on Social Work Practice, 2007; 17; 30.

The frequency distribution in SPSS - 1
This table shows the SPSS format for a frequency distribution for the variable polviews in the GSS2000R dataset. The variable name polviews and label THINK OF SELF AS LIBERAL OR CONSERVATIVE are used as the title for the table. The numbers at the beginning of each row of the table (1, 2, 3, etc.) are the numeric values or codes for the variable. These are the numbers that appear in the variable's column in the data editor. SPSS lists the values for missing data separately from the codes for valid data. On my computer, the Pivot Table Labeling options are set to show both names and labels for variables and both values and labels for individual values. If your options are set differently, your table will not have the same appearance.

If value labels have been added for the variable, the text labels will be shown in the table along with the numeric codes. It is generally useful to add value labels for nominal and ordinal variables, since they reduce the chance of misstating our results by mixing up codes and labels.

The frequency table may list three rows of totals: (1) the total frequency and percent for valid cases (258 cases, 100.0% of valid cases); (2) the total number and percentage of all cases that have missing data; and (3) the total number of cases, valid or missing. We generally report the total number of cases, the number or percent that have missing data, and the number of valid cases used in the analysis.
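The relationship among the three totals rows can be checked with a short sketch: every case is either valid or missing, so the two counts add up to the total. The missing-data codes 8 (DK) and 9 (NA) and the data values below are assumptions made for illustration; check the actual codes in your data set.

```python
def summarize_totals(values, missing_codes):
    """Return (valid n, missing n, total n) for one variable."""
    valid = [v for v in values if v not in missing_codes]
    missing = [v for v in values if v in missing_codes]
    # Every case is either valid or missing, so the two counts sum to the total.
    return len(valid), len(missing), len(values)


# Hypothetical data: codes 1-7 are valid answers; 8 (DK) and 9 (NA) are
# treated as missing. These codes are assumptions for illustration.
polviews = [4, 2, 8, 6, 4, 9, 1, 4, 5, 3]
valid_n, missing_n, total_n = summarize_totals(polviews, missing_codes={8, 9})
print(valid_n, missing_n, total_n)  # 8 2 10
```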

The SPSS frequency distribution contains four columns of statistics. The first column, Frequency, contains the count for each category or value of the variable, including the missing data categories.

The SPSS frequency distribution has three columns of percentages: Percent (percent of total cases), Valid Percent (percent of valid cases), and Cumulative Percent (cumulative percent of valid cases). The Percent column uses all cases as its base, including those in the missing data categories. For example, 4.1% of the cases in this sample answered “DK” (Don’t Know) to this question. Beyond citing the percentage of missing cases, we would not usually cite percents from this column.

The Valid Percent column is the column that we generally report. For example, if we say that 5% of our sample described themselves as extremely liberal, we usually mean that 5% of those who answered the question gave this response. Just as we use this column to cite percents, we would use it for proportions and probabilities.
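The difference between the Percent and Valid Percent columns is only the base used in the division. The sketch below computes both columns for hypothetical data in which code 8 stands in for a "DK" response; the codes and counts are invented for illustration and are not the GSS2000R results.

```python
from collections import Counter


def percent_columns(values, missing_codes):
    """Return {value: (frequency, percent, valid_percent)}, similar to an SPSS frequency table.

    valid_percent is None for missing-data codes, which have no valid percent.
    """
    counts = Counter(values)
    total_n = len(values)                                                  # all cases
    valid_n = sum(c for v, c in counts.items() if v not in missing_codes)  # valid cases only
    rows = {}
    for value in sorted(counts):
        freq = counts[value]
        percent = round(100 * freq / total_n, 1)            # base: all cases
        if value in missing_codes:
            valid_percent = None
        else:
            valid_percent = round(100 * freq / valid_n, 1)  # base: valid cases
        rows[value] = (freq, percent, valid_percent)
    return rows


# Hypothetical data: codes 1-4 are valid answers, code 8 is treated as "DK" (missing).
data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 8, 8, 8, 8]
for value, row in percent_columns(data, missing_codes={8}).items():
    print(value, row)
```

Code 1 is 10.0% of all 20 cases but 12.5% of the 16 valid cases, which is why the Valid Percent column is the one usually cited.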

The Cumulative Percent column is used when we want to highlight the percent that had a score at or below (or above) a particular value. For example, we might state that 26.4% of the sample thought of themselves as liberal. Cumulative percents are used only for ordinal and interval variables. Nominal categories do not have a natural order that supports summing above or below a particular value.
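A cumulative percent is a running total of valid cases divided by the number of valid cases. The sketch below assumes the (value, count) pairs are already listed in the variable's natural order with missing codes removed; the counts are hypothetical.

```python
def cumulative_valid_percent(valid_counts):
    """valid_counts: (value, count) pairs for valid cases, listed in the variable's natural order."""
    valid_n = sum(count for _, count in valid_counts)
    running = 0
    result = []
    for value, count in valid_counts:
        running += count  # cases at or below this value
        result.append((value, round(100 * running / valid_n, 1)))
    return result


# Hypothetical ordinal counts (codes 1-4), missing codes already excluded
print(cumulative_valid_percent([(1, 2), (2, 4), (3, 6), (4, 4)]))
# [(1, 12.5), (2, 37.5), (3, 75.0), (4, 100.0)]
```

Accumulating the counts before dividing, rather than adding up already-rounded percents, keeps the last row at exactly 100.0. The function trusts the order it is given, which is why the calculation only makes sense for ordered (ordinal or interval) variables.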

Producing a frequency distribution in SPSS - 1
To produce a frequency distribution in SPSS, select the Descriptive Statistics | Frequencies… command from the Analyze menu.

Producing a frequency distribution in SPSS - 2
Move the target variable for the frequency distribution to the Variable(s): list box. Click on the OK button to complete the request.

Frequency distribution for an ordinal variable
When the variable is an ordinal variable treated as quantitative and contains a limited number of categories, the table shows the frequency count and relative frequency for each of the possible values.

Frequency distribution for interval, quantitative variables
Had we selected a quantitative variable with many different values, such as age, the table would have too many categories to be useful. A more meaningful display is a grouped frequency distribution, in which ranges of values are grouped into class intervals. We will use the SPSS Visual Binning tool to break the values into categories, or class intervals. In previous versions of SPSS, this tool was called the Visual Bander.

Creating class intervals - 1
To create class intervals, we can use the SPSS Visual Binning tool. Select Visual Binning from the Transform menu.

First, move the variable age to the Variables to Bin: list box. Second, click on the Continue button to provide details to SPSS.

First, click on the variable age to load its default information into the fields needed for the class intervals.

Creating class intervals - 4
When we click on the variable name, SPSS transfers the variable name and label, proposes a label for the new variable (which it does not name), and draws a histogram of the data.

First, add a name for the binned variable, e.g. ageGroups. Second, note the minimum value of 19 and the maximum of 89. To create attractive intervals, I select a lower bound of 10 for the first class interval and an upper bound of 90 for the last class interval. With an interval width of 10, I will create 8 intervals that include the full range of values:
Less than 20 (lower bound of 10 + width of 10 = 20)
20-29
30-39
40-49
50-59
60-69
70-79
80 and more
While I could specify the exact range for the first and last intervals, I am following SPSS’s convention of making them open-ended.
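The interval scheme above follows from three numbers: the lower bound (10), the upper bound (90), and the width (10), which give (90 - 10) / 10 = 8 intervals. The sketch below reproduces the labels listed above under the assumption of whole-number ages; the function name is hypothetical, and the label text mirrors this example rather than the exact labels SPSS generates.

```python
def interval_labels(lower, upper, width):
    """Labels for equal-width class intervals from lower up to (not including) upper.

    The first and last intervals are open-ended, as in the example; integer data assumed.
    """
    n_intervals = (upper - lower) // width  # (90 - 10) // 10 = 8
    labels = []
    for i in range(n_intervals):
        start = lower + i * width
        if i == 0:
            labels.append(f"Less than {start + width}")
        elif i == n_intervals - 1:
            labels.append(f"{start} and more")
        else:
            labels.append(f"{start}-{start + width - 1}")  # e.g. 20-29
    return labels


print(interval_labels(10, 90, 10))
```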

Next, choose Excluded for the Upper Endpoints option. This is SPSS’s method for creating true class intervals that do not overlap. The first interval will include values less than the lower bound of the interval plus the class width. Each subsequent interval will include values up to, but not including, the lower bound of the next interval. SPSS will set the number of decimals needed to make certain a value is eligible for only one interval. If the values have no decimals, a class width of 10 will produce intervals like 10-19, while values with two decimal places will produce intervals like 10.00-19.99. Finally, click on the Make Cutpoints button to enter the information needed to create the classes.
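The Excluded rule means a value equal to a cutpoint belongs to the higher interval. Python's standard bisect module happens to implement exactly that rule, so the sketch below uses bisect_right as a stand-in for SPSS's assignment; the cutpoints are the ones from the age example, and the function name is hypothetical.

```python
from bisect import bisect_right

# Cutpoints for the age example: 20, 30, ..., 80
CUTPOINTS = tuple(range(20, 90, 10))


def interval_index(value, cutpoints=CUTPOINTS):
    """0-based interval number for value, with upper endpoints excluded.

    bisect_right places a value equal to a cutpoint in the higher interval,
    so 20 falls in the 20-29 interval rather than in "Less than 20".
    """
    return bisect_right(cutpoints, value)


for age in (19, 20, 29, 29.99, 30, 89):
    print(age, interval_index(age))
```

Both 29 and 29.99 land in interval 1 (20-29) while 30 moves to interval 2, so no value can belong to two intervals.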

First, enter 20 as the First Cutpoint Location, which is the upper bound of the first interval, i.e. the lower bound of 10 plus the interval width of 10. Cutpoints are values that divide intervals, not the intervals themselves; the number of intervals created will be the number of cutpoints plus one. Second, enter 10 as the Width for our intervals. SPSS does not activate the Apply button until we click in the third field, the Number of Cutpoints text box, even though our two specified values are sufficient to compute the classes.

When we click in the Number of Cutpoints text box, SPSS automatically computes the number of cutpoints (7) and the last cutpoint location (80). The Apply button is now active. Click on it to compute the classes.
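The arithmetic behind the 7 cutpoints and the last cutpoint of 80 can be reproduced from the values chosen in this example: cutpoints start at 20 and step by 10, stopping before the chosen upper bound of 90. This is a sketch of the arithmetic only; it is an assumption that SPSS arrives at the same numbers this way.

```python
def make_cutpoints(first_cutpoint, width, last_upper_bound):
    """Cutpoints every `width` units, starting at first_cutpoint and stopping below last_upper_bound."""
    return list(range(first_cutpoint, last_upper_bound, width))


cuts = make_cutpoints(first_cutpoint=20, width=10, last_upper_bound=90)
print(cuts)                 # [20, 30, 40, 50, 60, 70, 80]
print(len(cuts), cuts[-1])  # 7 80
print(len(cuts) + 1)        # 8 intervals: one more than the number of cutpoints
```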

SPSS lists the cut points in the grid and displays their positions (the vertical blue lines) in the histogram. To add value labels showing the class intervals, click on the Make Labels button.

SPSS adds value labels that specify the contents of each interval. Click on the OK button to complete the creation of the ageGroups variable.

SPSS confirms your request with this dialog box. Click on the OK button to create the variable.

Frequency distribution for grouped data - 1
The ageGroups variable has been added to the data editor. To create the frequency distribution for the grouped variable, select the Frequencies command from the Analyze menu.

Move the target variable, ageGroups, to the Variable(s): list box. Click on the OK button to complete the request.

The frequency distribution for the grouped variable has far fewer categories than the one for the ungrouped age variable.
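Putting the binning and tallying steps together gives a grouped frequency distribution. The sketch below reuses the cutpoints and labels from the age example; the ten ages are hypothetical, and listing every label (including empty intervals) is a design choice so the intervals stay in order.

```python
from bisect import bisect_right
from collections import Counter

CUTPOINTS = (20, 30, 40, 50, 60, 70, 80)
LABELS = ("Less than 20", "20-29", "30-39", "40-49",
          "50-59", "60-69", "70-79", "80 and more")


def grouped_frequencies(ages):
    """Count cases per class interval, keeping interval order and empty intervals."""
    counts = Counter(LABELS[bisect_right(CUTPOINTS, age)] for age in ages)
    return [(label, counts[label]) for label in LABELS]


# Hypothetical ages, for illustration only
ages = [19, 23, 27, 31, 35, 38, 44, 52, 67, 89]
for label, count in grouped_frequencies(ages):
    print(f"{label:>12}  {count}")
```

Ten distinct ages collapse into eight interval rows, and the 70-79 row shows a count of 0 rather than disappearing.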

Frequency Distribution Homework Problems

Frequency Distribution Homework Problems - 1
The problem includes a table to be completed from the statistical output, and narrative statements that might be included in a description of the sample. I find it easier to complete the table first, and then extract information from the table to complete the narrative questions.

The notes provide information about the data set to use, the variables for which we will create frequency distributions (marital, age, sex, and race), and the directions for creating class intervals for the interval variable, age.

The table for age that we created in the example on binning data can be used to complete the age portion of the table.

The percent header (%) tells us to enter percents in the column. First, we enter the number of valid cases for age, 270. Second, we copy the value in the Valid Percent column of the SPSS output for the first age interval into the corresponding cell in Table 1. Third, we repeat the second step for all of the other age intervals.

To create the frequency distribution for the other variables in the problem, select the Frequencies command from the Analyze menu.

First, move the other three variables mentioned in the problem (marital, sex, and race) to the list box for Variable(s). Second, click on the OK button to produce the output.

SPSS produces frequency tables for the other three variables. We will copy the results from the Valid Percent column in the SPSS tables to Table 1 in the problem.

First, we enter the number of valid cases for marital status, 270. Second, we copy the value in the Valid Percent column of the SPSS output for the first marital status category into the corresponding cell in Table 1. Third, we repeat the second step for all of the other categories for marital status.

First, we enter the number of valid cases for sex, 270. Second, we copy the value in the Valid Percent column of the SPSS output for males into the corresponding cell in Table 1. Third, we repeat the second step for females.

Note: in this problem none of the four variables had missing data, and the n for each was 270. This will not usually be the case. First, we enter the number of valid cases for race, 270. Second, we copy the value in the Valid Percent column of the SPSS output for White into the corresponding cell in Table 1. Third, we repeat the second step for Black and Other.

With the table completed, we move to the narrative statements. The blank in the first statement asks for the number of survey respondents in the entire data set. In this problem, that corresponds to the number of valid cases for each variable. However, if each of the included variables had some cases with missing data, none of the n’s would be the correct number of cases in the data set.

The total number of cases in the data set is the sum of the number of valid cases and the number of missing cases for any one of the variables: e.g., for marital status, 270 valid + 0 missing = 270.
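Because every variable is measured on the same cases, valid plus missing should give the same total for each variable. The sketch below checks that rule; the first call uses the n of 270 with no missing data reported for this problem, while the second call uses hypothetical counts to show a case with missing data. The function name is hypothetical.

```python
def total_cases(valid_and_missing):
    """valid_and_missing maps variable name -> (valid n, missing n)."""
    totals = {name: valid + missing for name, (valid, missing) in valid_and_missing.items()}
    # All variables describe the same cases, so every variable must give the same total.
    if len(set(totals.values())) != 1:
        raise ValueError(f"totals disagree: {totals}")
    return next(iter(totals.values()))


# Counts from this problem: n = 270 and no missing data for each variable
print(total_cases({"marital": (270, 0), "age": (270, 0),
                   "sex": (270, 0), "race": (270, 0)}))  # 270

# Hypothetical counts with missing data: the valid n's differ, the totals do not
print(total_cases({"age": (265, 5), "sex": (268, 2)}))  # 270
```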

The number of cases in the data set, 270, is entered into the first sentence. The first blank in the next sentence asks for the number of valid cases for the marital status variable. The number has been entered into Table 1 and we enter it in this blank.

The next question asks us to select the category of marital status that was most likely, i.e., the category with the largest percentage of cases. The most likely category is also referred to as the mode. Clearly, the largest category is married, at 51.1%.
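Finding the mode amounts to picking the category with the largest valid percent. The sketch below uses the sex percentages from this problem (58.9% female, so 100 - 58.9 = 41.1% male, since there are only two categories and no missing data); the function name is hypothetical.

```python
def modal_category(valid_percents):
    """Return the category with the largest valid percent (the mode).

    If two categories tie, max() returns the first one listed, so ties should be checked by hand.
    """
    return max(valid_percents, key=valid_percents.get)


# Sex in this problem: 58.9% female, so 41.1% male
print(modal_category({"Male": 41.1, "Female": 58.9}))  # Female
```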

Married is selected from the drop-down menu. Following the selected category, we enter the percentage who were married from Table 1: 51.1.

The next sentence asks us to enter the percentages that compare those who were never married to those who were divorced.

The next paragraph asks for the same information for the age variable.

The first blank in the next sentence asks for the number of valid cases for the age variable. The number has been entered into Table 1 and we enter it in this blank. The next drop-down menu asks us to select the modal group for age. The largest category for age is 30-39, with 24.4% of the subjects.

The remainder of the sentence asks us to enter the percentages for the categories that are mentioned in the narrative statement.

The next sentence asks for similar information for the gender variable. Since sex has only two categories, a statement comparing category percentages would be redundant with the question about the most likely category, so it is omitted.

The first blank in the next sentence asks for the number of valid cases for the gender variable. The number has been entered into Table 1 and we enter it in this blank. The next drop-down menu asks us to select the modal group for sex.

Female is selected from the drop-down menu. Following the selected category, we enter the percentage who were female from Table 1: 58.9.

The final sentence asks for the same information for the race variable that was asked for the marital status and age variables.

The first blank in the next sentence asks for the number of valid cases for the race variable. The number has been entered into Table 1 and we enter it in this blank. The next drop-down menu asks us to select the modal group for race. The modal group for race is White, with 84.4% of the respondents.

White is selected from the drop down menu. The remainder of the sentence asks us to enter the percentages for the categories that are mentioned in the narrative statement.

When we have answered all the questions, we click on the Submit button to have the problem graded.

The green shading indicates that all of our answers are correct.
