CHAPTER 4: DESCRIBING NUMERICAL DATA HOMEWORK #3
CHAPTER 4 PROBLEM 54 Cars. A column in this data file (cars) gives the engine displacement in liters of 509 vehicles sold in the United States. These vehicles are 2012 models, are not hybrids, have automatic transmissions, and lack turbochargers. Another column in this data file gives the rated combined fuel economy (in miles per gallon) for the same 509 vehicles.
CHAPTER 4 PROBLEM 54 (A) Produce a histogram of these data. Describe and interpret the histogram. The histogram extends from 10 to 37.5 mpg in bins of width 2.5. The histogram peaks in the bin The histogram is right skewed.
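The binning described above (10 to 37.5 in steps of 2.5) can be sketched in a few lines. This is only an illustration: the `mpg` list is a small hypothetical sample, not the 509-vehicle data file, and the function name `bin_counts` is an assumption introduced here.

```python
def bin_counts(values, start=10.0, stop=37.5, width=2.5):
    """Count values in bins [lo, lo + width); the last bin also includes `stop`."""
    n_bins = round((stop - start) / width)  # 11 bins for the defaults
    counts = [0] * n_bins
    for v in values:
        if not (start <= v <= stop):
            continue  # skip values outside the plotted range
        # Clamp so that a value equal to `stop` lands in the last bin.
        idx = min(int((v - start) // width), n_bins - 1)
        counts[idx] += 1
    return [(start + i * width, counts[i]) for i in range(n_bins)]


# Hypothetical mileages, for illustration only (NOT the homework data).
mpg = [14, 16, 17, 18, 18, 19, 20, 20, 21, 22, 23, 25, 28, 31, 37]

# Text histogram: one row per bin, one star per observation.
for lo, count in bin_counts(mpg):
    print(f"{lo:5.1f}-{lo + 2.5:5.1f} | {'*' * count}")
```

A long, thin run of stars toward the high bins is what right skew looks like in this kind of display.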
CHAPTER 4 PROBLEM 54 (B) Compare the histogram to the boxplot. What does the histogram tell you that the boxplot does not, and vice versa? The boxplot shows the median and the quartiles (and hence the IQR), and it flags an outlier. The histogram shows more about the shape of the distribution and where the observations are actually located.
CHAPTER 4 PROBLEM 54 (C) Find the mean and standard deviation of the rated mileages. How are these related to the histogram, if at all? The mean = and standard deviation = 4.81. The mean is the balance point of the histogram, and the SD measures the typical deviation of the observations from the mean.
CHAPTER 4 PROBLEM 54 (D) Find the coefficient of variation and briefly interpret its value. CV = 24.03%. A much higher CV (100% or more) would denote large variation relative to the mean. These data are not very spread out.
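As a rough sketch of how the three summaries fit together, the coefficient of variation is taken here to be 100 × SD / mean, which is an assumption about the convention used. The helper `summarize` and the toy data are hypothetical. The last lines run a back-of-envelope consistency check on the reported SD and CV; the result is an implied value under that assumption, not the figure from the original solution.

```python
import statistics


def summarize(values):
    """Return (mean, sample SD, CV in percent) for a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    cv = 100 * sd / mean           # assumed convention: CV as a percentage
    return mean, sd, cv


# Toy data, for illustration only.
mean, sd, cv = summarize([10, 20, 30])
print(mean, sd, cv)

# Consistency check: if CV = 100 * SD / mean, then mean = 100 * SD / CV.
reported_sd = 4.81
reported_cv = 24.03
implied_mean = 100 * reported_sd / reported_cv
print(round(implied_mean, 2))
```

Because the CV divides by the mean, it is unit-free, which is what makes it comparable across variables measured on different scales.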
CHAPTER 4 PROBLEM 54 (E) Identify any unusual values (outliers). Do you think that these are coding errors? There is one outlier at 37 mpg, which is the Scion iQ. This probably isn’t a blunder or rogue but truly an interesting outlier.
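One common way to flag outliers is the boxplot rule: anything more than 1.5 × IQR beyond the quartiles. The sketch below assumes that rule, which the original solution does not confirm it used. The data are a hypothetical sample rather than the course data file, and `iqr_outliers` is a name made up for this example. Quartile conventions also differ between software packages, so fences computed elsewhere may differ slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) using the k * IQR rule."""
    # statistics.quantiles with n=4 returns [Q1, median, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical mileages, for illustration only (NOT the homework data).
mpg = [14, 16, 17, 18, 18, 19, 20, 20, 21, 22, 23, 25, 28, 31, 37]

lower, upper, outliers = iqr_outliers(mpg)
print(lower, upper, outliers)
```

The rule only flags a value as unusual. Deciding whether it is a coding error still requires looking up the observation, as the answer above does for the Scion iQ.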
CHAPTER 4 PROBLEM 54 (F) Government standards call for cars to get 27.5 MPG. What percentage of these vehicles meet this goal? (Are all of these vehicles cars?) We created an indicator variable that equals 1 when the mileage is at least 27.5 and 0 otherwise. Summing this variable and dividing by the sample size gives the proportion, which is approximately 8%.
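The indicator-variable calculation described above can be sketched as follows. The sample is hypothetical, so its proportion will not match the roughly 8% reported for the real data, and `share_meeting` is a name invented for this example.

```python
def share_meeting(values, threshold=27.5):
    """Proportion of values that are at least `threshold`."""
    # Indicator variable: 1 if the standard is met, 0 otherwise.
    indicator = [1 if v >= threshold else 0 for v in values]
    # The mean of a 0/1 variable is the proportion of ones.
    return sum(indicator) / len(indicator)


# Hypothetical mileages, for illustration only (NOT the homework data).
mpg = [14, 16, 17, 18, 18, 19, 20, 20, 21, 22, 23, 25, 28, 31, 37]

share = share_meeting(mpg)
print(f"{100 * share:.1f}% of this sample meets 27.5 mpg")
```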
CHAPTER 4 PROBLEM 57 Information Industry. This data table includes several characteristics of 428 companies classified as being in the information industry in One column gives the total revenue of the company, in millions of dollars.
CHAPTER 4 PROBLEM 57 (A) Find the median, mean, and standard deviation of the total revenue of these companies. What units do these summary statistics share? Mean= Median= SD= They all share the units of the revenue data, which the problem statement gives as millions of dollars.
CHAPTER 4 PROBLEM 57 (B) Describe the shape of the histogram and boxplot. What does the White Space Rule have to say about the histogram? The histogram is almost all white space. The data are highly concentrated at the lower end, and a few outliers are extremely high. These outliers stretch the axis and conceal much of the data.
CHAPTER 4 PROBLEM 57 (C) Do the data have any extreme outliers? Identify the company if there’s an extreme outlier. AT&T, Verizon, and Microsoft are all extreme outliers.
CHAPTER 4 PROBLEM 57 (D) What do these graphs of the distribution of net sales tell you about this industry? Is this industry dominated by a few companies, or is there a level playing field with many comparable rivals? There are several dominant companies at the top and there are many less competitive companies fighting at the bottom.
CHAPTER 4 PROBLEM 59 Tech Stocks. These data give the monthly returns on stocks in three technology companies: Dell, IBM, and Microsoft. For each month from January 1990 through the end of 2005 (192 months), the data give the return earned by owning a share of stock in each company. The return is the percentage change in the price, divided by 100.
CHAPTER 4 PROBLEM 59 (A) Describe and contrast histograms of the three companies. Be sure to use a common scale for the data axes of the histograms to make the comparison easier and more reliable. The histograms, boxplots, and violin plots show that Dell has the highest median and IQR. Microsoft has the second highest median and IQR. IBM has the lowest median and IQR. Microsoft has the most outliers.
CHAPTER 4 PROBLEM 59 (B) Find the mean, SD, and coefficient of variation for each set of returns. Are means and SDs useful summaries of variables such as these? Means and standard deviations are regularly used to characterize the expected returns and risks of equity market data. Because this type of data often deviates from the assumptions of a normal distribution, we should exercise care when interpreting them.
CHAPTER 4 PROBLEM 59 (C) What does comparison of the coefficients of variation tell you about these three stocks? The CVs suggest that Dell varies least relative to its mean, then Microsoft, and that IBM varies the most. In this case, however, because the means are so close to zero, the CVs are not good indicators of risk or scale. Coefficients of variation are valuable only when the means are not close to zero.
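A small sketch of why the CV breaks down when the mean is near zero: the two hypothetical return series below have identical spread and differ only by a tiny shift in the mean, yet one CV comes out about twice the other. The numbers are invented for illustration and are not the Dell, IBM, or Microsoft returns; the CV is again assumed to be 100 × SD / mean.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation as a percentage (assumed: 100 * SD / mean)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical monthly returns centered on zero (illustration only).
base = [-0.10, -0.05, 0.00, 0.05, 0.10]

# Shifting every return by a constant changes the mean but not the SD.
series_a = [r + 0.001 for r in base]  # mean about 0.001
series_b = [r + 0.002 for r in base]  # mean about 0.002

print(statistics.stdev(series_a), statistics.stdev(series_b))
print(cv_percent(series_a), cv_percent(series_b))
```

The risk (SD) of the two series is the same, so a ranking by CV here would reflect only the tiny difference in means, not any difference in volatility.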
CHAPTER 4 PROBLEM 59 (D) Investors prefer stocks that grow steadily. In that case, what values are ideal for the mean and SD of the returns? For the coefficient of variation? Steady growth means a large positive mean return with a small SD, and therefore a small positive CV that denotes less variability relative to the mean. Investors would also like to see positively skewed returns, which lean towards growth. In this case, because the means are so close to zero, the CVs are not good indicators of risk or scale.
CHAPTER 4 PROBLEM 59 (E) It is common to find that stocks that have a high average return also tend to be more volatile, with larger swings in price. Is that true for these three stocks? Yes. The stocks with the highest means and medians also have the highest SDs.