Chapter 7 Notes: Sampling Distributions
AP Stats    Name__________________________

Parameter – a number that describes some characteristic of the population. The value of a parameter is usually not known because we cannot examine the entire population.

Statistic – a number that describes some characteristic of a sample.

7.1 What is a Sampling Distribution?

μ – the population mean; x̄ – the sample mean.
p – the population proportion; p̂ (p-hat) – the sample proportion, which is used to estimate the unknown parameter p.

EXAMPLE: Identify the population, the parameter, the sample, and the statistics in each of the following settings.
The population is all 10-year-old boys; the parameter is the 75th percentile, or Q3. The sample is the 50 10-year-old boys included in the sample; the statistic is the sample Q3 = 56 inches.
The population is all 12- to 17-year-olds in the US; the parameter is the proportion p with cell phones. The sample is the 1102 12- to 17-year-olds in the sample; the statistic is the sample proportion with a cell phone, p̂ = 0.71.

SAMPLING VARIABILITY

Sampling variability: the value of a statistic varies in repeated random sampling.

ACTIVITY: Draw the class dot plot below. The proportion of red chips is p = ½.
[Blank dot plot axis for the class results, marked from 0.1 to 0.9]

Record that proportion on the class dot plot.

Describe the distribution:

Definition: The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Be careful of interchanging the following terms:
Describing Sampling Distributions

The fact that statistics from samples have definite sampling distributions allows us to answer the question: “How trustworthy is a statistic as an estimator of the parameter?” To answer this, consider the center, spread, and shape.

The dot plot on the right shows the approximate sampling distribution of p̂. The center is very close to the parameter value of 0.5.

Understanding a sampling distribution: If we took all possible samples of 20 chips from the population, calculated p̂ for each sample, and then found the mean of all those p̂ values, we’d get exactly 0.5. We say that p̂ is an unbiased estimator of p.

Definition: A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

[Figure panels: n = 100 and n = 1000]

Unbiased does not mean perfect! An unbiased estimator will almost always provide an estimate that is not equal to the value of the population parameter. It is called “unbiased” because in repeated samples, the estimates won’t consistently be too high or consistently too low.

Larger samples have a clear advantage over smaller samples. They are much more likely to produce an estimate close to the true value of the parameter.

The variability of a statistic is described by the spread of its sampling distribution. This spread is determined primarily by the size of the random sample. Larger samples give smaller spread. The spread of the sampling distribution does not depend on the size of the population, as long as the population is at least 10 times larger than the sample.

Bias, variability, and shape

We can think of the true value of the population parameter as the bull’s-eye on a target and of the sample statistic as an arrow fired at the target. Both bias and variability describe what happens when we take many shots at the target.
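The two claims above, that p̂ is unbiased and that larger samples give smaller spread, can be checked with a short simulation. The sketch below is an illustration in Python and is not part of the original notes. It assumes a very large population of chips with p = 0.5, so each chip drawn is red with probability 0.5 independently of the others. The function name simulate_phat, the number of repetitions, and the seed are made up for this example.

```python
import random
import statistics


def simulate_phat(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the sample proportion of each.

    Assumption: the population is large enough that every draw is an
    independent success with probability p.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# For each sample size, store (mean, standard deviation) of the simulated p-hats.
results = {}
for n in (20, 100, 1000):
    phats = simulate_phat(p=0.5, n=n, reps=2000)
    results[n] = (statistics.mean(phats), statistics.stdev(phats))
    print(f"n = {n:4d}  mean of p-hat = {results[n][0]:.3f}  "
          f"SD of p-hat = {results[n][1]:.3f}")
```

In each row the mean of the simulated p̂ values should land near 0.5 (no bias), while the standard deviation should shrink as n grows, roughly like sqrt(p(1 − p)/n).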
Bias means that our aim is off and we consistently miss the bull’s-eye in the same direction. Our sample values do not center on the population value.

High variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.

The lesson about center and spread is clear: given a choice of statistics to estimate an unknown parameter, choose one with no or low bias and minimum variability.

The blue line represents the true number of tanks. Note the different shapes. Which statistic gives the best estimator? Why?

Why n − 1 when calculating sample variance and standard deviation? Dividing by n gives a biased estimator of the population variance (it tends to be too low). If we replace 1/n with 1/(n − 1), the sample variance becomes an unbiased estimator. (a very watered down explanation)

P. 429: 1, 3, 5, 7, 9, 11, 13, 17-20

Identify the population, the parameter, the sample, and the statistic in each setting.
Population: people who signed a card saying that they intend to quit smoking.
Parameter of interest: proportion of the population who signed the cards saying they would not smoke who actually quit smoking.
Sample: a random sample of 1,000 people who signed the cards.
Sample statistic: p̂ = 0.21.

3. Hot turkey. Tom is cooking a large turkey breast for a holiday meal. He wants to be sure that the turkey is safe to eat, which requires a minimum internal temperature of 165° F. Tom uses a thermometer to measure the temperature of the turkey meat at four randomly chosen points. The minimum reading in the sample is 170° F.

Population: all the turkey meat.
Parameter of interest: minimum temperature.
Sample: 4 randomly chosen points in the turkey.
Sample statistic: sample minimum = 170° F.

For each boldface number in 5 and 7, 1) state whether it is a parameter or a statistic and 2) use appropriate notation to describe each number, for example p = 0.65.

5. Get your bearings. A large container of ball bearings has mean diameter 2.5003 centimeters (cm). This is within the specifications for acceptance of the container by the purchaser. By chance, an inspector chooses 100 bearings from the container that have mean diameter 2.5009 cm. Because this is outside the specified limits, the container is mistakenly rejected.

μ = 2.5003 is a parameter (related to the population of all the ball bearings in the container); x̄ = 2.5009 is a statistic (related to the sample of 100 ball bearings).

7. Unlisted numbers. A telemarketing firm in LA uses a device that dials residential phone numbers in that city at random. Of the first 100 numbers dialed, 48% are unlisted. This is not surprising because 52% of all LA residential phones are unlisted.

p̂ = 0.48 is a statistic (related to the sample of 100 numbers dialed) and p = 0.52 is a parameter (related to the population of all residential phone numbers in Los Angeles).

9. Doing homework.
A school newspaper article claims that 60% of the students at a large high school did their assigned homework last week. Some skeptical AP Statistics students want to investigate whether this claim is true, so they choose an SRS of 100 students from the school to interview. What values of the sample proportion would be consistent with the claim that the population proportion of students who completed all their homework is p = 0.60? To find out, we used Fathom software to simulate choosing 250 SRSs of size n = 100 students from a population in which p = 0.60. The figure below is the dotplot of the sample proportion of students who did all their homework.

a) Is this the sampling distribution of p̂? Justify your answer.
This is not the exact sampling distribution because that would require a value of p̂ for all possible samples of size 100. However, it is an approximation of the sampling distribution that we created through simulation.

b) Describe the distribution. Are there any obvious outliers?
The distribution is centered at 0.60 and is reasonably symmetric and bell-shaped. Values vary from about 0.47 to 0.74. The values at 0.47, 0.73 and 0.74 are outliers.

c) Suppose that 45 of the 100 students in the actual sample say that they did all their homework last week. What would you conclude about the newspaper article’s claim? Explain.
If we found that only 45 students said that they did all their homework last week, we would be skeptical of the newspaper’s claim that 60% of students did their homework last week. None of the simulated samples had a proportion this low.

11. Doing homework. Refer to #9.

a) Make a graph of the population distribution given that there are 3,000 students in the school. REMEMBER THESE ARE INDIVIDUALS SINCE IT IS A POPULATION DISTRIBUTION.

b) Sketch a possible graph of the distribution of sample data for the SRS of size 100 taken by the AP Statistics students. REMEMBER THESE ARE INDIVIDUALS SINCE IT IS A SAMPLE DISTRIBUTION.

13.
During winter months outside temperatures at the Starnes’s cabin in Colorado can stay well below freezing (32° F or 0° C) for weeks at a time. To prevent the pipes from freezing, Mrs. Starnes sets the thermostat at 50° F. The manufacturer claims that the thermostat allows variation in home temperature that follows a Normal distribution with σ = 3° F. To test this claim, Mrs. Starnes programs her digital thermometer to take an SRS of n = 10 readings during a 24-hour period. Suppose the thermostat is working properly and that the actual temperatures in the cabin vary according to a Normal distribution with mean μ = 50° F and standard deviation σ = 3° F. The Fathom screen shot below shows the results of taking 500 SRSs of 10 temperature readings from a population distribution that’s N(50, 3) and recording the sample variance s² each time.

a) Describe the approximate sampling distribution.
The approximate sampling distribution is skewed to the right with a center at 9 (° F)². The values vary from about 2 to 27.5 (° F)².

b) Suppose that the variance from an actual sample is s_x² = 25. What would you conclude about the thermostat manufacturer’s claim? Explain.
A sample variance of 25 is quite large compared with what we would expect, since only one out of 500 SRSs had a variance that high. It suggests that the manufacturer’s claim is false and that the thermostat actually has more variability than claimed.

17. IRS audits. The IRS plans to examine an SRS of individual federal income tax returns from each state. One variable of interest is the proportion of returns claiming itemized deductions. The total number of tax returns in each state varies from over 15 million in California to about 240,000 in Wyoming.

a) Will the sampling variability of the sample proportion change from state to state if an SRS of 2000 tax returns is selected in each state? Explain your answer.
No. Since the smallest number of total tax returns (i.e., the smallest population) is still more than 10 times the sample size, the variability of the sample proportion will be (approximately) the same for all states.

b) Will the sampling variability of the sample proportion change from state to state if an SRS of 1% of all tax returns is selected in each state? Explain your answer.
Yes, it will change. The sample taken from Wyoming will be about the same size as before (1% of about 240,000 is about 2,400), but the sample in, for example, California will be considerably larger, and therefore the variability of the sample proportion will be smaller there.

18. Predict the election. Just before a presidential election, a national opinion poll increases the size of its weekly random sample from the usual 1500 people to 4000 people.

a) Does the larger random sample reduce the bias of the poll result?
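The answers to exercise 17 can be checked numerically. The sketch below is not from the original notes. It uses the standard formula for the standard deviation of p̂ from an SRS drawn without replacement, sqrt(p(1 − p)/n) · sqrt((N − n)/(N − 1)), and assumes a hypothetical true proportion p = 0.3 together with the round population sizes quoted in the exercise. The function name sd_phat is illustrative.

```python
import math


def sd_phat(p, n, N):
    """Standard deviation of the sample proportion for an SRS of size n
    drawn without replacement from a population of size N."""
    finite_population_correction = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * finite_population_correction


p = 0.3  # hypothetical proportion of returns with itemized deductions
populations = {"California": 15_000_000, "Wyoming": 240_000}  # round figures

for state, N in populations.items():
    fixed = sd_phat(p, 2000, N)        # part (a): SRS of 2000 in every state
    one_pct = sd_phat(p, N // 100, N)  # part (b): SRS of 1% of the returns
    print(f"{state:10s}  n = 2000: {fixed:.5f}   n = 1% of N: {one_pct:.5f}")
```

With a fixed n = 2000 the two standard deviations should agree closely, because both populations are far more than 10 times the sample size. With 1% samples, California's much larger n should give a noticeably smaller standard deviation than Wyoming's.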