General - In the following material we are going to discuss the estimation of population parameters such as the mean or the proportion of the population that has some attribute. Why do we want to do this sort of thing? If we manufacture a product that has a finite life, such as a light bulb or an automobile tire, we might like to know the mean life of our product. This information could be helpful in setting our product warranty or developing an advertising program. Similarly, if you were a politician running for election you would be very interested in the proportion of the people who were going to vote for you in the upcoming election.
In the cases cited above we wanted to develop an estimate of the population parameter of interest. In this course we will talk about two types of estimates: Point Estimates and Interval Estimates.
- A point estimate is a single value that can be used to approximate a population parameter. An example might be the mean income of South Sound community college students, estimated by taking a simple random sample from the population of community college students in the South Sound.
- An interval estimate covers a range of values and is used to estimate the true value of a population parameter. For example, we might say that we are 95% confident that the interval from $21,223 to $26,748 actually does contain the true mean income of the population of community college students in the South Sound.
In developing estimates it will be important for us to differentiate between estimates based on large samples and those based on small samples, because the statistical techniques that we will use are different. In this course we call a sample large if n > 30 (the sample size is greater than 30). If n <= 30 we will say that we have a small sample. The exact way we will calculate an interval estimate will depend on sample size (large, small), population distribution (normal, not normal) and our knowledge of the population standard deviation (more about this a little later on in this lesson). Now let's take a look at some point estimates.
Point Estimates - Suppose we buy 50 boxes of a particular brand of cereal. The boxes all say that they weigh 15 ounces, but we want to see if that is really the population mean. When we weigh our boxes we get a sample mean of 15.06 ounces. This is our point estimate of the mean weight of boxes of this kind of cereal. Another point estimate might be developed by taking 100 Christmas lights and seeing how long they go before they burn out. Suppose our sample of 100 lights has a sample mean of 150 hours. This would be our point estimate of the mean life of the population of this type of bulb.
Interval Estimates -
The trickiest part about interval estimates is understanding what they really tell us. The figure on the right shows the case where we have taken a number of samples, all of the same size, and calculated the confidence interval for the mean of a particular population. As you can see from the figure, most of the intervals contain the true (but unknown) population mean. But one of the intervals shown does not contain the true mean. We did everything correctly: we gathered our data carefully and our calculations were flawless, but the true mean is not in our interval. No matter how good a job we do when we sample, there is always a chance that the true mean will not be in the interval that we build. Therefore we talk about a 95% or some other level of confidence. This means that if we were to repeat our sampling process 100 times, using the same sample size and other procedures, about 95 of our intervals would contain the true population mean. What we are saying is that the process we use should give us a certain level of confidence that the true mean is in our interval. The true population mean is not a variable. It is what it is; we just don't happen to know the value.
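The repeated-sampling idea can be checked with a small simulation. The sketch below is only an illustration: the population values (a normal population with mean 15 and standard deviation 0.2), the sample size of 50, and the function name coverage_rate are all assumptions made for the example, and it uses the known-standard-deviation margin of error formula that is introduced in the Calculating Confidence Intervals section.

```python
import random
from math import sqrt

def coverage_rate(true_mean=15.0, sigma=0.2, n=50, repetitions=1000, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)       # margin of error, sigma assumed known
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and compute its mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Count the interval as a "hit" if it captures the true mean.
        if sample_mean - margin < true_mean < sample_mean + margin:
            hits += 1
    return hits / repetitions

rate = coverage_rate()
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

The printed share should land close to 95%, but not exactly on it, and a few intervals miss the true mean even though every step was carried out correctly.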
Calculating Confidence Intervals
- The first thing we have to specify is the degree of confidence. This is the probability (relative frequency basis - remember that from earlier in the course?) that the confidence interval actually does contain the true value of the population parameter. The degree of confidence (also called the confidence level) is usually expressed as a percent such as 99%, 95%, 90% (or the equivalent decimal form). It can also be expressed as 1 - alpha (Greek letter).
- Now we are going to use information about the confidence level to help us determine the critical values of z (Zalpha/2 in the text). Remember when we drew a picture of a normal distribution? If not, here (to your right) is such a picture. Note that the area between the lower boundary and the upper boundary is .95 or 95%; this is the probability that goes with our confidence level. Our distribution is symmetric, so half the area is on one side of the mean and half is on the other side. In addition, note that the tails of the distribution are each labeled alpha/2, where alpha plus our confidence level equals 1.
- Divide the decimal value of the confidence level by 2. In this case 0.95/2 = 0.4750. Now go into the body of the Z table (currently table A-2 in the text) looking for this area. When you have found the area, determine the value of z that goes with this area. In this case z = 1.96. This is our critical value of z, also called zalpha/2 in the text (except they used the Greek letter for alpha).
- The next step is to calculate the margin of error, which is represented by the letter E in our text. The figure on the right shows two formulas for calculating E. In the first case (left most) the population standard deviation is known and we can use its value to calculate E: E = z * (population standard deviation) / Square root(n). In the second case (right most) we have a large sample, the population standard deviation is not known but we have the sample standard deviation, s, and can use it in our calculation of E (the margin of error): E = z * s / Square root(n).
- To calculate the upper and lower bounds of the confidence interval we will add E to the sample mean (upper bound) and subtract E from the sample mean (lower bound). This is shown in the first figure in the Calculating Confidence Intervals section.
- Once we have the upper and lower limits of the confidence interval we can write our result as: sample mean - E < population mean < sample mean + E.
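The steps in this section can be sketched in Python as follows. This is a minimal sketch, not the method used in the text: it replaces the Z table lookup with the standard library's NormalDist().inv_cdf, and the function names critical_z and mean_confidence_interval are made up for this illustration. It assumes either a known population standard deviation or a large sample with the sample standard deviation s.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(confidence):
    """Critical value z(alpha/2) for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Area to the left of the upper critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_confidence_interval(sample_mean, std_dev, n, confidence=0.95):
    """Return (lower, upper) limits: sample mean minus and plus E.

    std_dev is the population standard deviation if it is known,
    otherwise the sample standard deviation s from a large sample.
    """
    margin = critical_z(confidence) * std_dev / sqrt(n)  # margin of error E
    return sample_mean - margin, sample_mean + margin

print(round(critical_z(0.95), 2))  # 1.96, matching the table lookup
```

Because inv_cdf works with the cumulative area to the left of z, the function asks for the area 1 - alpha/2 (0.975 for 95% confidence) rather than the 0.4750 used with the table in the text. Both routes lead to the same critical value.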
What if we want to build a confidence interval for a population proportion? The figure on the right shows the appropriate techniques to use for a 95% confidence interval for a proportion. The value of z is found the same way that we found it when we were building a confidence interval for a population mean. However, the value of E, the margin of error, is calculated differently. For proportions, E is calculated as z times the square root of the product of the sample proportion and one minus the sample proportion, divided by the sample size (note that all but z are under the radical, see figure above). To calculate the actual interval we add E to the sample proportion (upper limit) and subtract E from the sample proportion (lower limit).
Let us now use the above information to build a confidence interval.
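The proportion case can be sketched the same way. The function name proportion_confidence_interval and the input values (60 successes in a sample of 100) are hypothetical and serve only to illustrate the calculation; the critical value comes from NormalDist().inv_cdf instead of the Z table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) limits for a population proportion."""
    p_hat = successes / n                          # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Everything except z sits under the radical.
    margin = z * sqrt(p_hat * (1 - p_hat) / n)     # margin of error E
    return p_hat - margin, p_hat + margin

# Hypothetical data: 60 of 100 sampled items have the attribute.
low, high = proportion_confidence_interval(60, 100)
print(f"({low:.3f}, {high:.3f})")
```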
Example Problem 1 - Estimation
- This problem gives the mean starting salaries of college graduates who have taken a statistics course (sample mean=$43,704), the standard deviation of the sample data (s=$9,879), the sample size (n=100) and the confidence level desired (95%).
- Divide the decimal form of the confidence level by 2, 0.95/2 = 0.4750.
- Go to the Z table and determine the z value that goes with a probability of 0.4750. In this case z=1.96. This is our critical value of z.
- Now calculate the margin of error, E, as z times the standard deviation divided by the square root of n. In this case E = 1.96 * $9,879 / Square root(100) = $1,936 (rounded to the nearest dollar).
- To finish up we add E to the sample mean to obtain the upper limit of the confidence interval and subtract E from the sample mean to obtain the lower limit of the confidence interval. $43,704 - $1,936 < Population mean < $43,704 + $1,936.
The final result is $41,768 < population mean < $45,640.
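As a check, the arithmetic of Example Problem 1 can be reproduced in a few lines of Python. The variable names are chosen for this illustration; the numbers are the ones given in the problem, with the table value z = 1.96.

```python
from math import sqrt

sample_mean = 43704   # mean starting salary in dollars
s = 9879              # sample standard deviation in dollars
n = 100               # sample size
z = 1.96              # critical value for 95% confidence (from the Z table)

margin = z * s / sqrt(n)        # margin of error E
lower = sample_mean - margin    # lower limit
upper = sample_mean + margin    # upper limit

print(f"E = ${margin:,.0f}")
print(f"${lower:,.0f} < population mean < ${upper:,.0f}")
```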
Example Problem 2 - A recent survey found that 308 of 611 voters said that in an election they voted for the candidate who won. The first part of the question asks us to develop a point estimate of the percentage of voters who said they voted for the candidate who won. The second part asks us to find a 98% confidence interval estimate of the percentage of voters who said they voted for the candidate who won.
- Part 1
- Calculate the sample proportion and convert it to a percent. 308/611 = 0.504; converted into a percent, this says that 50.4% of the voters surveyed said they voted for the winning candidate. This is our point estimate of the population proportion.
- Part 2
- Now we will start to develop our interval estimate. Divide the decimal form of the confidence level by 2, 0.98/2 = 0.4900.
- Go to the Z table and determine the z value that goes with an area of 0.4900 (if your table gives cumulative areas from the left, look for 0.9900 instead, or 0.0100 for the negative critical value). In this case z=2.33 (approximately). This is our critical value of z.
- Now calculate the margin of error, E, as z times the square root of the product of the sample proportion and one minus the sample proportion divided by the sample size (remember, all but z are under the radical). In this case E = 2.33 * Square root(0.504 * (1-0.504)/611) = 0.047, or 4.7% if we want our answer in percent.
- The interval is calculated as the sample proportion plus E (upper limit) and the sample proportion minus E (lower limit): 0.504+0.047=0.551 or 55.1% and 0.504-0.047=0.457 or 45.7%. The interval in % is (45.7%, 55.1%).
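The calculations for Example Problem 2 can be checked the same way. The variable names are chosen for this illustration; the counts come from the problem and z = 2.33 is the approximate table value used in Part 2.

```python
from math import sqrt

voted_for_winner = 308   # voters who said they voted for the winner
n = 611                  # voters surveyed
z = 2.33                 # approximate critical value for 98% confidence

p_hat = voted_for_winner / n                 # point estimate (Part 1)
margin = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error E (Part 2)
lower = p_hat - margin
upper = p_hat + margin

print(f"Point estimate: {p_hat:.1%}")
print(f"E = {margin:.1%}")
print(f"Interval: ({lower:.1%}, {upper:.1%})")
```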