Constructing a frequency table involves sorting data into categories and showing the number of observations in each category. The categories are mutually exclusive: there is no overlap between them, so each data point fits into one category and only one category. The categories are also collectively exhaustive: there is some category that fits every data point.
- Step 1 - Set up groupings called classes
  - Determine the highest and lowest data values.
  - Subtract the lowest value from the highest value.
  - Divide the resulting difference by 5, 10 and 15. These three results give you a rough idea of the class width to use in your table. The book recommends 5 to 20 classes, but I think 20 is too many, so I am asking you to use 5 to 15 classes as a guideline.
  - Round the widths to values that real people can grasp and select one of the resulting numbers as your class width.
  - Set the lower limit of the first class. It must be no larger than the value of the smallest data point.
  - Using the first class's lower limit, determine the lower limit of the second class by adding the class width to it.
  - The upper limit of the first class will be just below the lower limit of the second class.
  - Continue this process until the upper limit of your last class reaches or exceeds the value of the largest data point.
- Step 2 - Determine the appropriate class for each data point.
- Step 3 - Count the number of data points in each class. That is your class frequency.
- Step 4 - Build the table.
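Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the function name `tally`, the class limits, and the data values below are all made up for the example. The sketch also checks the two properties described earlier, namely that every data point fits exactly one class.

```python
def tally(data, classes):
    """Count how many data points fall in each class.

    classes is a list of (lower_limit, upper_limit) pairs, both inclusive.
    Raises ValueError if a point fits no class (not collectively exhaustive)
    or more than one class (not mutually exclusive).
    """
    counts = [0] * len(classes)
    for x in data:
        # Step 2: find the class (or classes) whose limits contain x.
        matches = [i for i, (lo, hi) in enumerate(classes) if lo <= x <= hi]
        if len(matches) != 1:
            raise ValueError(f"{x} fits {len(matches)} classes; expected exactly 1")
        # Step 3: add one to that class's frequency.
        counts[matches[0]] += 1
    return counts


# Hypothetical classes and data, chosen only to illustrate the steps.
example_classes = [(10, 19), (20, 29), (30, 39)]
example_data = [12, 18, 21, 25, 27, 33]
example_counts = tally(example_data, example_classes)

# Step 4: build the table.
print("Class      Frequency")
for (lo, hi), n in zip(example_classes, example_counts):
    print(f"{lo} - {hi}    {n}")
```

If the classes overlapped or left a gap, `tally` would raise an error instead of silently miscounting, which is the reason for insisting on mutually exclusive and collectively exhaustive classes.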
Consider a problem that asks you to build a frequency distribution. Let's say we have information on the weights of cans of regular Coca-Cola and we want to build a frequency table. The data (weights of Coke cans) is as follows:
As a rough rule of thumb we probably want between 5 and 15 classes (note: that is my view; texts and instructors vary). Let's try 5, 10 and 15 and get a rough estimate of the class widths for those numbers of classes.
|High value - Low value||Number of classes||Suggested class width||Human adjusted class width|
In this case let's pick the class width as 0.0050. Why did we pick that number? It fits our results, but more importantly, human beings can probably look at a table built with that class width and readily understand it. If we chose something like 0.0067 we could still build a table, but it would be a nightmare for people to comprehend. So our class width is 0.0050 because it fits our data and it's easy to grasp.
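The arithmetic behind the suggested class widths is simply the range divided by each candidate number of classes. Here is a minimal sketch of it, assuming a helper I am calling `suggested_widths` (a made-up name). The low value 0.7901 comes from the data; the high value below is only a placeholder I assumed for illustration, so substitute the actual largest weight from the data set.

```python
from decimal import Decimal


def suggested_widths(low, high, class_counts=(5, 10, 15)):
    """Return {number_of_classes: raw class width} for each candidate count.

    The raw width is (high - low) / number_of_classes. It still needs to be
    rounded by a human to a value that is easy to read.
    """
    data_range = high - low
    return {k: data_range / k for k in class_counts}


low = Decimal("0.7901")   # lowest weight, from the data
high = Decimal("0.8300")  # ASSUMED placeholder, not taken from the data

widths = suggested_widths(low, high)
for k, w in widths.items():
    print(f"{k} classes: raw width {w}")
```

The raw widths are just a starting point. The "human adjusted" step, picking a round number such as 0.0050, is a judgment call that no formula makes for you.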
The next issue is what starting value to use in our table. Our lowest value is 0.7901, so we will start our table at 0.7900. Again we picked a number that lets us build a table that people can read and comprehend. At this point we have the first class starting at 0.7900 lb and a class width of 0.0050 lb. Given this information, the second class will start at 0.7900 + 0.0050, or 0.7950. The third class will start at 0.7950 + 0.0050, or 0.8000, and you continue this process, generating lower class limits until your classes cover all of the data.
How about upper class limits? In the example, each upper class limit is 0.0001 less than the lower limit of the next class. For example, class one has an upper limit of 0.7949 and class two has a lower limit of 0.7950. The idea here is to make it clear where you should count a data point. Going through the data in the appendix one point at a time and assigning each point to a class, you come up with the result shown.
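The class limits described here can be generated mechanically. The sketch below uses the starting value 0.7900, the width 0.0050, and the 0.0001 gap from this example; the function name `class_limits` is made up, and the largest data value passed in is an assumed placeholder, since the stopping point depends on the actual data. I use Python's `decimal` module so that sums like 0.7900 + 0.0050 come out exactly rather than with floating-point rounding noise.

```python
from decimal import Decimal


def class_limits(start, width, step, largest):
    """Return a list of (lower_limit, upper_limit) pairs.

    start   : lower limit of the first class
    width   : class width
    step    : gap between one class's upper limit and the next lower limit
    largest : largest data value; classes are added until it is covered
    """
    limits = []
    lower = start
    while True:
        upper = lower + width - step  # just below the next lower limit
        limits.append((lower, upper))
        if upper >= largest:
            return limits
        lower += width


limits = class_limits(
    start=Decimal("0.7900"),
    width=Decimal("0.0050"),
    step=Decimal("0.0001"),
    largest=Decimal("0.8300"),  # ASSUMED placeholder, not taken from the data
)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

The first rows printed are 0.7900 - 0.7949, 0.7950 - 0.7999 and 0.8000 - 0.8049, matching the limits worked out by hand above.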
What can we tell about the data by looking at the frequency table? The most obvious thing is that almost half of the data is in the class from 0.8150 to 0.8199 lb, and virtually all of the data is between 0.8000 and 0.8299 lb. If we were comparing this data to data for Diet Coke or some version of Pepsi, we would look to see how the other sodas' weight data is spread across the weight classes.