Frequency Tables

Construction of a Frequency Table

Constructing a frequency table involves sorting data into categories and showing the number of observations in each category. The categories are mutually exclusive: there is no overlap between them, so each data point fits into one category and only one category. In addition, the categories are collectively exhaustive: every data point fits into some category.
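The two requirements can be checked mechanically. The sketch below is only an illustration: the function names and the small sample values are made up for this example, and it assumes each class is given as a (lower limit, upper limit) pair with both limits included in the class.

```python
def classes_containing(x, classes):
    """Return every class (lower, upper) whose limits include x."""
    return [(lower, upper) for lower, upper in classes if lower <= x <= upper]


def every_point_fits_exactly_one_class(data, classes):
    """True when no point falls in two classes (mutually exclusive)
    and no point falls in zero classes (collectively exhaustive)."""
    return all(len(classes_containing(x, classes)) == 1 for x in data)


# Hypothetical classes and data, chosen only to illustrate the check.
good_classes = [(10, 14), (15, 19), (20, 24)]
overlapping_classes = [(10, 15), (15, 19), (20, 24)]  # 15 fits two classes

print(every_point_fits_exactly_one_class([12, 15, 24], good_classes))
print(every_point_fits_exactly_one_class([12, 15, 24], overlapping_classes))
print(every_point_fits_exactly_one_class([12, 30], good_classes))  # 30 fits none
```

The first call should report that the classes work; the other two fail because of an overlap and an uncovered value, respectively.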

Steps in building a frequency table

Step 1 - Set up groupings called classes

Determine the highest and lowest data values
Subtract the lowest value from the highest value
Divide the resulting difference by 5, 10 and 15. These three results give you a rough idea of the class width to use in your table. The book recommends 5 to 20 classes, but I think 20 is too many, so I am asking you to use 5 to 15 classes as a guideline.
Round the widths to values that real people can grasp and select one of the resulting numbers as your class width.
Set the lower limit of the first class. It must be no larger than the smallest data point.
Add the class width to the first class's lower limit to get the lower limit of the second class.
The upper limit of the first class will be just below the lower limit of the second class.
Continue this process until the upper limit of your last class reaches or exceeds the value of the largest data point.

Step 2 - Determine the appropriate class for each data point.
Step 3 - Count the number of data points in each class. That is your class frequency.
Step 4 - Build the table
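The four steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the only way to do it: the function names, the `step` parameter (the gap between one class's upper limit and the next class's lower limit) and the sample values are all hypothetical, and the class width and starting value are assumed to have been chosen already.

```python
def build_classes(start, width, largest, step):
    """Step 1: generate (lower, upper) class limits beginning at `start`.
    Each upper limit sits one `step` below the next lower limit, and
    classes are added until the last upper limit covers `largest`."""
    classes = []
    lower = start
    while True:
        upper = lower + width - step
        classes.append((lower, upper))
        if upper >= largest:
            return classes
        lower += width


def count_frequencies(data, classes):
    """Steps 2 and 3: assign each point to its class and count."""
    counts = [0] * len(classes)
    for x in data:
        for i, (lower, upper) in enumerate(classes):
            if lower <= x <= upper:
                counts[i] += 1
                break
    return counts


# Hypothetical whole-number data, used only to exercise the functions.
sample = [12, 15, 21, 17, 29, 10, 24]
classes = build_classes(start=10, width=5, largest=max(sample), step=1)
counts = count_frequencies(sample, classes)

# Step 4: build the table.
for (lower, upper), count in zip(classes, counts):
    print(f"{lower}-{upper}\t{count}")
```

Whole numbers are used here on purpose. With decimal data, ordinary floating-point arithmetic can put a value that sits exactly on a class limit into the wrong class, so exact decimal arithmetic is the safer choice.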

Example Frequency Table

Consider a problem that asks you to build a frequency distribution. Let's say we have information on the weight of cans of regular Coca-Cola and we want to build a frequency table. The data (weights of Coke cans) are as follows:

Weight Regular Coke (pounds)
0.8192	0.8150	0.8163	0.8211	0.8181	0.8247
0.8062	0.8128	0.8172	0.8110	0.8251	0.8264
0.7901	0.8244	0.8073	0.8079	0.8044	0.8170
0.8161	0.8194	0.8189	0.8194	0.8176	0.8284
0.8165	0.8143	0.8229	0.8150	0.8152	0.8244
0.8207	0.8152	0.8126	0.8295	0.8161	0.8192

As a rough rule of thumb we probably want between 5 and 15 classes (note: that is my view; texts and instructors vary). Let's try 5, 10 and 15 and get a rough estimate of the class widths for those numbers of classes.

Possible Class Widths
High value - Low value	Number of classes	Suggested class width	Human adjusted class width
0.0394	5	0.0079	0.0050
0.0394	10	0.0039	0.0050
0.0394	15	0.0026	0.0025
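The range and the suggested class widths can be computed directly from the weights listed above. The sketch below is one way to do the arithmetic; the variable names are my own, and `Decimal` is used only so that four-decimal values are handled exactly.

```python
from decimal import Decimal

raw = """
0.8192 0.8150 0.8163 0.8211 0.8181 0.8247
0.8062 0.8128 0.8172 0.8110 0.8251 0.8264
0.7901 0.8244 0.8073 0.8079 0.8044 0.8170
0.8161 0.8194 0.8189 0.8194 0.8176 0.8284
0.8165 0.8143 0.8229 0.8150 0.8152 0.8244
0.8207 0.8152 0.8126 0.8295 0.8161 0.8192
"""
weights = [Decimal(value) for value in raw.split()]

# High value minus low value.
data_range = max(weights) - min(weights)

# Divide the range by each candidate number of classes and round the
# result to four decimal places, matching the precision of the data.
suggested = {}
for number_of_classes in (5, 10, 15):
    width = data_range / number_of_classes
    suggested[number_of_classes] = width.quantize(Decimal("0.0001"))

print("range:", data_range)
for number_of_classes, width in suggested.items():
    print(number_of_classes, "classes -> suggested width", width)
```

The code stops at the suggested width. Rounding to a "human adjusted" width is a judgment call, so that last column is left to you.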

In this case let's pick 0.0050 as the class width. Why did we pick that number? It fits our results, but more importantly, people can look at a table built with that class width and readily understand it. If we chose something like 0.0067 we could still build a table, but it would be a nightmare for people to comprehend. So our class width is 0.0050 because it fits our data and it's easy to grasp.

The next issue is what starting value to use in our table. Our lowest value is 0.7901, so we will start our table at 0.7900. Again we picked a number that lets us build a table that people can read and comprehend. At this point we have the first class starting at 0.7900 lb and a class width of 0.0050 lb. Given this information, the second class will start at 0.7900 + 0.0050, or 0.7950. The third class will start at 0.7950 + 0.0050, or 0.8000, and you continue this process, generating lower class limits until your classes cover all of the data.

How about upper class limits? In the example each upper class limit is 0.0001 less than the next lower class limit. For example, class one has an upper limit of 0.7949 and class two has a lower limit of 0.7950. The idea here is to make it clear where you should count a data point. Going through the data in the appendix one point at a time and assigning each point to a class, you come up with the result shown.

Weights of Regular Coke
Class	Frequency
0.7900-0.7949	1
0.7950-0.7999	0
0.8000-0.8049	1
0.8050-0.8099	3
0.8100-0.8149	4
0.8150-0.8199	17
0.8200-0.8249	6
0.8250-0.8299	4
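If you want to check the counts without tallying by hand, the classes and frequencies can be generated from the listed weights. This is a sketch rather than a required method: the variable names are mine, the starting value, class width and 0.0001 gap are the ones chosen in this example, and `Decimal` keeps values such as 0.8150 from slipping into the wrong class through floating-point rounding.

```python
from decimal import Decimal

raw = """
0.8192 0.8150 0.8163 0.8211 0.8181 0.8247
0.8062 0.8128 0.8172 0.8110 0.8251 0.8264
0.7901 0.8244 0.8073 0.8079 0.8044 0.8170
0.8161 0.8194 0.8189 0.8194 0.8176 0.8284
0.8165 0.8143 0.8229 0.8150 0.8152 0.8244
0.8207 0.8152 0.8126 0.8295 0.8161 0.8192
"""
weights = [Decimal(value) for value in raw.split()]

start = Decimal("0.7900")  # lower limit of the first class
width = Decimal("0.0050")  # class width
gap = Decimal("0.0001")    # upper limit is this much below the next lower limit

# Number of classes needed so the last class covers the largest weight.
number_of_classes = int((max(weights) - start) // width) + 1

# The class index of a weight is how many whole widths it lies above start.
counts = [0] * number_of_classes
for weight in weights:
    counts[int((weight - start) // width)] += 1

labels = []
for i in range(number_of_classes):
    lower = start + i * width
    upper = lower + width - gap
    labels.append(f"{lower}-{upper}")

print("Class\tFrequency")
for label, count in zip(labels, counts):
    print(f"{label}\t{count}")
```

The frequencies should add up to 36, the number of cans in the data set. That total is a quick check that no data point was skipped or counted twice.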

What can we tell about the data by looking at the frequency table? The most obvious thing is that almost half of the data (17 of the 36 cans) is in the class from 0.8150 to 0.8199 lb, and virtually all of the data is between 0.8000 and 0.8299 lb. If we were comparing this data to data for Diet Coke or some version of Pepsi, we would look to see how the other soda's weight data is spread across the weight classes.
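Statements like "almost half" are easier to back up with relative frequencies, that is, each class frequency divided by the total count. The sketch below starts from the frequencies in the table; the variable names are my own and the percentage formatting is just one reasonable choice.

```python
table = [
    ("0.7900-0.7949", 1),
    ("0.7950-0.7999", 0),
    ("0.8000-0.8049", 1),
    ("0.8050-0.8099", 3),
    ("0.8100-0.8149", 4),
    ("0.8150-0.8199", 17),
    ("0.8200-0.8249", 6),
    ("0.8250-0.8299", 4),
]

total = sum(frequency for _, frequency in table)

# Relative frequency = class frequency / total number of observations.
relative = [(label, frequency / total) for label, frequency in table]

print("Class\tFrequency\tRelative frequency")
for (label, frequency), (_, share) in zip(table, relative):
    print(f"{label}\t{frequency}\t{share:.1%}")

# The class with the largest frequency, and the share of cans at or
# above 0.8000 lb (every class except the first two).
modal_label, modal_frequency = max(table, key=lambda row: row[1])
share_from_0_8000 = sum(frequency for _, frequency in table[2:]) / total

print("largest class:", modal_label, f"({modal_frequency / total:.1%})")
print("share between 0.8000 and 0.8299:", f"{share_from_0_8000:.1%}")
```

The same relative frequencies are what you would compare against a table for Diet Coke or Pepsi, since the two data sets might not contain the same number of cans.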