3.6. Frequency Charts for Categorical Variables

Often one would like to know the frequency of occurrence of values for a variable in percent. This is similar to a frequency histogram we studies earlier, but a histogram only applies to numerical variables, while the procedure outlines in this section will apply to categorical variables. Unfortunately the procedure is somewhat lengthy, but with a little bit of practice it should not be too bad.

Example: A survey was conducted in the summer of 2004, asking several students in a statistics course a number of questions about their background and musical taste. The data can be found by clicking on the link below. Display a bar chart for the race of the students. In other words, compute how many of the students are white, black, hispanic, etc. and display those figures in a bar chart.

Here is the spreadsheet that contains the results of this survey:

Student Survey
Loading this data into Excel, we see that there is one column of interest, entitled "Race". However, that column represents a categorical variable (ordinal or nominal?) so we can not compute a frequency histogram. Also, the values are not numeric so we can't ask Excel to automatically add up all the "hispanics" (for example).
Spreadsheet Data
But since that column does contain the data we want to display, we need to learn a new procedure for handling categorical data. The procedure should automatically count the frequencies of the various races and present those counts in a bar chart.

Before we figure out how Excel can do this automatically let's simply do it by hand. Inspecting the data we see that there are 5 categories, White, Black, Hispanic, Pacific Islander, and Other. We type these categories into an empty part of the Excel spreadsheet and manually count how many people in each category are contained in our data. We add these counts, or frequencies, next to each category manually:

Spreadsheet Data
Now it should be easy to create the appropriate bar chart - make sure to do it, it works just as described in the previous section on creating simple bar charts.

Our manual procedure barely worked because we did not have that much data. For large data sets we need to figure out an automatic procedure to create a table of frequencies and the associated bar chart. Fortunately, Excel has just such a procedure, which we will now explore.

The appropriate tool to create frequency tables for category data is the "PivotTable / Chart Report ..." menu item under the "Data" menu.

For Excel 2007 users: The Pivot tool is found as the first button of the "Insert" ribbon. It has a slightly different interface than shown in the screen shots below but you should be able to figure it out. For detailed assistance - if you are using Excel 2007 - you could try this helpful tutorial.

The PivotTable menu item is actually more flexible than we will need in our course, but it will for sure create the type of tables that we will be interested in. We will, in fact, see that tool again subsequent sections.

The current default "Microsoft Excel list or database" is what we want, so press "Next". (Excel 2007 skips this step)

You should see the final dialog window asking you where the output should go. Accept the default of placing the output in a new worksheet and press "Finish".

You will see a "potential frequency/percentage table" as a new worksheet, containing labels such as "Drop Row Field Here", "Drop Column Fields Here", etc., but no data values are yet contained in the table. There will also be a floating window containing the available variables from your data, in our case "ID", "Sex", "Weight", "Height", "Race", etc. You can "drag-and-drop" these variables to the various slots in the table to create a variety of useful tables for data analysis. (Excel 2007: Your 'field list' my look different but you can still 'drag-and-drop variables as described next)

Pivot Table
  • First drag the variable "Race" from the floating window into the "Drop Row Fields" area of the table. Your table will adjust, showing you all available "Race" categories but as for now no frequencies (counts) yet.
  • Next, again drag the variable "Race" fro
  • m the floating window, but this time drag it to the "Drop Data Item" area in the middle. You will finally see the counts of how many occurrences fell inside each race category.
This should create a table similar to the one we created manually before, except this time it includes 1 blank response. For extra credit, see if you can eliminate the "blank" response row. Hint: maybe you can find a drop-down menu somewhere where you can 'uncheck' unwanted categories. Also, when you double-click the "Count of race" label in the table you can specify exactly what type of counts should be shown and in which way it should be formatted. Try for example to get your counts to appear as percent of the overall total.
In subsequent sections we'll revisit the Pivot Table tool and investigate additional options and possibilities. For now, please practice to create: You should of course use the Pivot tool, not count the frequencies manually. For your reference, the charts are as follows:
Pivot Table