5.2. Frequency Histograms for Categorical Variables

Often one would like to know the frequency of occurrence of values for a variable in percent. This is similar to a frequency histogram we studies earlier, but a frequency histogram only applies to numerical variables, while the procedure outlines in this section will apply to categorical variables.

This chapter re-introduces the Pivot Table tool we already saw in chapter 3.6. (Frequency Charts for Categorical Variables). This time we will add some more details, but a potion of the section is as before so you might want to review section 3.6. before reading on.

Example: A survey was conducted, asking 474 randomly selected people for their income level. Display a percentage table for the frequencies for all income levels. In other words, compute, in percentage, how many of the 474 people fall in income level 1, how many in income level 2, etc.

As usual, Excel provides just the tool for this job. First, however, here is the spreadsheet that contains the results of this survey:

Selected Employees

Loading this data into Excel, we see that there is one column of interest, entitled "Salary Level". However, that column represents a categorical variable (ordinal or nominal?) so we can not compute a frequency histogram. But since that is the column for which we want to have a percentage table, we need to learn a new procedure for handling categorical data.

The appropriate tool to create percentage tables for category data is the "PivotTable Report ..." menu item under the "Data" menu. Actually, the PivotTable menu item is more flexible than we will need in our course, but it will for sure create the type of tables that we will be interested in. We have, of course, already seen the Pivot table (and chart) tool, but now we'll explore a few more options. And, we will, in fact, see that tool again in the next section.

For Excel 2007 users: The Pivot tool is found as the first button of the "Insert" ribbon. It has a slightly different interface than shown in the screen shots below but you should be able to figure it out. For detailed assistance - if you are using Excel 2007 - you could try this helpful video.

The current default "Microsoft Excel list or database" is what we want, so press "Next".

You should see the final dialog window asking you where the output should go. Accept the default of placing the output in a new worksheet and press "Finish".

You will see a "potential frequency/percentage table" as a new worksheet, containing labels such as "Drop Row Field Here", "Drop Column Fields Here", etc. There will also be a floating window containing the available variables from your data, in our case "Gender", "Salary Level", "Years of Education", etc. You can "drag-and-drop" these variables to the various slots in the table to create a variety of useful tables for data analysis.

Note that your picture might look slightly different. Make sure, though, that you see two "buttons", one called "Salary Level" (which applies to the first column) and "Count of Salary Level" (which should apply to the second column). We see, for example, that there were 33 people in the salary range of $10K to 20K, 230 in the next salary range, and so forth. Of course raw numbers are not so useful, so we would like to convert them to percentages.

Here you can specify what computation you want to show in the data area of your table. In most cases, Count is the preferred selection, but  you do need to change the format for the count.

Finally we have constructed our complete frequency chart for the ordinal variable "salary level" as follows:

From the table we can see, for example, that 6.96% + 48.52% = 55.48% of employees earn $30,000 or less.

Example: Use the same Excel data set to find the percentage of males and females that took part in this survey, as well as the percentage of the various job categories.

The procedure is similar to above:

  1. Choose the "PivotTable Report ..." wizzard from the "Data" menu
  2. Select the appropriate column as input
  3. Put that variable in the "row" as well as the "data" field of the table in the wizzard's dialog box
  4. Choose the "% of total" option for the "data"
  5. "Finish" the tables

Here are the resulting tables so that you can check your own results:

 

 

Note: Once you have such a table - Excel calls it a Pivot Table - you can change the categories (variables) to be displayed by dragging them in and out of the table. Maybe you could try to experiment to create a percentage table relating salary level with gender (sex)?