5.2. Frequency Histograms for Categorical Variables
Often one would like to know the frequency of occurrence of values for a variable in percent. This is similar to a frequency histogram we studies earlier, but a frequency histogram only applies to numerical variables, while the procedure outlines in this section will apply to categorical variables.
This chapter re-introduces the Pivot Table tool we already saw in chapter 3.6. (Frequency Charts for Categorical Variables). This time we will add some more details, but a potion of the section is as before so you might want to review section 3.6. before reading on.
Example: A survey was conducted, asking 474 randomly selected people for their income level. Display a percentage table for the frequencies for all income levels. In other words, compute, in percentage, how many of the 474 people fall in income level 1, how many in income level 2, etc.
As usual, Excel provides just the tool for this job. First, however, here is the spreadsheet that contains the results of this survey:
Loading this data into Excel, we see that there is one column of interest, entitled "Salary Level". However, that column represents a categorical variable (ordinal or nominal?) so we can not compute a frequency histogram. But since that is the column for which we want to have a percentage table, we need to learn a new procedure for handling categorical data.
The appropriate tool to create percentage tables for category data is the "PivotTable Report ..." menu item under the "Data" menu. Actually, the PivotTable menu item is more flexible than we will need in our course, but it will for sure create the type of tables that we will be interested in. We have, of course, already seen the Pivot table (and chart) tool, but now we'll explore a few more options. And, we will, in fact, see that tool again in the next section.
For Excel 2007 users: The Pivot tool is found as the first button of the "Insert" ribbon. It has a slightly different interface than shown in the screen shots below but you should be able to figure it out. For detailed assistance - if you are using Excel 2007 - you could try this helpful video.
- Load the above spreadsheet into Excel
- Select "PivotTable Report ..." from the "Data" menu entries (Excel 2007: click "Insert" -> "Pivot Table")
The current default "Microsoft Excel list or database" is what we want, so press "Next".
- In the next step you will be asked to select the data to use. By default Excel will highlight the entire table present in your worksheet, which is perfectly fine for our purposes, so press "Next".
You should see the final dialog window asking you where the output should go. Accept the default of placing the output in a new worksheet and press "Finish".
You will see a "potential frequency/percentage table" as a new worksheet, containing labels such as "Drop Row Field Here", "Drop Column Fields Here", etc. There will also be a floating window containing the available variables from your data, in our case "Gender", "Salary Level", "Years of Education", etc. You can "drag-and-drop" these variables to the various slots in the table to create a variety of useful tables for data analysis.
- First drag the variable "Salary Level" from the floating window into the "Drop Row Fields" area of the table. Your table will adjust, showing you all available salary levels. Next, again drag the variable "Salary Level" from the floating window, but this time drag it to the "Drop Data Item" area in the middle. You will then see the counts of how many occurrences fell inside each salary level, similar to the following:
Note that your picture might look slightly different. Make sure, though, that you see two "buttons", one called "Salary Level" (which applies to the first column) and "Count of Salary Level" (which should apply to the second column). We see, for example, that there were 33 people in the salary range of $10K to 20K, 230 in the next salary range, and so forth. Of course raw numbers are not so useful, so we would like to convert them to percentages.
- You can adjust what exactly is shown in the "data" area of your table by double-clicking the button entitled "Count of Salary Level" that is circled above - note that your picture might be slightly different, you do need to double-click "Count of Salary Level", not "Salary Level" - similar to the above picture. You will see the following dialog:
Here you can specify what computation you want to show in the data area of your table. In most cases, Count is the preferred selection, but you do need to change the format for the count.
- Click on "Options" to reveal additional options and make sure the "Show data as" field is set to "% of total", as shown below. Then click OK.
Finally we have constructed our complete frequency chart for the ordinal variable "salary level" as follows:
From the table we can see, for example, that 6.96% + 48.52% = 55.48% of employees earn $30,000 or less.
Example: Use the same Excel data set to find the percentage of males and females that took part in this survey, as well as the percentage of the various job categories.
The procedure is similar to above:
- Choose the "PivotTable Report ..." wizzard from the "Data" menu
- Select the appropriate column as input
- Put that variable in the "row" as well as the "data" field of the table in the wizzard's dialog box
- Choose the "% of total" option for the "data"
- "Finish" the tables
Here are the resulting tables so that you can check your own results:
- Percentages of male/females in the survey
- Percentages of job categories in the survey
Note: Once you have such a table - Excel calls it a Pivot Table - you can change the categories (variables) to be displayed by dragging them in and out of the table. Maybe you could try to experiment to create a percentage table relating salary level with gender (sex)?