5.1 Percentage Tables
So far we have analyzed data one variable at a time. We have seen how to compute mean, mode, median, and variance, but each formula only applied to one variable. Now we want to investigate two (or more) variables simultaneously. Usually, a typical question about two variables is:
Is there some relation between one variable and another one, and if so, how can one use knowledge about one variable to predict, approximately, the other.
Answers to such questions can be very useful:
- if smoking causes cancer, we should stop smoking
- if having a college degree increases the chance to have a well-paying job, we should try our best to graduate college
- if exercising and working out increases our general state of health, we should exercise and work out regularly
- if a new drug really does have a positive impact on lowering blood pressure, we should take it if we have high blood pressure
In most cases the "if" part is the difficult one to determine, i.e. it is not so easy to find out whether two variables (for example smoking and cancer) are indeed related, and even if they are related, it might be difficult to determine which is cause and which is result (if smokers have higher cancer rates, does smoking cause cancer, or does having cancer cause you to smoke).
We will start our investigation about relationships between variables by taking a closer look at representing data in tables.
Example: The residents of Green Township where asked what their opinion about a new Zoning Ordinance was. The answers were broken down by age of the people who were questioned. The result of the survey is summarized in the following table:
age
50 or underage
over 50Total
For Zoning
92
87
179
Against or no opinion
158
75
233
Total
250
162
412
This table can, of course, be entered into Excel directly, and using a few tools that Excel provides the data entry and formatting is quick and effortless.
- First, let's enter the "raw" data, i.e. all data that is actually collected as opposed to computed data. Our spreadsheet will look similar to this:
Note that some labels may not be completely visible - we will rectify that later automatically.
- Next, we will asked Excel to compute the totals for us "on the fly". Excel provides a very convenient button for that on the tool bar: the "Auto Sum" button . Position the cursor in the cell for the first row total and press "Auto Sum" . Excel will indicate the cells that it is going to sum up, which should be all cells to the left (of course you could also enter the "=sum()" formula and pick the range manually, but the 'auto sum' tool is quicker in this case).
If your table does not look like this it is fine, but it should contain the appropriate row and column totals.
- Press ENTER to accept the choice and Excel will automatically compute the total and enter it in the appropriate cell.
- Keep on using the "Auto Sum" button until all totals are computed. In other words, each time position the cursor first in the cell that will contain the sum, then press "Auto Sum", then press ENTER. Note that Excel automatically uses cells to the left or above, as appropriate.
- Finally, we want to format our table nicely. First, select all cells from the table, including the labels and all totals, but not more than that. Next, from the "Format" menu, select "AutoFormat". Choose a format that you like, for example "Colorful 2", then press enter. Here is the final table, nicely formatted (with all labels visible):
Row and Column Percenta
The above table looks nice, but is not very helpful. It is not meaningful, for example, to know that 92 people age 50 or under are for zoning. What would be useful, of course, would be to know percentages instead of actual figures. So, to better analyze the data we will convert each number to percent. However, for each entry there are three possible percentages to compute:
- we could take a row total to convert a number into percent (row percentages)
- we could take a column total to convert a number into percent (column percentages)
- we could take the grand total to convert a number into percent (total percentages)
Row and column percentages are the most useful, and we will first show how to generate each of them, then discuss when to use which.
Let's convert each number into the appropriate row percentages, i.e. we will use the row total to convert a number into percent:
First, we will copy the original table to a new location just below it:
- select all cells from the table
- choose "Copy" from the "Edit" menu
- position the cursor in the first column a few cells below the original table
- select "Paste" from the "Edit" menu.
A second copy of the original table will appear. At this point you have a choice:Excel would do the computations for you, of course, but in this case the power of Excel might actually be overkill. So, we will first convert the figures by hand, using a normal calculator, and then - as an appendix for the Excel enthusiast - we'll show how to use Excel if you insist.
- you could convert the numbers to percentages "by hand" (well, by hand would mean using a regular calculator)
- you could convert the numbers to percentages using Excel
Converting to Row Percentage
In cell B8 the 'raw data' is 92, indicating that there were 92 people age 50 or under who were 'for' zoning. The total in row 8 is 179, indicating that a total of 179 people were 'for' zoning. Thus, the row percentage of people 'for' zoning who are age 50 or under is
92 / 179 = 0.514
Thus, 0.514 (or 51.4%) of those people 'for' zoning were age 50 or under, and therefore we manually enter the number 0.514 into cell B8, replacing the original value of 92.
Similarly, in cell C8 the raw data is 87, which we convert into row percentage as follows
87 / 179 = 0.486
Thus, 0.486 (or 48.6%) of those people 'for' zoning were over 50 years of age, so we manually enter the number 0.486, replacing the value of 87.
We repeat similar calculations to turn all of the raw data in rows 9 (dividing by 233) and 10 (dividing by 412) into row percentages, and we replace the row totals by 1 (or 100%). Our converted table should look like this, showing the converted numbers but not in percentages:
- Mark the cells containing the ratios in the second table (B8 to D10)
- Select "Cells ..." from the "Format" menu and choose as category "Percentage"
- Hit RETURN to accept that selection
Converting to Column Percentage
Similarly, we can create a third table, containing column percentages. The details are left as an exercise, but for the calculations we need to convert the raw data values by dividing by the column totals instead of the row totals. If we do everything correctly our three tables (properly formatted) should look like this:
When to use Row Percentages, when Column Percentages
Now that we know how to create either row or column percentages, let's try to determine when to use which. For that, consider two similar but very much different questions:
- How many people who are for the zoning law are age 50 or under (in percent) ?
- How many people, in percent, 50 or under are for the zoning law (in percent) ?
These questions seem similar: we are looking at the intersection between the row "For Zoning" and the column "Age 50 or under". From the first table we know that 92 people fall into that category, but that number is not in percent. On the other hand, there are two candidates for the percentage number, 52.4% from the row percentages or 36.8% from the column percentages. Which one answers which question ?
- Question 1 asks, rephrased: out of all people who are for the zoning law, how many of them are age 50 or under. In other words, question 1 considers all people who are for the zoning law as a total - that is a row total, so that the answer to question 1 is the row percentage 51.4%.
- Question 2 asks, rephrased: out of all people who are 50 or under, how many of them are for the zoning law. In other words, question 2 considers all people who are 50 or under as a total - that is a column total, so that the answer to question 2 is the column percentage 36.8%.
From that example we see that the key to answer questions such as these is which group is considered the "total" group for the particular question:
- if the total for that group is found in a row, use row percentages
- if the total for that group is found in a column, use column percentages
It seems that generating these percentage tables is a fair amount of work. Of course Excel provides an easier method for generating such tables from actual data, which we will explore soon.
Appendix: Converting to Percentages using Excel
In this case it is simpler doing the proper computations using a regular calculator, but we could use Excel just as well, as outlined below. First, our starting point is the original table with the raw data values, copied to a second version as in this picture:
Now position the cursor into cell B8 (second table, containing the value 92). We will replace this "raw" value by a computed one as follows:The value 92 in the second table will now be replaced by the appropriate ration 0.5139 (which is not in percent, but do not worry about that right now).
- type "=" (the equal sign)
- click once in the first table on the value 92 (cell B3)
- type "/" (the division symbol)
- click once in the first table on the value 179 (cell D3)
- hit ENTER
We repeat this calculation for the next cell, C9 (second table, containing the value 87). We compute row percentages:
If we keep going this way, we can convert all numbers into row percentages. Actually, all numbers so far will be decimal values, but we can easily format them as percentages as outlined above.
- type "=" (the equal sign)
- click once in the first table on the value 87 (cell C3)
- type "/" (the division symbol)
- click once in the first table on the value 179 (cell D3)
- hit ENTER
And it should be possible, now, to convert the raw data into column percentages, using Excel - see above for the final tables.
The advantages of this approach is that - while more work than using a calculator - the percentage table(s) will automatically update if the raw data values change if we use Excel formulas.
Practicing
To practice, consider the following table of raw numbers, relating sex (or gender) with puls rate (the numbers are from a survey for this class):
Convert the table into (a) row percentages and (b) column percentages, then use the appropriate figures to answer the following questions:
low pulse rate high pulse rate Totals Male 1 1 2 Female 9 6 15 Totals 10 7 17
Please note that only 2 people who took the survey were male, which - using common sense - will limit our ability to draw conclusions about, say, the population of all students at Seton Hall (but we're not yet drawing conclusions or making predictions anyway).
- How many males (in percent) have a high pulse rate?
- This is a row percent so the answer (in percent) is: 1 / 2 * 100 = 50%
- how many people (in percent) with a high pulse rate are female?
- This is a column percent so the answer (in percent) is: 6/7 * 100 = 85.7%