Interpreting the Index of Discrimination
The index of discrimination is a useful measure of item quality whenever the purpose of a test is to produce a spread of scores, reflecting differences in student achievement, so that distinctions may be made among the performances of examinees. This is likely to be the purpose of norm-referenced tests.
For the subset of criterion-referenced tests known as mastery model tests, we desire that all examinees score as high as possible. We do not wish to distinguish among examinees who score at mastery level and therefore are not interested in maximizing test score variance. In such cases the index of discrimination is not useful and other measures, such as sensitivity to instruction, are used to judge item quality.
A basic consideration in evaluating the performance of a normative test item is the degree to which the item discriminates between high achieving students and low achieving students. Literally dozens of indices have been developed to express the discriminating ability of test items. Most empirical studies have shown that nearly identical sets of items are selected regardless of the indices of discrimination used. A common conclusion is to use the index which is the easiest to compute and interpret.
Such an index of discrimination is shown on the item analysis reports available from the Scoring Office. This index of discrimination is simply the difference between the percentage of high achieving students who got an item right and the percentage of low achieving students who got the item right. The high and low achieving students are usually defined as the upper and lower twenty-seven percent of the students based on the total examination score. This difference in percentages is expressed as a whole number as a matter of convenience.
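The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Scoring Office's actual procedure: the function name, its arguments, the rounding of the group size, and the handling of tied total scores are all assumptions made for the example.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-minus-lower difference in percent correct, as a whole number.

    total_scores: total examination score for each student.
    item_correct: 1 if that student answered the item correctly, else 0.
    fraction: share of students in each extreme group (27 percent by default).
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("total_scores and item_correct must have equal length")
    n = len(total_scores)
    # Assumption: group size is rounded to the nearest whole student.
    group_size = max(1, round(n * fraction))
    # Rank students from highest to lowest total score.
    # Assumption: ties keep their original order (stable sort).
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = order[:group_size]
    lower = order[-group_size:]
    pct_upper = 100 * sum(item_correct[i] for i in upper) / group_size
    pct_lower = 100 * sum(item_correct[i] for i in lower) / group_size
    return round(pct_upper - pct_lower)
```

For ten students with total scores 10 down to 1 and item responses 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, each extreme group has three students; all of the upper group and one of the lower group answered correctly, so the function returns 67.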
A useful rule of thumb in interpreting the index of discrimination is to compare it with the maximum possible discrimination for an item. The maximum possible discrimination is a function of item difficulty. When half or fewer of the combined upper and lower groups answered the item correctly, the maximum possible discrimination is the sum of the percentages of the upper and lower groups who answered the item correctly. For example, if 30% of the upper group and 10% of the lower group answered the item correctly, the maximum possible discrimination is 30 plus 10, or 40. This maximum would occur if the same number of correct answers were redistributed so that 40% of the upper group and none of the lower group answered the item correctly.
Note that the actual discrimination in this example is 30 minus 10, or 20. The discriminating efficiency of the item, which is the ratio of the actual discrimination to the maximum possible discrimination, is therefore 20/40, or 50%. See Item A in Table 1.
When more than half of the combined upper and lower groups answered an item correctly, the maximum possible discrimination is 200 minus the sum of the percentages of the upper and lower groups who answered the item correctly. For example, if 96% of the upper group and 84% of the lower group answered the item correctly, the maximum possible discrimination for the item would be 200 minus 180 (96 plus 84), or 20. Since the actual index of discrimination for the item is 96 minus 84, or 12, the discriminating efficiency of the item is 12/20, or 60%. See Item B in Table 1.
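The two cases of the rule of thumb can be combined into a short Python sketch. The function names are hypothetical, and the value returned when the maximum is zero is a convention chosen here for illustration, since the ratio is undefined in that case.

```python
def max_discrimination(pct_upper, pct_lower):
    """Maximum possible discrimination for an item, given the percent
    correct in the upper and lower groups (equal-sized groups assumed)."""
    total = pct_upper + pct_lower
    # Half or fewer correct overall: the maximum is the sum itself.
    # More than half correct overall: the maximum is 200 minus the sum.
    return total if total <= 100 else 200 - total


def discriminating_efficiency(pct_upper, pct_lower):
    """Actual discrimination as a percentage of the maximum possible."""
    actual = pct_upper - pct_lower
    maximum = max_discrimination(pct_upper, pct_lower)
    if maximum == 0:
        # Everyone or no one answered correctly; the ratio is undefined.
        # Returning 0.0 is an assumption made for this sketch.
        return 0.0
    return 100 * actual / maximum
```

With the first example (30% and 10%), max_discrimination gives 40 and discriminating_efficiency gives 50.0. With the second example (96% and 84%), the results are 20 and 60.0.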
It is important to recognize that an item which half of the students answer correctly has the highest possible discriminating potential. Consider an item which 80% of the upper group and 20% of the lower group answer correctly. According to the rule of thumb for items answered correctly by half or fewer of the students, the maximum discriminating ability of the item is 80 plus 20, or 100. Since the index of discrimination of the item is 60, the discriminating efficiency is 60%. See Item C in Table 1. As the difficulty of an item moves away from this point in either direction, so that more or fewer than half of the combined upper and lower groups answer the item correctly, the maximum discriminating ability decreases from 100. The lower limit of the maximum discriminating ability is zero, which is reached when all of the combined upper and lower groups, or none of them, answer an item correctly.
The techniques discussed above enable one to determine the upper limit of the index of discrimination. In most practical situations, determining a lower limit for the index of discrimination is not a problem, since the most discriminating items are selected from the available item pool. The practical rule is the higher the discrimination, the better.
However, there are a number of techniques which may be used to determine a lower limit below which the index of discrimination is not significantly different from zero. The first, and most tedious, would be to determine the statistical significance of the difference between two proportions, that is, the difference between the proportion of the upper group who answered the item correctly and the proportion of the lower group who answered the item correctly.
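One common way to carry out the first technique is a pooled two-proportion z test, sketched below in Python. The choice of the pooled form, the one-tailed probability, and the function name are assumptions for illustration; the test also treats the upper and lower groups as independent samples, which is only approximately true when the groups are formed from total scores that include the item itself.

```python
import math


def two_proportion_z(correct_upper, n_upper, correct_lower, n_lower):
    """Pooled z statistic and one-tailed p-value for the difference
    between the upper-group and lower-group proportions correct."""
    p_upper = correct_upper / n_upper
    p_lower = correct_lower / n_lower
    # Pooled proportion under the hypothesis of no real difference.
    pooled = (correct_upper + correct_lower) / (n_upper + n_lower)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    if se == 0:
        # Everyone or no one answered correctly: no evidence of a difference.
        return 0.0, 0.5
    z = (p_upper - p_lower) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

For instance, with 135 students in each group, 108 correct in the upper group (80%) and 27 correct in the lower group (20%), the z statistic is about 9.86, far beyond the value of roughly 2 needed for significance.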
A second method would be to use a specially prepared table such as the one in Appendix A of Julian Stanley's Measurement in Today's Schools (fourth edition, Prentice-Hall, 1964). Table A-5 (pp. 353-355) indicates the level at which an item can be considered sufficiently discriminating in terms of numbers of persons. The number of persons must be converted to a proportion before relating it to the index of discrimination given on the item analysis report. Use of this table is convenient and gives values appropriate for 2, 3, 4, or 5 option items.
A third method of determining the statistical significance of the index of discrimination would be to compute its standard error. This might be accomplished by doing an item analysis on two samples of a large group. The reliability of the index of discrimination may be determined by correlating the pairs of values from the two item analyses. The rule may then be applied that the index of discrimination must be more than twice as large as the standard error in order for the index to be statistically different from zero at the 2.5 percent level of significance. Experience with University College final examinations has shown that the standard error technique and the use of Stanley's table result in the establishment of almost identical criteria for testing the significance of the index of discrimination when item analyses are based on 500 students. Comparable criteria will also be developed by applying the technique of determining the statistical significance of the difference between two proportions, when the items have difficulty indices of approximately 50.
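To give a sense of the size of such a criterion, the following Python sketch applies the "twice the standard error" rule using the usual standard error of a difference between two proportions. This is an assumption made for illustration: it is not the empirical two-sample standard error described above, and the function name, the 27 percent group size, and the rounding of the group size are likewise hypothetical.

```python
import math


def minimum_significant_index(n_students, difficulty_pct=50.0, fraction=0.27):
    """Smallest index of discrimination (in percentage points) that exceeds
    twice the standard error of a difference between two proportions."""
    # Assumption: each extreme group holds 27 percent of the students.
    group_size = round(n_students * fraction)
    p = difficulty_pct / 100
    # Standard error of the difference between two independent proportions,
    # both taken at the item's overall difficulty, in percentage points.
    se = 100 * math.sqrt(p * (1 - p) * 2 / group_size)
    return 2 * se
```

Under these assumptions, an item analysis based on 500 students with an item of difficulty 50 gives a threshold of about 12, so an index of discrimination of 13 or more would be taken as different from zero.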