I sat down this morning to write out some thoughts on a rubric I developed for a course this semester. The more I wrote, the more I rambled. I’ve concluded that each of these points needs to be elaborated individually, but for now, here’s a brain-dump.
—–
Introduction
Last semester I taught, on very short notice, a course entitled “Understanding Educational Research.” It’s essentially a thesis prep class, but because different advisors have different concepts of the thesis, I chose to play it safe by basing the coursework on a topical review of literature that may or may not lead into the student’s thesis. Because this lit review would be a major assignment (50% of the final grade), I knew I needed a solid rubric, and I set out to develop one deliberately through the following steps.
1. Start with good evidence and theory
I believe the best rubrics embody the best thinking in their respective field. Unless you are the leader in your field, this means you need to go out and see what others are saying. Find something someone else has done, whether empirical or theoretical, and build your rubric around it. Generally speaking, I’m a fan of both versions of Bloom’s Taxonomy and the lesser-known Krathwohl’s Taxonomy.
Specific to the topic of master’s thesis lit reviews, I found an unpublished article by two friends to be hugely helpful. These friends adapted a rubric from Doing a Literature Review: Releasing the Social Science Imagination (Hart, 1999), and then used it to evaluate 30 theses. Their article was the first assigned reading for my course, and my students spent much more time discussing the rubric than they did going over the authors’ evaluation results.
2. Involve other people
Whether you talk to colleagues, your students, or (preferably) both, get someone else to look over the rubric early and often. In my case, the students worked in groups to determine which of Hart’s criteria were applicable to our class assignment, and then collaboratively crafted draft rubrics during the next three class sessions. I served primarily as a sieve, sorting out the contributions of each group and keeping the standards adequately elevated. Which is a nice segue into…
3. Aim high
If your rubric doesn’t describe the heights to which you believe your students may soar, you can only blame yourself when their work disappoints you. Few students will ever do more than they’re told. And why should they? It is not their responsibility to guess what extra work will get them a higher grade. It is imperative that your rubrics include what you know they can do, even if they don’t know it yet.
I had to employ a little subterfuge to raise my students’ expectations. OK, I flat-out lied to them. The article (described above) I assigned at the beginning of class wasn’t really a pre-pub version of some friends’ article; it was Scholars Before Researchers: On the Centrality of the Dissertation Literature Review in Research Preparation (Boote & Beile, 2005). I had removed every mention of “doctoral” and “dissertation” and replaced them with “master’s” and “thesis.” So when my master’s students were contemplating which criteria they would meet for their lit reviews, they were working from suggestions for doctoral students. I changed the names in the reference so they wouldn’t find the original article and catch me in my ruse.
It worked perfectly. It wasn’t until after all the lit reviews were submitted that I revealed the intrigue to my students. Yes, I saw a metaphorical dagger or two being flung my direction, but I haven’t fielded a single formal complaint. And, I believe, their work was much better when they held themselves to such a high standard.
4. Avoid subjective terms and judgments of quantity
Most rubrics fail to achieve greatness in part because they rely on overly subjective judgments. Terms like rarely, some, clearly, and (my personal favorite) nearly always are often used to distinguish between levels of performance. But these terms leave so much latitude to the rater that nearly every result is debatable. Other rubrics try to avoid this pitfall by quantifying degrees of frequency (e.g., “Students correctly cite their sources 70%-89% of the time”). This practice only conveys the impression of objectivity because the criteria are typically not actually measured. Neither subjective terms nor pseudoquantification is advisable.
This was an issue with many of Hart’s original performance levels for lit reviews. Consider the following levels for one of his criteria (emph. added):
| Criterion | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Placed the research in the historical context of the field. | History of topic not discussed. | Some mention of history of topic. | Critically examined history of topic. |
Notice the subjective terms in the top two performance levels. The difference between no discussion, some discussion, and critically discussed is endlessly discussable. But, this is what we typically see on good rubrics. What other options do we have?
Rather than vary the degree to which a student has performed the same verb, we can find different verbs that describe progressively stronger performance. In my case I grabbed verbs from Bloom’s original taxonomy. Here is the row from our rubric that corresponds to Hart’s row above:
| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Placed the research in the historical context of the field. | Mentions the history of the topic, but does not describe it. | Describes the topic’s history in isolation from external influences. | Frames the history of the topic in relevant social, scientific, and educational events/attitudes. | Compares the target topic’s history with histories of related topics. |
5. Purposefully weight each criterion
Many rubrics assign the same value to each criterion. While it is possible that they all be equally important, I believe that most of the time this phenomenon is the result of laziness on our part. We don’t want to think about how much “grammar and spelling” should be worth compared to “addressing the topic.” How we determine the weight assigned to each criterion (importance? difficulty? frequency?) is for another blog post.
The students’ input was invaluable for this issue on our rubric. They conveyed sincerity in their arguments for why one row should be worth more than another, and the final rubric, by which their work was judged, represents their collective opinions. The weights ranged from 6% for defining key terms to 20% for summarizing the methods researchers have used to explore the topic.
6. Use non-linear performance levels
I would say 95% of the rubrics I have seen attempt to fit their performance levels to an equal-interval scale. That is, they put the same distance between each level. For example, Hart’s rubric (shown above) used a 1-2-3 scale. But what if the space – perhaps measured by effort – between the second and third levels isn’t the same as the space between the first and second? Rather than blindly following this convention, great rubrics may deliberately space out their performance levels unequally.
For our lit review rubric, we chose 70-80-85-100 for two reasons: First, in our opinion, an A-level paper needed to meet the highest criteria. A 70-80-90-100 distribution would have allowed someone to claim an “A-” without ever performing at the highest level. Second, the effort required to move from the second to third levels was consistently less than that required to move from the third to fourth level.
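The arithmetic behind that first reason can be sketched in a few lines of Python. Treat this as an illustration only: the criterion names and most of the weights are hypothetical placeholders (only the 6% and 20% figures come from our rubric), and `weighted_score` is a name I made up for the example.

```python
# Hypothetical criteria and weights for illustration; only the 6% and 20%
# figures are taken from the actual rubric. Weights must sum to 1.
WEIGHTS = {
    "key_terms": 0.06,
    "history": 0.14,
    "methods": 0.20,
    "synthesis": 0.20,
    "significance": 0.20,
    "rhetoric": 0.20,
}

# Percentage value attached to each performance level (1-4).
EQUAL_INTERVAL = {1: 70, 2: 80, 3: 90, 4: 100}
NON_LINEAR = {1: 70, 2: 80, 3: 85, 4: 100}


def weighted_score(levels, scale, weights=WEIGHTS):
    """Return the weighted percentage for a dict of criterion -> level."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scale[levels[c]] for c in weights)


# A paper that reaches level 3 on every criterion but never level 4.
all_threes = {criterion: 3 for criterion in WEIGHTS}
print(weighted_score(all_threes, EQUAL_INTERVAL))  # about 90
print(weighted_score(all_threes, NON_LINEAR))      # about 85
```

Whatever the weights are, a paper that sits at level 3 across the board lands at about 90 on the equal-interval scale and about 85 on ours, which is exactly the gap we wanted between "solid" and "A-level."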
7. Either include zero as a performance level, or do not describe no-performance
One of my biggest pet peeves is rubrics that assign a value of 1 to the lowest level of performance and contain a description of null performance at that level. Looking at Hart’s rubric above, notice that a student who doesn’t do anything under that criterion still receives a 1-out-of-3 score. This would allow students to claim A-level credit when they neglected a criterion that had been important enough to include on the rubric. Taken to the extreme, a blank paper would earn 33% credit.
A better way would be to choose between 1) including a null description with a zero-credit performance level, or 2) letting your lowest performance level be greater than zero, but describing some minimal performance at that level.
For our lit review rubric, we chose the latter. The value of the minimum performance level is 70%, but it is possible that the student will not even accomplish that level. There is a note at the bottom of the rubric which states that students will receive a score lower than 70% if they fail to fulfill those minimum requirements.
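To see how much the bottom level matters, here is a rough sketch of the credit a student earns for landing in the lowest level on every criterion. The function name `floor_credit` is my own invention for this example, and the zero-based scale is just one way option 1 might look.

```python
def floor_credit(level_values):
    """Fraction of full credit earned by scoring the lowest level everywhere."""
    return min(level_values) / max(level_values)


hart_style = [1, 2, 3]     # lowest level describes no performance at all
zero_based = [0, 1, 2, 3]  # option 1: null performance earns zero credit

print(round(floor_credit(hart_style), 2))  # 0.33
print(floor_credit(zero_based))            # 0.0
```

Option 2 does not reduce to a formula quite so neatly, because anything below the described minimum is scored by judgment rather than by the scale.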
8. Check for understanding (both before and after the assignment)
When a rubric is handed out at the same time the task is assigned, it makes sense to check that the students actually understand what is being asked of them. If students’ literature reviews were scored with Hart’s rubric, the students would need to know what is meant by “critically examines the history of the topic.” Additionally, as the assignments are scored, the rater should watch for common misunderstandings so they can be cleared up the next time the rubric is used.
Because the students helped develop our lit review rubric, I assumed they had a good understanding of what was expected. I was wrong. For example, given that these are master’s students in a department of education and that they are all current, former, or future educators, there appeared to be confusion surrounding the term method. Some interpreted it, as I had intended, to imply research methods, but others took it to mean teaching methods. I will be clarifying this distinction on future versions of the rubric.
9. Analyze the results
No assessment tool works well the first time it is used. Commercial tests go through rounds of pilot testing before they are released. State tests… usually need more, but let’s hold ourselves to a high standard. Rubric-derived scores need to be tabulated for each criterion and each level of performance, and then the resulting patterns should be evaluated for their appropriateness. Was there one criterion on which many students scored very low? How can we fix that next time? Do we need to adjust the rubric or the instruction?
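A minimal sketch of that tabulation might look like the following. The scores are made up for illustration, and the data layout (one dict per student, mapping criterion to level) is simply an assumption about how you might have recorded your results.

```python
from collections import Counter

# Made-up scores for illustration: one dict per student, criterion -> level (1-4).
scores = [
    {"history": 3, "methods": 2, "rhetoric": 1},
    {"history": 4, "methods": 2, "rhetoric": 2},
    {"history": 3, "methods": 3, "rhetoric": 1},
]


def tabulate(scores, levels=(1, 2, 3, 4)):
    """Count how many students landed in each level of each criterion."""
    table = {}
    for student in scores:
        for criterion, level in student.items():
            table.setdefault(criterion, Counter())[level] += 1
    # A Counter returns 0 for missing keys, so unused levels show up as zeros.
    return {c: [counts[lvl] for lvl in levels] for c, counts in table.items()}


for criterion, counts in tabulate(scores).items():
    print(f"{criterion:10s}", counts)
```

Each row of the output is one criterion, and each column is a performance level, so a row that piles up on the left flags a criterion worth a second look.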
If you are concerned that the results may depend on who scored the assignment, you should have multiple raters independently score the same students’ work. This check for inter-rater reliability will tell you if more work needs to be done on the rubric, or perhaps on training scorers to use it.
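One way to put a number on this is simple percent agreement, optionally corrected for chance with Cohen’s kappa. Kappa is my choice for this sketch, not the only option, and the two raters’ scores below are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which both raters chose the same level."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same level at random,
    # given how often each rater used each level.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical levels (1-4) assigned by two raters to the same eight papers.
rater_a = [1, 2, 2, 3, 3, 3, 4, 4]
rater_b = [1, 2, 3, 3, 3, 4, 4, 4]
print(percent_agreement(rater_a, rater_b))        # 0.75
print(round(cohens_kappa(rater_a, rater_b), 2))   # 0.65
```

Running this separately for each criterion, rather than once for total scores, makes it easier to see which rows of the rubric the raters read differently.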
In most cases an internal consistency reliability analysis (Cronbach’s alpha, KR-20, etc.) is not appropriate for rubric results. Internal consistency checks that your high-scorers aren’t losing points on the easy sections, and that your low-scorers are not getting high marks on the difficult sections. The criteria on a rubric are chosen because they represent various ways in which the quality of the student’s work may vary. We would expect criterion scores to be relatively independent, a trait which internal consistency would mistake for a lack of reliability.
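To make that last point concrete, here is a small sketch using the standard formula for Cronbach’s alpha on two tiny, made-up data sets. The function and the data are illustrative assumptions; the sketch also assumes total scores are not all identical, since alpha is undefined in that case.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of criterion scores per student (same criteria order)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]  # variance per criterion
    total_var = variance([sum(r) for r in rows])       # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two criteria that rise and fall together across four students.
in_lockstep = [[1, 1], [2, 2], [3, 3], [4, 4]]

# Two criteria that vary independently of each other.
independent = [[1, 1], [1, 4], [4, 1], [4, 4]]

print(round(cronbach_alpha(in_lockstep), 2))  # 1.0
print(round(cronbach_alpha(independent), 2))  # 0.0
```

The second data set is not "unreliable" in any meaningful sense; the two criteria simply measure different things, which is what we asked them to do.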
For our lit review rubric, I shaded each cell according to how many students ended up in each performance level. The shading (see below) revealed some welcome and disconcerting patterns. First, average scores for each criterion ranged between 81% and 92%, with an average total score of 83%. This is an appropriate result for such a difficult assignment. Second, very few students achieved the highest performance level for two criteria, which I believe was due to an interaction between the criteria and the specific topics students chose for their lit reviews. Third, students did not do well on the rhetoric criterion, which included organization, grammar/spelling, and APA style.
10. Revise as necessary
The problems uncovered by the analysis (if you didn’t find any problems, look again) can be classified into two broad categories: problems with the rubric, and problems with the instruction. If the issues concern clarity or applicability of the criteria, then revise them. On the other hand, if students (or a certain set of students) scored lower than you expected, it might not be a problem with the rubric, but with the students’ preparedness. It would be a shame to revise or scrap a functioning rubric just because it didn’t give the results we wanted. Instead, consider altering the instructional activities, and then reusing the rubric to track any changes in student performance.
On my shaded copy of our lit review rubric, I highlighted the problematic cells and inserted endnotes describing the problems and possible solutions. I am not teaching this course next semester, so I need to record my concerns right away. With these notes, I can pick up and revise the rubric the next time I use it.

Conclusion
Whether having a great rubric is worth all this work is a valid question. But a wonderful aspect of rubrics is that they can be reused each semester (or even multiple times within a semester) without impacting the students’ results. So you can put in a little time now, then a little time next year, and develop the rubric in baby steps. So then, it’s not a question of whether it’s worth the time, but how much time it is worth.