Assessment Concerns in Jon Mott’s “PLNs, Portfolios, and a Loosely-Coupled Gradebook”

Opinion
 Posted by jeremy on June 17th, 2009

Jon Mott has a good post on combining ePortfolio efforts with cloud computing and the potential benefits of decentralizing the assessment process. While I agree with some of his points, his comments demonstrate the consistent lack of understanding many educators have of validity and reliability. This post is meant to clarify the issues somewhat, and offer an admonition as he continues to develop this idea.

(Some assessment specialists may take issue with my treatment of these constructs. I apologize if I’ve oversimplified. Please link to more appropriate explanations as you see fit.)

Validity

The 1999 Standards for Educational and Psychological Testing define validity as…

… the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests (p. 4)

This definition communicates three essential qualities of validity in assessment.

  1. Validity is not absolute, but a matter of degree. It is more proper to refer to the quality and quantity of evidence supporting validity than to declare (or aim for) an assessment as valid.
  2. Assessments cannot be valid. Assessments cannot be invalid. Data cannot be valid or invalid. Validity is a property of the interpretations one makes of assessment results.
  3. Interpretations are derived from specific uses of the assessment results. Validity cannot be appropriately discussed without regard for how the assessment results are to be used.

With that in mind, notice that this sentence from Jon’s post only reflects one of those three points.

Student learning portfolios are essential in the movement toward more valid and authentic assessment in higher education.

These stipulations are not simply semantic; they reflect decades of theory and empirical research and yield procedural advantages over the vernacular concept of validity. Discussing assessments (portfolios included) without regard for modern validity is akin to pontificating about the cloud without understanding the network.

Reliability

Jon doesn’t mention reliability specifically in his post, but his notion of a quasi-decentralized (loosely-coupled) assessment framework for student portfolios has a prima facie effect on the reliability of the assessment results. (Notice: Assessment results can be more or less reliable, not the assessment itself.)

In classical test theory, reliability is conceptualized as the degree to which a student’s assessment results (obtained or observed score) reflect that student’s true score (which is “known only to God”). We assume that an infinite number of assessment tasks, rated by an infinite number of raters, etc., would average out to the student’s true score, […insert math here…] so we operationalize reliability as the consistency of assessment results.
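For readers who want a rough idea of the math skipped above, the usual textbook formulation of classical test theory runs as follows. This is a sketch of the standard model, not anything specific to Jon's proposal; the symbols X (observed score), T (true score), and E (error) are the conventional ones and do not appear elsewhere in this post.

```latex
% Observed score = true score + error, with error averaging to zero
% and uncorrelated with the true score.
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0

% Under those assumptions, observed-score variance splits into two parts.
\sigma_X^2 = \sigma_T^2 + \sigma_E^2

% Reliability is the share of observed-score variance due to true scores.
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Anything that adds error variance without adding true-score variance pushes this ratio down, which is the sense in which results become less reliable.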

Hypothetically, an assessment should yield the same result for two students of equal true score. In other words, the results should be consistent across students of like ability. It would be unfortunate for one student to “pass” and another to “fail” solely because the assessment lacked reliability.

Jon cites Gary Brown’s encouragement to give students a high degree of control over their portfolios. I would argue that, to the degree that ownership of the portfolio is decentralized (accorded to each student), the reliability of the judgments passed on the portfolio will necessarily degrade. I say this for two reasons:

  1. If two students contribute two distinctly different artifacts to their respective portfolios, one may fail simply due to the degree of difficulty in his/her selected artifacts, while another may pass due to the relative facility of his/hers. Assessment techniques exist that could mitigate this risk, but Jon makes no allusion to them in his post.
  2. Portfolio judges will be unable to disaggregate two distinct learning outcomes: a) the quality of work in the portfolio, and b) the ability to recognize and select quality work to include in the portfolio (and to exclude work of lesser quality). Again, students of equal ability may receive vastly different results. One may argue that the ability to judge one’s own work is important(1), but then it should be assessed distinctly from the quality of one’s work.
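A toy simulation may make the first concern concrete. It is a minimal sketch under invented assumptions, not a model of any real portfolio system: true ability is standard normal, rater error has a standard deviation of 0.5, and letting students choose their own artifacts is represented as an extra random shift in the observed score. The function names and all parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulated_reliability(difficulty_sd, n_students=2000, rater_sd=0.5, seed=0):
    """Squared correlation between true and observed scores.

    difficulty_sd is an assumed stand-in for how much artifact difficulty
    varies from student to student (0.0 = everyone submits the same tasks).
    """
    rng = random.Random(seed)
    true_scores, observed_scores = [], []
    for _ in range(n_students):
        true_score = rng.gauss(0.0, 1.0)
        rater_error = rng.gauss(0.0, rater_sd)
        # Harder or easier self-selected artifacts shift the observed result
        # independently of the student's ability (an assumption of this toy).
        difficulty_shift = rng.gauss(0.0, difficulty_sd)
        true_scores.append(true_score)
        observed_scores.append(true_score + rater_error + difficulty_shift)
    return pearson(true_scores, observed_scores) ** 2


if __name__ == "__main__":
    print("prescribed artifacts:", round(simulated_reliability(0.0), 2))
    print("student-chosen artifacts:", round(simulated_reliability(1.0), 2))
```

With these made-up variances the theoretical values are 1 / 1.25 = 0.8 for prescribed artifacts and 1 / 2.25, or about 0.44, when artifact difficulty varies as much as ability does; the simulated figures should land near those. The second concern (conflating quality of work with skill at selecting work) is not modeled here.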

Conclusion

I must acknowledge that Jon recommends “loosely-coupling,” not “decoupling,” the university and assessment; I am simply warning him of the assessment minefield into which he is marching. To shift assessment practice without regard for modern assessment theory is an all-too-common folly.

-----

(1) “Si tu réussis à bien te juger, c’est que tu es un véritable sage.” (“If you are successful at correctly judging yourself, you are truly wise.”) The Little Prince by Antoine de Saint-Exupéry

2 Responses to “Assessment Concerns in Jon Mott’s ‘PLNs, Portfolios, and a Loosely-Coupled Gradebook’”

  1. Jon Mott Says:

    Jeremy-

    Thanks for the response both here and on my blog. I appreciate the dialogue. You can rest assured that I understand the concepts of validity and reliability. 15 hours of methods in grad school and a decade in an assessment-oriented career have ingrained them in my consciousness.

    I am guilty, perhaps, of thinking of this problem primarily from a technological affordance perspective (i.e. how teachers and students interact and exchange information during the learning process) and not enough from the perspective of “modern assessment theory.” I appreciate your invitation (“warning”!) to think about these issues more carefully. I am and will.

    I’m crafting a lengthier response which I’ll post on my blog, but I’m wondering if you would clarify something for me. What do you mean by the “degree of difficulty” or “relative facility” of a student’s “selected artifacts”? I’m not quite sure what you mean here. Are you referring to the technical difficulty of creating and/or publishing the artifact?

    Thanks again for the response. I’ll try to get my reaction written and posted in the next day or two.

    Jon

  2. jeremy Says:

    Jon, I’m glad that you are thinking about these. I’m guilty of seeing assessment as the only concern, so I have to admit that there are other concerns that must balance against it. I trust your experience and judgment, but it’s only very recently that teacher ed ePortfolios finally came under validity scrutiny.

    As far as the difficulty/facility issue:

    1. Every artifact in a portfolio represents a task completed.

    2. Different tasks have different levels of difficulty/facility.

    3. I assume the purpose of the portfolio is to assess student ability/accomplishments.

    4. A student with less ability could choose artifacts that represent easier tasks, inflating their apparent ability, while a more ambitious student could choose an artifact that represents a more difficult task and have their ability underestimated.

    One way to mitigate this would be to dictate the types of artifacts that may be included, factor in degrees of difficulty, make scoring rubrics, rating scales, etc., publicly available, and so forth. But these represent a return of ownership to the institutions, which runs counter to Brown’s comments.

    So the question is…

    How much structure do we put in place to assure a certain level of reliability, while still providing the students the ownership Brown recommends?