Statistics Corner: How do we calculate rater/coder agreement and Cohen's Kappa?

Article appearing in Shiken 16.2 (Nov 2012) pp. 30-36.

Author: James Dean Brown
University of Hawai'i at Manoa

I am working on a study in which two raters coded answers to 6 questions about study abroad attitudes/experience for 170 Japanese university students. The coding was done according to a rubric in which there were 4-8 possible responses per question. Since most -- if not all -- of the data is categorical, I have heard that Cohen's Kappa is the most common way of ascertaining inter-rater agreement. What is the best way to actually calculate that? Since more and more people are moving away from single-rater assessments to multi-rater assessments, this question should be relevant to Shiken Research Bulletin readers.

To address your question, I will have to describe both the simple agreement coefficient and Cohen's Kappa coefficient. I will do so first with a simple example, then with the more complex data from your study.
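Before turning to the full article, the two statistics just named can be sketched computationally. The snippet below is a minimal illustration, not the article's own worked example: the rater codes are hypothetical, and the formulas are the standard definitions of observed agreement (proportion of items coded identically) and Cohen's Kappa, which corrects that proportion for chance agreement based on each rater's marginal code frequencies.

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Proportion of items on which the two raters assigned the same code."""
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return matches / len(rater1)

def cohens_kappa(rater1, rater2):
    """Kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance, computed from each rater's
    marginal code frequencies."""
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    freq1, freq2 = Counter(rater1), Counter(rater2)
    p_e = sum((freq1[c] / n) * (freq2[c] / n) for c in freq1)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for 10 items, using categories A-D
# (as with a four-option rubric question)
r1 = ["A", "B", "B", "C", "A", "D", "C", "B", "A", "C"]
r2 = ["A", "B", "C", "C", "A", "D", "B", "B", "A", "A"]
print(percent_agreement(r1, r2))  # 0.7 (7 of 10 items coded the same)
print(cohens_kappa(r1, r2))       # about 0.58, lower than raw agreement
```

Note that Kappa comes out lower than raw agreement because some of the raters' matches would be expected by chance alone; that correction is the reason Kappa is preferred for categorical coding data.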
