This is the most exciting image for kappa on the internet.

Cohen's Kappa

Cohen's kappa is a statistical tool used to quantify how well two observers agree, ranging from 0.00 (no agreement beyond chance) to 1.00 (perfect agreement).

Kappa is used in systematic reviews to quantify how well independent reviewers agree on dichotomous decisions such as article screening (include / exclude).

Example:
• 2 independent reviewers evaluate each of 36 residents in a single mock oral station
• Grading as pass/fail
• Both reviewers agreed that 33/36 residents should be graded as PASS
• Both reviewers agreed that 2/36 residents should be graded as FAIL
• Reviewer #1 thought one resident (1/36) should be graded as PASS, while Reviewer #2
thought that resident should be graded as FAIL
• How well did the 2 reviewers agree?

 

Example chart of the two reviewers' grades.
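The same 2×2 tally can be reproduced with a few lines of Python. This is just a rough sketch; the list names reviewer_1 and reviewer_2 are my own, not from the original example:

```python
from collections import Counter

# Hypothetical encoding of the 36 pass/fail grades described above:
# 33 residents passed by both reviewers, 2 failed by both,
# and 1 passed by Reviewer 1 but failed by Reviewer 2.
reviewer_1 = ["PASS"] * 33 + ["FAIL"] * 2 + ["PASS"]
reviewer_2 = ["PASS"] * 33 + ["FAIL"] * 2 + ["FAIL"]

# 2x2 tally of (Reviewer 1 grade, Reviewer 2 grade) pairs
tally = Counter(zip(reviewer_1, reviewer_2))
for (r1, r2), n in sorted(tally.items()):
    print(f"Reviewer 1 = {r1}, Reviewer 2 = {r2}: {n}")
# Expected counts: PASS/PASS 33, PASS/FAIL 1, FAIL/FAIL 2
```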

Observed agreement (OA) = probability that they would agree on a single score
OA = # they agree on / # total
OA = 35/36
OA = 0.97 (they agree 97% of the time)
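As a quick sanity check, the same arithmetic in code (variable names are illustrative, not from the original example):

```python
# Observed agreement from the counts above
both_pass, both_fail, disagree = 33, 2, 1
total = both_pass + both_fail + disagree      # 36 residents

observed_agreement = (both_pass + both_fail) / total
print(round(observed_agreement, 2))           # 0.97 (35/36)
```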
**Problem: this doesn't account for agreement by chance**
Agreement by chance (AC) = probability that the 2 reviewers would agree by chance alone (i.e., how often the reviewers would agree if they didn't watch the exam and each simply handed out passes and fails at a pre-determined rate)
AC = chance they would both pass + chance they would both fail
AC = P(Reviewer 1 pass) × P(Reviewer 2 pass) + P(Reviewer 1 fail) × P(Reviewer 2 fail)
AC = (34/36 × 33/36) + (2/36 × 3/36)
AC = 0.87 (they would agree 87% of the time even if they didn't watch the exams…)
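The chance-agreement arithmetic can be checked the same way, using each reviewer's marginal pass/fail counts (again just a sketch; variable names are mine):

```python
# Chance agreement from each reviewer's overall pass/fail rates
total = 36
r1_pass, r1_fail = 34, 2   # Reviewer 1 passed 34 residents (33 + 1), failed 2
r2_pass, r2_fail = 33, 3   # Reviewer 2 passed 33 residents, failed 3 (2 + 1)

chance_agreement = (r1_pass / total) * (r2_pass / total) + (r1_fail / total) * (r2_fail / total)
print(round(chance_agreement, 2))   # 0.87
```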


Kappa score (κ): the observed agreement between observers in excess of chance agreement, expressed as a proportion of the maximum possible agreement beyond chance. Kappa is a much more robust way to report agreement when there is a high likelihood of agreeing by chance (i.e., screening articles when exclusions >>> inclusions).
κ = (OA – AC) / (1 – AC)
κ = (0.97 – 0.87) / (1 – 0.87) ≈ 0.77 (≈ 0.79 if the unrounded fractions are carried through)
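Putting the pieces together, a short sketch that computes kappa directly from the counts and, assuming scikit-learn is installed, cross-checks the result with sklearn.metrics.cohen_kappa_score:

```python
from sklearn.metrics import cohen_kappa_score  # assumes scikit-learn is installed

# Hand calculation, mirroring the steps above
observed_agreement = 35 / 36                                     # ≈ 0.97
chance_agreement = (34 / 36) * (33 / 36) + (2 / 36) * (3 / 36)   # ≈ 0.87
kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(round(kappa, 2))   # 0.79 (≈ 0.77 if the inputs are first rounded to 0.97 / 0.87)

# Cross-check against scikit-learn's implementation of Cohen's kappa
reviewer_1 = ["PASS"] * 33 + ["FAIL"] * 2 + ["PASS"]
reviewer_2 = ["PASS"] * 33 + ["FAIL"] * 2 + ["FAIL"]
print(round(cohen_kappa_score(reviewer_1, reviewer_2), 2))       # 0.79
```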