Fleiss’ Kappa

- Named after Joseph L. Fleiss. It measures the agreement among a fixed number of raters who assign categorical ratings (binary or nominal; ordinal categories can be used, but their ordering is ignored).

Interpretation:

Kappa measures the degree of absolute agreement among raters on categorical variables, corrected for the agreement expected by chance.
Landis and Koch (1977) suggest the following interpretation of Kappa:
below 0.00: Poor
between 0.00 and 0.20: Slight
between 0.21 and 0.40: Fair
between 0.41 and 0.60: Moderate
between 0.61 and 0.80: Substantial
between 0.81 and 1.00: Almost perfect
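As a convenience, the bands can be applied programmatically. The following is a minimal sketch, not part of the `irr` package: the helper name `landis_koch_label` is hypothetical, the labels are those of Landis and Koch (1977), and values that fall in the gaps between bands (for example 0.205) are assumed to belong to the lower band.

```r
# Map kappa values to the Landis and Koch (1977) labels.
# Assumption: each band starts at its lower bound, so a value in a gap
# such as 0.205 is assigned to the lower band.
landis_koch_label <- function(kappa) {
  lower_bounds <- c(0, 0.21, 0.41, 0.61, 0.81)
  labels <- c("Poor", "Slight", "Fair", "Moderate",
              "Substantial", "Almost perfect")
  # findInterval() returns 0 for values below 0, hence the + 1
  labels[findInterval(kappa, lower_bounds) + 1]
}

landis_koch_label(c(-0.10, 0.43, 0.90))
## [1] "Poor"           "Moderate"       "Almost perfect"
```

These cut-offs are conventions rather than statistical thresholds, so the label is best reported alongside the kappa value itself.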

Assumptions:

The outcomes are categorical

Each object is rated by the same number of raters (in Fleiss' original formulation the raters need not be the same individuals for every object).

There can be more than two raters.

The categories used for each object are the same.

Formula:

\(\kappa = \frac{P_o - P_e}{1-P_e}\)

\(P_o\) = the observed proportion of agreement;

\(P_e\) = the proportion of agreement expected by chance.
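For Fleiss' kappa, the two quantities \(P_o\) and \(P_e\) in the formula above can be written out explicitly. The notation here is an assumption introduced for illustration: \(N\) objects, \(n\) raters per object, \(k\) categories, and \(n_{ij}\) the number of raters who assigned object \(i\) to category \(j\).

\[
P_i = \frac{1}{n(n-1)}\left(\sum_{j=1}^{k} n_{ij}^2 - n\right), \qquad
P_o = \frac{1}{N}\sum_{i=1}^{N} P_i
\]

\[
p_j = \frac{1}{Nn}\sum_{i=1}^{N} n_{ij}, \qquad
P_e = \sum_{j=1}^{k} p_j^2
\]

Here \(P_i\) is the proportion of agreeing rater pairs for object \(i\), and \(p_j\) is the overall share of ratings that fall in category \(j\).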

Example in R

library("irr")
## Loading required package: lpSolve
data("diagnoses")
diagnoses
##                     rater1                  rater2                  rater3
## 1              4. Neurosis             4. Neurosis             4. Neurosis
## 2  2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 3  2. Personality Disorder        3. Schizophrenia        3. Schizophrenia
## 4                 5. Other                5. Other                5. Other
## 5  2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 6            1. Depression           1. Depression        3. Schizophrenia
## 7         3. Schizophrenia        3. Schizophrenia        3. Schizophrenia
## 8            1. Depression           1. Depression        3. Schizophrenia
## 9            1. Depression           1. Depression             4. Neurosis
## 10                5. Other                5. Other                5. Other
## 11           1. Depression             4. Neurosis             4. Neurosis
## 12           1. Depression 2. Personality Disorder             4. Neurosis
## 13 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 14           1. Depression             4. Neurosis             4. Neurosis
## 15 2. Personality Disorder 2. Personality Disorder             4. Neurosis
## 16        3. Schizophrenia        3. Schizophrenia        3. Schizophrenia
## 17           1. Depression           1. Depression           1. Depression
## 18           1. Depression           1. Depression           1. Depression
## 19 2. Personality Disorder 2. Personality Disorder             4. Neurosis
## 20           1. Depression        3. Schizophrenia        3. Schizophrenia
## 21                5. Other                5. Other                5. Other
## 22 2. Personality Disorder             4. Neurosis             4. Neurosis
## 23 2. Personality Disorder 2. Personality Disorder             4. Neurosis
## 24           1. Depression           1. Depression             4. Neurosis
## 25           1. Depression             4. Neurosis             4. Neurosis
## 26 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 27           1. Depression           1. Depression           1. Depression
## 28 2. Personality Disorder 2. Personality Disorder             4. Neurosis
## 29           1. Depression        3. Schizophrenia        3. Schizophrenia
## 30                5. Other                5. Other                5. Other
##                     rater4                  rater5                  rater6
## 1              4. Neurosis             4. Neurosis             4. Neurosis
## 2                 5. Other                5. Other                5. Other
## 3         3. Schizophrenia        3. Schizophrenia                5. Other
## 4                 5. Other                5. Other                5. Other
## 5              4. Neurosis             4. Neurosis             4. Neurosis
## 6         3. Schizophrenia        3. Schizophrenia        3. Schizophrenia
## 7         3. Schizophrenia                5. Other                5. Other
## 8         3. Schizophrenia        3. Schizophrenia             4. Neurosis
## 9              4. Neurosis             4. Neurosis             4. Neurosis
## 10                5. Other                5. Other                5. Other
## 11             4. Neurosis             4. Neurosis             4. Neurosis
## 12             4. Neurosis             4. Neurosis             4. Neurosis
## 13        3. Schizophrenia        3. Schizophrenia        3. Schizophrenia
## 14             4. Neurosis             4. Neurosis             4. Neurosis
## 15             4. Neurosis             4. Neurosis                5. Other
## 16        3. Schizophrenia        3. Schizophrenia                5. Other
## 17             4. Neurosis                5. Other                5. Other
## 18           1. Depression           1. Depression 2. Personality Disorder
## 19             4. Neurosis             4. Neurosis             4. Neurosis
## 20                5. Other                5. Other                5. Other
## 21                5. Other                5. Other                5. Other
## 22             4. Neurosis             4. Neurosis             4. Neurosis
## 23                5. Other                5. Other                5. Other
## 24             4. Neurosis             4. Neurosis             4. Neurosis
## 25             4. Neurosis             4. Neurosis                5. Other
## 26 2. Personality Disorder 2. Personality Disorder             4. Neurosis
## 27           1. Depression                5. Other                5. Other
## 28             4. Neurosis             4. Neurosis             4. Neurosis
## 29        3. Schizophrenia        3. Schizophrenia        3. Schizophrenia
## 30                5. Other                5. Other                5. Other

Calculate Fleiss’ Kappa

library(irr)
kappam.fleiss(diagnoses)
##  Fleiss' Kappa for m Raters
## 
##  Subjects = 30 
##    Raters = 6 
##     Kappa = 0.43 
## 
##         z = 17.7 
##   p-value = 0
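To see what the function computes, the kappa statistic can be reproduced in base R from the definition \((P_o - P_e)/(1 - P_e)\). This is a minimal sketch under stated assumptions, not the `irr` implementation: the function name `fleiss_kappa_manual` and the toy ratings are hypothetical, every object is assumed to be rated by the same number of raters, and missing values are not handled.

```r
# Fleiss' kappa from a subjects-by-raters table of category labels.
# Assumptions: no missing ratings, and at least two categories are used.
fleiss_kappa_manual <- function(ratings) {
  ratings <- as.matrix(ratings)          # factors become character labels
  N <- nrow(ratings)                     # number of objects
  n <- ncol(ratings)                     # number of raters per object
  categories <- sort(unique(as.vector(ratings)))

  # counts[i, j] = number of raters who put object i in category j
  counts <- matrix(0, nrow = N, ncol = length(categories))
  for (j in seq_along(categories)) {
    counts[, j] <- rowSums(ratings == categories[j])
  }

  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))  # per-object agreement
  P_o <- mean(P_i)                                # observed agreement
  p_j <- colSums(counts) / (N * n)                # category proportions
  P_e <- sum(p_j^2)                               # chance agreement

  (P_o - P_e) / (1 - P_e)
}

# Toy data: 4 objects, 3 raters, 2 categories
toy <- data.frame(
  rater1 = c("a", "a", "b", "a"),
  rater2 = c("a", "a", "b", "b"),
  rater3 = c("a", "b", "b", "b")
)
fleiss_kappa_manual(toy)   # P_o = 2/3, P_e = 1/2, so kappa = 1/3
```

With the `irr` package loaded, `fleiss_kappa_manual(diagnoses)` should land close to the 0.43 reported by `kappam.fleiss(diagnoses)`, which is a quick way to check the sketch against the package.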

References:

Landis, J. R., and G. G. Koch. 1977. “The Measurement of Observer Agreement for Categorical Data.” Biometrics 33: 159–174.
Fleiss, J. L. 1971. “Measuring Nominal Scale Agreement Among Many Raters.” Psychological Bulletin 76 (5): 378–82.
Fleiss, Joseph L., Myunghee Cho Paik, and Bruce Levin. 2003. Statistical Methods for Rates and Proportions. 3rd ed. John Wiley & Sons, Inc.