2021 • Volume 3, Issue 1 • E-ISSN: 2665-9085

Abstract

This study uses Monte Carlo simulation techniques to estimate the minimum levels of intercoder reliability required in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The resulting procedure is analogous to power calculations for experimental designs. In the most common sample size/effect size settings, the simulation results support the rule of thumb that chance-adjusted agreement should be ≥ .800 or ≥ .667, yielding acceptable α and β error rates. Beyond such one-size-fits-all recommendations, however, the simulation allows precise power calculations tailored to the specifics of each study's context. Studies with small samples and/or small expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power, whereas in studies with large samples and/or large expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, larger samples may be used to compensate for small expected effect sizes and/or borderline coding reliability (e.g., when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate use of this framework, along with example code as an online appendix.
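The general logic can be illustrated with a minimal R sketch. This is a simplified, continuous-variable analogue of the idea, not the article's categorical-coding simulation: the coded variable carries measurement error corresponding to a given reliability, and the proportion of significant correlation tests across replications estimates statistical power. The function name and parameter values below are illustrative assumptions.

simulate_power <- function(n, true_r, reliability, n_sims = 1000, alpha = .05) {
  sig <- logical(n_sims)
  for (i in seq_len(n_sims)) {
    x <- rnorm(n)                                    # true (error-free) content variable
    y <- true_r * x + sqrt(1 - true_r^2) * rnorm(n)  # criterion correlated with x at true_r
    x_coded <- sqrt(reliability) * x +               # observed coding: true score plus error,
      sqrt(1 - reliability) * rnorm(n)               # so true-score variance equals the reliability
    sig[i] <- cor.test(x_coded, y)$p.value < alpha   # does the attenuated correlation reach significance?
  }
  mean(sig)                                          # empirical power across simulations
}

set.seed(1)
simulate_power(n = 200, true_r = .20, reliability = .80)

Increasing n or true_r, or improving the reliability, raises the estimated power; this is the trade-off that the framework's tables and R functions work out for realistic coding situations.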

