Computational Communication Research - Volume 3, Issue 1, 2021
Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: a large-scale p-hacking experiment
Abstract: We examined the validity of 37 sentiment scores derived from dictionary-based methods using a large news corpus and demonstrated the risk of generating a spectrum of results with different levels of statistical significance by presenting an analysis of relationships between news sentiment and U.S. presidential approval. We summarize our findings into four best practices: 1) use a suitable sentiment dictionary; 2) do not assume that the validity and reliability of the dictionary are ‘built-in’; 3) check for the influence of content length; and 4) do not use multiple dictionaries to test the same statistical hypothesis.
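As a minimal sketch of the kind of dictionary-based scoring the abstract evaluates, the snippet below illustrates best practice 3 (checking for content-length effects). The word lists, documents, and net-score formula are illustrative assumptions, not the paper's dictionaries or data.

```python
# Hedged sketch: dictionary sentiment scoring plus a length check.
import numpy as np

POSITIVE = {"good", "gain", "strong", "win"}    # hypothetical dictionary
NEGATIVE = {"bad", "loss", "weak", "crisis"}

def sentiment_score(text: str) -> float:
    """Net sentiment: (positive hits - negative hits) / token count."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

docs = [
    "strong gain for the economy",
    "crisis deepens amid weak growth and further loss",
    "a good win",
]
scores = np.array([sentiment_score(d) for d in docs])
lengths = np.array([len(d.split()) for d in docs])

# If scores correlate strongly with length, length is a confounder
# that should be controlled for before any substantive analysis.
print(np.corrcoef(scores, lengths)[0, 1])
```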
A Weakly Supervised and Deep Learning Method for an Additive Topic Analysis of Large Corpora
Authors: Yair Fogel-Dror, Shaul R. Shenhav & Tamir Sheafer
Abstract: The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods, which allow researchers to add more categories while developing or testing a theory. This additive approach enables the reuse of previous efforts of analysis or even the merging of separate research projects, thereby making these methods more accessible and increasing the discipline’s ability to create and share content analysis capabilities. This paper proposes a weakly supervised topic analysis method that uses both a low-cost unsupervised method to compile a training set and supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing the results of the method after adding 200 categories to an initial number of 450. We show that the suggested method provides a foundation for a low-cost solution for large-scale topic analysis.
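A minimal sketch of the weakly supervised workflow the abstract describes: cheap rule-based labels bootstrap a training set, and a supervised classifier is then trained on it. The seed keywords and texts are hypothetical, and a linear model stands in for the paper's deep learning classifier to keep the example short.

```python
# Hedged sketch: weak labeling followed by supervised training.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

SEEDS = {  # hypothetical seed keywords per topic category
    "economy": ["inflation", "jobs", "market"],
    "security": ["army", "border", "attack"],
}

def weak_label(text):
    """Assign a category if any seed keyword occurs, else None."""
    for topic, words in SEEDS.items():
        if any(w in text.lower() for w in words):
            return topic
    return None

corpus = [
    "inflation pressures the jobs market",
    "the army secured the border after the attack",
    "market rally lifts jobs outlook",
    "border patrol reports a foiled attack",
]
pairs = [(t, weak_label(t)) for t in corpus]
texts, labels = zip(*[(t, y) for t, y in pairs if y is not None])

# Train on the weakly labeled set; adding a new category later only
# requires new seeds, not relabeling the previously analyzed data.
X = TfidfVectorizer().fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
```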
Statistical Power in Content Analysis Designs: How Effect Size, Sample Size and Coding Accuracy Jointly Affect Hypothesis Testing – A Monte Carlo Simulation Approach
Author: Stefan Geiß
Abstract: This study uses Monte Carlo simulation techniques to estimate the minimum required levels of intercoder reliability in content analysis data for testing correlational hypotheses, depending on sample size, effect size, and coder behavior under uncertainty. The ensuing procedure is analogous to power calculations for experimental designs. In the most widespread sample size/effect size settings, the rule of thumb that chance-adjusted agreement should be ≥ .800 or ≥ .667 corresponds to the simulation results, yielding acceptable α and β error rates. However, the simulation allows precise power calculations that consider the specifics of each study’s context, moving beyond one-size-fits-all recommendations. Studies with low sample sizes and/or low expected effect sizes may need coder agreement above .800 to test a hypothesis with sufficient statistical power. In studies with high sample sizes and/or high expected effect sizes, coder agreement below .667 may suffice. Such calculations can help both in evaluating and in designing studies. Particularly in pre-registered research, higher sample sizes may be used to compensate for low expected effect sizes and/or borderline coding reliability (e.g. when constructs are hard to measure). I supply equations, easy-to-use tables, and R functions to facilitate the use of this framework, along with example code as an online appendix.
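A minimal Monte Carlo sketch of the study's core logic: how coding error erodes the power of a correlational hypothesis test. The sample size, effect size, accuracy level, and the random-misclassification model of coder behavior are simplified illustrative assumptions, not the paper's procedure.

```python
# Hedged sketch: estimating statistical power under coding error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, TRUE_R, ACCURACY, RUNS = 200, 0.2, 0.85, 2000

significant = 0
for _ in range(RUNS):
    # Latent variable and an outcome correlated with it at TRUE_R.
    x = rng.standard_normal(N)
    y = TRUE_R * x + np.sqrt(1 - TRUE_R**2) * rng.standard_normal(N)
    # Coded version of x: a coder dichotomizes it and sometimes errs.
    coded = (x > 0).astype(float)
    flip = rng.random(N) > ACCURACY
    coded[flip] = 1 - coded[flip]
    r, p = stats.pearsonr(coded, y)
    significant += p < 0.05

# Fraction of runs detecting the effect at alpha = .05, i.e. power.
print(f"estimated power: {significant / RUNS:.2f}")
```

Rerunning the loop over a grid of N, TRUE_R, and ACCURACY values reproduces the kind of study-specific power table the abstract advocates over one-size-fits-all reliability thresholds.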
Down to a r/science: Integrating Computational Approaches to the Study of Credibility on Reddit
Authors: Austin Hubner, Jessica McKnight, Matthew Sweitzer & Robert Bond
Abstract: Digital trace data enable researchers to study communication processes at a scale previously impossible. We combine social network analysis and automated content analysis to examine source and message factors’ impact on ratings of user-shared content. We found that the expertise of the author, the network position that the author occupies, and characteristics of the content the author creates have a significant impact on how others respond to that content. By observationally examining a large-scale online community, we provide a real-world test of how message consumers react to source and message characteristics. Our results show that it is important to think of online communication as occurring interactively between networks of individuals, and that the network positions people inhabit may inform their behavior.
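A minimal sketch of combining network position with message features, in the spirit of the study's design. The reply edges, comment lengths, and scores are invented toy data, and degree centrality plus ordinary least squares stand in for the authors' full analysis.

```python
# Hedged sketch: network position + content features -> ratings.
import networkx as nx
import numpy as np
import statsmodels.api as sm

# Hypothetical reply network: an edge (u, v) means "u replied to v".
replies = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("d", "c")]
G = nx.DiGraph(replies)
centrality = nx.in_degree_centrality(G)

# Toy per-author message feature (length) and community rating (score).
authors = ["a", "b", "c", "d"]
length = np.array([40, 120, 80, 15])
score = np.array([2, 30, 12, 1])

X = sm.add_constant(np.column_stack([
    [centrality.get(u, 0.0) for u in authors],  # source factor
    length,                                     # message factor
]))
model = sm.OLS(score, X).fit()
print(model.params)  # joint effect of position and content on ratings
```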
A tool for tracking the propagation of words on Reddit
Authors: Tom Willaert, Paul Van Eecke, Jeroen Van Soest & Katrien Beuls
Abstract: The data-driven study of cultural information diffusion in online (social) media is currently an active area of research. The availability of data from the web creates new opportunities to examine how words propagate through online media and communities, as well as how these diffusion patterns are intertwined with the materiality and culture of social media platforms. In support of such efforts, this paper introduces an online tool for tracking the consecutive occurrences of words across subreddits on Reddit between 2005 and 2017. By processing the full Pushshift.io Reddit comment archive for this period (Baumgartner et al., 2020), we are able to track the first occurrences of 76 million words, allowing us to visualize which subreddits subsequently adopt any of those words over time. We illustrate this approach by addressing the spread of terms referring to famous internet controversies, and the percolation of alt-right terminology. By making our instrument and the processed data publicly available, we aim to facilitate a range of exploratory analyses in computational social science, the digital humanities, and related fields.
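A minimal sketch of the tracking idea the abstract describes: scan comments in time order and record, for each word, when each subreddit first uses it. The toy comment stream below stands in for the Pushshift.io archive, and the single-pass bookkeeping is an assumed simplification of the tool's actual pipeline.

```python
# Hedged sketch: first occurrences of words, overall and per subreddit.
from collections import defaultdict

comments = [  # (timestamp, subreddit, text), sorted by timestamp
    (1, "r/linguistics", "doggo is spreading"),
    (2, "r/aww", "what a doggo"),
    (3, "r/aww", "doggo again"),
    (4, "r/news", "doggo reaches the mainstream"),
]

first_seen = {}               # word -> timestamp of first use anywhere
adoption = defaultdict(dict)  # word -> {subreddit: first use there}

for ts, sub, text in comments:
    for word in text.lower().split():
        first_seen.setdefault(word, ts)
        adoption[word].setdefault(sub, ts)

# Order in which subreddits adopted "doggo" after its first occurrence.
print(sorted(adoption["doggo"].items(), key=lambda kv: kv[1]))
```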