Computational Communication Research - Volume 4, Issue 2, 2022
OSD2F: An Open-Source Data Donation Framework
Abstract: The digital traces that people leave through their use of various online platforms provide tremendous opportunities for studying human behavior. However, the collection of these data is hampered by legal, ethical, and technical challenges. We present a framework and tool for collecting these data through a data donation platform where consenting participants can securely submit their digital traces. This approach leverages recent developments in data rights that have given people more control over their own data, such as legislation that now mandates companies to make digital trace data available on request in a machine-readable format. By transparently requesting access to specific parts of these data for clearly communicated academic purposes, the data ownership and privacy of participants are respected, and researchers are less dependent on commercial organizations that store these data in proprietary archives. In this paper we outline the general design principles, the current state of the tool, and future development goals.
A framework for privacy preserving digital trace data collection through data donation
Authors: Laura Boeschoten, Jef Ausloos, Judith E. Möller, Theo Araujo & Daniel L. Oberski

Abstract: A potentially powerful method of social-scientific data collection and investigation has been created by an unexpected institution: the law. Article 15 of the EU’s 2018 General Data Protection Regulation (GDPR) mandates that individuals have electronic access to a copy of their personal data, and all major digital platforms now comply with this law by providing users with “data download packages” (DDPs). Through voluntary donation of DDPs, all data collected by public and private entities during the course of citizens’ digital life can be obtained and analyzed to answer social-scientific questions – with consent. Thus, consented DDPs open the way for vast new research opportunities. However, while this entirely new method of data collection will undoubtedly gain popularity in the coming years, it also comes with its own questions of representativeness and measurement quality, which are often evaluated systematically by means of an error framework. Therefore, in this paper we provide a blueprint for digital trace data collection using DDPs, and devise a “total error framework” for such projects. Our error framework for digital trace data collection through data donation is intended to facilitate high-quality social-scientific investigations using DDPs while critically reflecting on its unique methodological challenges and sources of error. In addition, we provide a quality control checklist to guide researchers in leveraging the vast opportunities afforded by this new mode of investigation.
The Sentiment is in the Details
Abstract: Determining the sentiment in the individual sentences of a newspaper article in an automated fashion is a major challenge. Manually created sentiment dictionaries often fail to meet the required standards. And while computer-generated dictionaries show promise, they are often limited by the availability of suitable linguistic resources. I propose and test a novel, language-agnostic and resource-efficient way of constructing sentiment dictionaries, based on word embedding models. The dictionaries are constructed and evaluated based on four corpora containing two decades of Danish, Dutch (Flanders and the Netherlands), English, and Norwegian newspaper articles, which are cleaned and parsed using natural language processing. Concurrent validity is evaluated using a dataset of human-coded newspaper sentences, and compared to the performance of the Polyglot sentiment dictionaries. Predictive validity is tested through two long-standing hypotheses on the negativity bias in political news. Results show that both the concurrent validity and the predictive validity are good. The dictionaries outperform their Polyglot counterparts, and are able to correctly detect a negativity bias, which is stronger for tabloids. The method is resource-efficient in terms of manual labor when compared to manually constructed dictionaries, and requires a limited amount of computational power.
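The core idea of embedding-based dictionary construction can be sketched in a few lines: start from a small set of seed words per sentiment pole and add every vocabulary word whose embedding is sufficiently close to a seed. The toy vectors, words, and threshold below are illustrative assumptions, not the paper's actual data; in practice the vectors would come from an embedding model trained on the newspaper corpora.

```python
import math

# Toy word vectors standing in for a trained embedding model.
# All words and values here are illustrative assumptions.
EMBEDDINGS = {
    "good":      [0.9, 0.1, 0.0],
    "excellent": [0.85, 0.15, 0.05],
    "bad":       [-0.8, 0.2, 0.1],
    "terrible":  [-0.9, 0.1, 0.05],
    "table":     [0.0, 0.0, 1.0],   # neutral word, should stay out
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_lexicon(seeds, vocab, threshold=0.9):
    """Add every vocabulary word whose embedding is close to any seed word."""
    expanded = set(seeds)
    for word in vocab:
        if word in expanded:
            continue
        if any(cosine(vocab[word], vocab[s]) >= threshold
               for s in seeds if s in vocab):
            expanded.add(word)
    return expanded

positive = expand_lexicon({"good"}, EMBEDDINGS)   # picks up "excellent"
negative = expand_lexicon({"bad"}, EMBEDDINGS)    # picks up "terrible"
```

The threshold trades recall against precision: a lower value grows the dictionary faster but admits more noise, which is why the resulting lexica are then validated against human-coded sentences.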
Fifteen Seconds of Fame: TikTok and the Supply Side of Social Video
Authors: Benjamin Guinaudeau, Kevin Munger & Fabio Votta

Abstract: TikTok has rapidly developed from a punchline for jokes about “kids these days” into a formidable force in American politics. The speed of this development is unprecedented, even in the rapidly changing world of digital politics. Through a combination of hashtag and snowball sampling, we identify 11,546 TikTok accounts that primarily post about politics, allowing us to analyze trends in the posting, viewing and commenting behavior on 1,998,642 tiktoks they have uploaded. We test a number of theories about how the unique combination of affordances on TikTok shapes how it is used for political communication. Compared to the dominant platform for political videos (YouTube), we find that a higher percentage of TikTok users upload videos, TikTok view counts are more dominated by virality, and viewership of videos is less dependent on a given account’s number of followers/subscribers. We discuss how these findings affect the production of content that ultimately determines the experience of TikTok consumers.
Talking politics: Building and validating data-driven lexica to measure political discussion quality
Authors: Kokil Jaidka

Abstract: Social media data offers computational social scientists the opportunity to understand how ordinary citizens engage in political activities, such as expressing their ideological stances and engaging in policy discussions. This study curates and develops discussion quality lexica from the Corpus for the Linguistic Analysis of Political Talk ONline (CLAPTON). Supervised machine learning classifiers to characterize political talk are evaluated for out-of-sample label prediction and generalizability to new contexts. The approach yields data-driven lexica, or dictionaries, that can be applied to measure the constructiveness, justification, relevance, reciprocity, empathy, and incivility of political discussions. In addition, the findings illustrate how the choices made in training such classifiers, such as the heterogeneity of the data, the feature sets used to train classifiers, and the classification approach, affect their generalizability. The article concludes by summarizing the strengths and weaknesses of applying machine learning methods to social media posts and theoretical insights into the quality and structure of online political discussions.
Beyond Counting Words
Authors: Anne C. Kroon, Toni van der Meer & Rens Vliegenthart

Abstract: Topics and frames are at the heart of various theories in communication science and other social sciences, making their measurement of key interest to many scholars. The current study compares and contrasts two main deductive computational approaches to measuring policy topics and frames: dictionary (lexicon) based identification, and supervised machine learning. Additionally, we introduce domain-specific word embeddings to these classification tasks. Drawing on a manually coded dataset of Dutch news articles and parliamentary questions, our results indicate that supervised machine learning outperforms dictionary-based classification for both tasks. Furthermore, results show that word embeddings may boost performance at relatively low cost by introducing relevant and domain-specific semantic information to the classification model.
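The two deductive approaches being contrasted can be illustrated side by side. The sketch below is a minimal, hypothetical example, not the study's actual pipeline: the dictionary approach counts curated keywords, while the "supervised" approach is reduced here to a nearest-centroid bag-of-words classifier trained on labeled documents. All topic names, keywords, and training texts are invented for illustration.

```python
from collections import Counter

# Hypothetical keyword dictionaries for two policy topics; in the study these
# would be expert-curated lexica for Dutch news and parliamentary questions.
DICTIONARIES = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "health":  {"hospital", "vaccine", "care", "doctor"},
}

def dictionary_classify(text):
    """Dictionary approach: pick the topic whose keywords occur most often."""
    tokens = text.lower().split()
    scores = {topic: sum(t in words for t in tokens)
              for topic, words in DICTIONARIES.items()}
    return max(scores, key=scores.get)

def train_centroids(labeled_docs):
    """Supervised approach (minimal sketch): average bag-of-words per topic."""
    centroids = {}
    for topic, docs in labeled_docs.items():
        counts = Counter()
        for doc in docs:
            counts.update(doc.lower().split())
        total = sum(counts.values())
        centroids[topic] = {w: c / total for w, c in counts.items()}
    return centroids

def supervised_classify(text, centroids):
    """Score a document against each topic centroid and take the best."""
    tokens = text.lower().split()
    scores = {topic: sum(weights.get(t, 0.0) for t in tokens)
              for topic, weights in centroids.items()}
    return max(scores, key=scores.get)

train = {
    "economy": ["the budget raises tax on jobs", "inflation hits the budget"],
    "health":  ["the hospital needs more doctor capacity", "vaccine care plan"],
}
centroids = train_centroids(train)
```

The supervised variant can match words the dictionary author never listed, because its weights are learned from labeled data; that is the advantage the study quantifies, and where domain-specific embeddings can add further semantic generalization.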
The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research
Authors: Stijn Peeters & Sal Hagen

Abstract: This paper introduces the 4CAT Capture and Analysis Toolkit (4CAT), an open-source Web-based research tool. 4CAT can capture data from a variety of online sources (including Twitter, Telegram, Reddit, 4chan, 8kun, BitChute, Douban and Parler) and analyze them through analytical processors. 4CAT seeks to make robust data capture and analysis available to researchers not familiar with computer programming, without ‘black-boxing’ the implemented research methods. Before outlining the practical use of 4CAT, we discuss three ‘affordances’ that inform its design: modularity, transparency, and traceability. 4CAT is modular because new data sources and analytical processors can be easily added and changed; transparent because it aims to render legible its inner workings; and traceable because of automatic and shareable documentation of intermediate analysis steps. We then show how 4CAT operationalizes these features through a description of its general setup and a short walkthrough. Finally, we discuss how 4CAT strives for an ‘ethics by design’ development philosophy that enables ethically sound data-driven research. 4CAT is then positioned as both an answer to and a further call for ‘tool criticism’ in computational social research.
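The modularity and traceability affordances described above follow a common plug-in pattern: analysis steps register themselves under a name and each run keeps a log of the steps applied. The sketch below is a generic illustration of that pattern under assumed names, not 4CAT's actual processor API.

```python
# Generic sketch of a modular "processor" registry with a shareable trace.
# Names (PROCESSORS, run, count-posts, top-authors) are illustrative only.
PROCESSORS = {}

def processor(name):
    """Decorator: register an analysis step under a name, plug-in style."""
    def register(func):
        PROCESSORS[name] = func
        return func
    return register

@processor("count-posts")
def count_posts(dataset, log):
    log.append("count-posts")          # record the step for traceability
    return {"posts": len(dataset)}

@processor("top-authors")
def top_authors(dataset, log):
    log.append("top-authors")
    counts = {}
    for post in dataset:
        counts[post["author"]] = counts.get(post["author"], 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

def run(dataset, steps):
    """Run a sequence of registered processors, keeping a trace of the steps."""
    log = []
    results = {name: PROCESSORS[name](dataset, log) for name in steps}
    return results, log

posts = [{"author": "a"}, {"author": "b"}, {"author": "a"}]
results, trace = run(posts, ["count-posts", "top-authors"])
```

Because processors only share the registry interface, new analysis steps can be added without touching existing ones, and the trace documents exactly which intermediate steps produced a result.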
Do I sound American?
Authors: Jiyoun Suk, Josephine Lukito, Min-Hsin Su, Sang Jung Kim, Chau Tong, Zhongkai Sun & Prathusha Sarma

Abstract: Ongoing research into how states coordinate foreign disinformation campaigns has raised concerns over social media’s influence on democracies. One example is the spread of Russian disinformation in the 2016 US presidential election. Russia’s Internet Research Agency (IRA) Twitter accounts have been known to deliver strategically crafted messages with political goals. We use publicly available IRA Twitter data created during and after the 2016 US election campaign (2016 and 2017) to examine the nature of strategic message features of foreign-sponsored online disinformation and their social media sharing. We use computational approaches to identify unique syntactic features of online disinformation tweets from the IRA compared to American Twitter corpora, reflecting their functional and situational differences. More importantly, we examine what message features in IRA tweets across syntax, topic, and sentiment were associated with more sharing (retweets). Implications are discussed.