Computational Communication Research - Volume 5, Issue 2, 2023
C-MFD 2.0: Developing a Chinese Moral Foundation Dictionary
Authors: Calvin Yixiang Cheng & Weiyu Zhang

Our paper describes the development of a Chinese Moral Foundation Dictionary (C-MFD 2.0) to assist the automated detection and analysis of moral intuitions in the Chinese context. Taking a method-theory co-development approach, we follow a human-led, machine-assisted procedure that emphasizes opportunities for theoretical advancement and thereby enhances the validity of the computational tool. Starting from the existing Chinese translation of the established English MFDs, we enriched the dictionary with a list of morally relevant words unique to Chinese, drawn from an extensive Chinese dictionary. Four experts went through four rounds of coding, followed by validation by 202 crowd coders. The C-MFD 2.0 not only identifies the classic five moral foundations but also proposes several potential moral foundation candidates in the Chinese context, and it demonstrates acceptable validity in detecting moral foundation values in Chinese texts. Our paper illustrates how a non-English moral foundation dictionary can be built for computational analysis and furthers moral foundation theory in a cross-language context.
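The kind of dictionary-based detection a moral foundation dictionary supports can be sketched in a few lines. The entries and foundation labels below are illustrative stand-ins, not actual C-MFD 2.0 content:

```python
# Minimal sketch of dictionary-based moral foundation detection.
# The entries below are invented examples, not real C-MFD 2.0 entries.
from collections import Counter

CMFD_SAMPLE = {
    "公平": "fairness",   # fairness/cheating
    "忠诚": "loyalty",    # loyalty/betrayal
    "背叛": "loyalty",
    "伤害": "care",       # care/harm
}

def score_foundations(text: str) -> Counter:
    """Count foundation hits via substring matching (Chinese text is unsegmented)."""
    counts = Counter()
    for word, foundation in CMFD_SAMPLE.items():
        counts[foundation] += text.count(word)
    return counts

scores = score_foundations("他背叛了朋友，这种行为会伤害彼此的忠诚")
```

A real application would first run word segmentation to avoid spurious substring matches; the sketch only conveys the counting logic.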
Validating a Mixed-Method Approach for Multilingual News Framing Analysis: A case study of COVID-19
Authors: Gulsah Akcakir, Yanru Jiang, Jun Luo & Seonhye Noh

A reliable multilingual news framing analysis can shed light on the similarities and dissimilarities of journalistic practices across geographical areas and cultures, facilitating comparative studies in media discourse and framing analysis. However, there is a shortage of standardized and validated computational approaches for multilingual text analysis. To fill this gap, this study validates a multi-stage mixed-method approach in a multilingual context: the Analysis of Topic Model Networks (ANTMN), which identifies media frames by integrating Latent Dirichlet Allocation (LDA) topic modeling and network analysis. Our multilingual ANTMN analysis standardizes the pipeline for data collection, cleaning, preprocessing, and validity assessment. By applying this approach to COVID-19 news from seven countries and regions (Germany, Hong Kong, South Korea, Taiwan, Turkey, Uganda, and the United States), this study demonstrates the reliability and validity of multilingual ANTMN analysis both quantitatively and qualitatively. We observe that ANTMN framing analysis is robust to variation in data sources and can produce generalizable results across multilingual corpora.
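The ANTMN logic (topics from LDA, a topic network, communities read as frames) can be sketched with a toy document-topic matrix. The numbers below are invented; in a real pipeline they would come from a fitted LDA model's document-topic output:

```python
# Toy ANTMN-style sketch: link topics whose document loadings correlate
# positively, then read network communities as frames.
import numpy as np
from networkx import Graph
from networkx.algorithms.community import greedy_modularity_communities

# Invented document-topic matrix (rows: documents, columns: topics).
theta = np.array([
    [0.80, 0.10, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.10],
    [0.05, 0.05, 0.60, 0.30],
])

corr = np.corrcoef(theta.T)          # topic-topic correlation across documents
G = Graph()
G.add_nodes_from(range(theta.shape[1]))
for i in range(theta.shape[1]):
    for j in range(i + 1, theta.shape[1]):
        if corr[i, j] > 0:           # keep only positively correlated pairs
            G.add_edge(i, j, weight=corr[i, j])

# Each community of topics is interpreted as one media frame.
frames = {frozenset(c) for c in greedy_modularity_communities(G, weight="weight")}
```

Here topics 0 and 1 co-occur in the first two documents and topics 2 and 3 in the last two, so community detection recovers two frames, {0, 1} and {2, 3}.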
Going cross-lingual: A guide to multilingual text analysis
Authors: Hauke Licht & Fabienne Lind

Text-as-data methods have revolutionized the study of political behavior and communication, and the increasing availability of multilingual text collections promises exciting new applications of these methods in comparative research. To encourage researchers to seize these opportunities, we provide a guide to multilingual quantitative text analysis. Responding to the unique challenges researchers face in multilingual analysis, we provide a systematic overview of multilingual text analysis methods developed for political and communication science research. To structure this overview, we distinguish between separate-analysis, input-alignment, and anchoring approaches to cross-lingual text analysis. We then compare these approaches' resource intensiveness and discuss the strategies they offer for approaching measurement equivalence. We argue that to ensure valid measurement across languages and contexts, researchers should reflect on these aspects when choosing between approaches. We conclude with an outlook on future directions for method development and potential fields of application. Overall, our contribution helps political and communication scientists navigate the field of multilingual text analysis and gives impulses for its wider adoption and further development.
Evaluating Transferability in Multilingual Text Analyses
Authors: Justin Chun-ting Ho & Chung-hong Chan

Multilingual text analysis is increasingly important for broadening the current narrow focus on English and other Indo-European languages in comparative studies. However, there has been no comprehensive approach to evaluating the validity of multilingual text analytic methods across different language contexts. To address this issue, we propose that the validity of multilingual text analysis be studied through the lens of transferability, which assesses the extent to which the performance of a multilingual text analytic method can be maintained when switching from one language context to another. We first formally conceptualize transferability in multilingual text analysis as a measure of whether the method is equivalent across language groups (linguistic transferability) and societal contexts (contextual transferability). We then propose a model-agnostic approach to evaluating transferability using (1) natural and synthetic data pairs, (2) manual annotation of errors, and (3) the Local Interpretable Model-Agnostic Explanations (LIME) technique. As an application of our approach, we analyze the transferability of a multilingual BERT (mBERT) model fine-tuned on annotated manifestos and media texts from five Indo-European-language-speaking countries of the Comparative Agendas Project. Transferability is then evaluated using natural and synthetic parliamentary data from the UK, the Basque Country, Hong Kong, and Taiwan. Through this evaluation, the study sheds light on the common causes of prediction errors in multilingual text classification using mBERT.
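The core measurement can be illustrated schematically: hold a classifier fixed and compare its accuracy on a source-context test set against a matched target-context set. The classifier and data pairs below are hypothetical toys, not the paper's mBERT setup:

```python
# Schematic illustration of transferability: the share of source-context
# accuracy a fixed classifier retains in a new language/societal context.
def accuracy(model, data):
    return sum(model(text) == label for text, label in data) / len(data)

def transferability(model, source_data, target_data):
    """Ratio of target-context accuracy to source-context accuracy."""
    src = accuracy(model, source_data)
    return accuracy(model, target_data) / src if src else 0.0

# Toy keyword 'model' built (conceptually) on English economic news.
model = lambda text: "economy" if ("tax" in text or "budget" in text) else "other"

source = [("the budget bill passed", "economy"),
          ("rain expected today", "other")]
target = [("podatek wzrośnie", "economy"),       # Polish: misses the keyword cue
          ("tax relief announced", "economy"),
          ("football match tonight", "other"),
          ("nowy plan budget", "economy")]

ratio = transferability(model, source, target)   # accuracy retained: 0.75
```

The error-annotation and LIME steps the paper proposes would then diagnose which target-context items drive the lost accuracy.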
URLs Can Facilitate Machine Learning Classification of News Stories Across Languages and Contexts
Authors: Ernesto de León, Susan Vermeer & Damian Trilling

Comparative scholars studying political news content at scale face the challenge of working with multiple languages. While many train individual supervised machine learning classifiers for each language, this is a costly and time-consuming process. We propose that instead of relying on thematic labels generated by manual coding, researchers can use 'distant' labels created from cues in article URLs. Sections reflected in URLs (e.g., nytimes.com/politics/) can thus help create training material for supervised machine learning classifiers. Using cues provided by news media organizations, such an approach allows for efficient political news identification at scale while facilitating implementation across languages. Using a dataset of approximately 870,000 URLs of news-related content from four countries (Italy, Germany, the Netherlands, and Poland), we test this method against 'classical' supervised machine learning and a multilingual BERT model, across four news topics. Our results suggest that using URL section cues to distantly annotate texts provides a cheap and easy-to-implement way of classifying large volumes of news texts, saving researchers valuable resources without sacrificing quality.
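The distant-labeling step can be sketched in a few lines. The section-to-label map below is illustrative; a real application would compile one per news outlet:

```python
# Sketch of distant labeling from URL section cues: the first path segment
# of an article URL supplies a (noisy) training label for a classifier.
from urllib.parse import urlparse

SECTION_LABELS = {
    "politics": "political",        # e.g., nytimes.com/politics/...
    "sport": "non-political",
    "economie": "political",        # section names can be language-specific
}

def distant_label(url: str):
    """Label an article by its first URL path segment; None if unmapped."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return SECTION_LABELS.get(segments[0]) if segments else None

labels = [distant_label(u) for u in [
    "https://nytimes.com/politics/senate-vote",
    "https://example.de/sport/bundesliga",
    "https://example.nl/economie/begroting",
    "https://example.it/",          # no section cue: excluded from training
]]
```

Articles with mapped sections then serve as training material for any supervised classifier, which can afterwards label articles whose URLs carry no cue.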
Selecting Relevant Documents for Multilingual Content Analysis: An Evaluation of Keyword and Semantic Similarity Search Approaches
Authors: Sean Palicki, Stefanie Walter, Wouter van Atteveldt, Alice Beazer & Isaac Bravo

Comparative research in communication often involves selecting and analyzing documents in multiple languages. Machine translation is an effective preprocessing step for automated content analysis; however, its impact on data collection remains under-examined. Using a parallel-language corpus of European Parliament debates, this paper evaluates machine translation as an approach for multilingual document retrieval, i.e., selecting documents for analysis. We compare several strategies for retrieving relevant multilingual documents, including (1) expert-validated search queries, (2) machine-translated search queries, and (3) multilingual semantic similarity search, benchmarking them against monolingual searches and describing how these strategies can affect results from topic modeling. Results show that expert-validated search queries achieve reliable results across languages, while the accuracy of machine-translated search queries varies significantly between languages and affects further analyses. Semantic similarity search retrieved a similar subset of relevant documents across languages, but its results were less accurate than those of keyword approaches. In sum, validated translations of search queries can be effective for multilingual document retrieval, but errors can introduce systematic bias into downstream analyses. These findings are important for researchers seeking to introduce, validate, and generalize findings and theories beyond English-speaking countries.
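The two retrieval strategies can be contrasted on a toy corpus. TF-IDF cosine similarity stands in here for the multilingual sentence-embedding similarity the paper evaluates:

```python
# Toy contrast of keyword retrieval versus similarity-based retrieval.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "parliament debated the climate emergency",
    "new climate policy targets emissions",
    "football season opens this weekend",
]

# 1) Keyword search: exact term matching, sensitive to query wording.
keyword_hits = [i for i, d in enumerate(docs) if "climate" in d]

# 2) Similarity search: rank all documents against the query vector.
vec = TfidfVectorizer().fit(docs)
sims = cosine_similarity(vec.transform(["climate emergency debate"]),
                         vec.transform(docs))[0]
ranking = sorted(range(len(docs)), key=lambda i: -sims[i])
```

Keyword search returns a fixed hit set, while similarity search produces a ranking that still needs a validated cutoff; both choices shape which documents reach the downstream topic model.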
Machine Translation as an Underrated Ingredient? Solving Classification Tasks with Large Language Models for Comparative Research
Authors: Akos Mate, Miklós Sebők, Lukasz Wordliczek, Dariusz Stolicki & Ádám Feldmann

While large language models have revolutionised computational text analysis methods, the field is still tilted towards English-language resources. Even though pre-trained models exist for some smaller languages, coverage is far from universal, and pre-training large language models is an expensive and complicated task. This uneven language coverage limits comparative social research in terms of its geographical and linguistic scope. We propose a solution that sidesteps these issues by leveraging transfer learning and open-source machine translation. We use English as a bridge language between Hungarian and Polish bills and laws to solve a classification task based on the Comparative Agendas Project (CAP) coding scheme. Using the Hungarian corpus as training data for model fine-tuning, we categorise the Polish laws into 20 CAP categories. In doing so, we compare the performance of Transformer-based deep learning models (monolingual models such as BERT and multilingual models such as XLM-RoBERTa) and machine learning algorithms (e.g., SVM). Results show that the fine-tuned large language models outperform the traditional supervised learning benchmarks but are themselves surpassed by the machine translation approach. Overall, the proposed solution demonstrates a viable option for applying a transfer learning framework to low-resource languages, achieving state-of-the-art results without requiring expensive pre-training.
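The bridge-language setup can be sketched as follows. The toy strings stand in for machine-translated Hungarian and Polish law titles, and a Naive Bayes classifier is a lightweight stand-in for the paper's SVM and Transformer models:

```python
# Schematic bridge-language pipeline: both corpora are machine-translated
# into English first, then a model trained on the Hungarian side is applied
# to the Polish side. Texts and labels below are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# English MT of Hungarian bills, with CAP-style major topic labels.
train_texts = [
    "act on hospital financing and patient care",
    "amendment to the income tax act",
    "act on primary school curriculum",
    "budget appropriation for health services",
    "law on school teacher salaries",
    "corporate tax reduction law",
]
train_labels = ["health", "macroeconomics", "education",
                "health", "education", "macroeconomics"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

# English MT of Polish laws: the transfer target.
pred = clf.predict(["act on hospital staff and patient safety",
                    "law amending the tax code"])
```

Because both sides pass through the same bridge language, the classifier never needs Hungarian- or Polish-specific resources, which is the point of the approach.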
Lowering the Language Barrier: Investigating Deep Transfer Learning and Machine Translation for Multilingual Analyses of Political Texts
Authors: Moritz Laurer, Wouter van Atteveldt, Andreu Casas & Kasper Welbers

The social science toolkit for computational text analysis is still very much in the making. We know surprisingly little about how to produce valid insights from large amounts of multilingual texts for comparative social science research. In this paper, we test several recent innovations from deep transfer learning to help advance the computational toolkit for social science research in multilingual settings. We investigate the extent to which the 'prior language and task knowledge' stored in the parameters of modern language models is useful for enabling multilingual research; whether these algorithms can be fruitfully combined with machine translation; and whether these methods are not only accurate but also practical and valid in multilingual settings – three essential conditions for lowering the language barrier in practice. We use two datasets with texts in 12 languages from 27 countries for our investigation. Our analysis shows that, based on these innovations, supervised machine learning can produce substantively meaningful outputs: our BERT-NLI model, trained on only 674 or 1674 texts in only one or two languages, can validly predict political party families' stances towards immigration in eight other languages and ten other countries.
Word-level machine translation for bag-of-words text analysis: Cheap, fast, and surprisingly good
The quality of automated machine translation is rapidly approaching that of professional human translation. However, the best methods remain costly in terms of money, computational resources, and/or time, particularly when applied to large volumes of text. In contrast, word-level translation is both free and fast, simply mapping each word in a source language deterministically to a target language. This paper demonstrates that high-quality word-level translation dictionaries can be generated cheaply and easily, and that they produce translations that can serve reliably as inputs to some of the most common automated text analysis methods. It advances the field on two fronts: it assesses different techniques for creating word-level translation dictionaries, and it systematically compares the similarity of word-level translations to those produced by either state-of-the-art neural machine translation or professional human translation. Comparisons are performed for three common text analysis tasks — sentiment analysis, dictionary-based content analysis, and topic modeling — across a total of eleven different source languages and two target languages (English and French). Across all languages and tasks, word-level dictionaries perform sufficiently well to make them an attractive alternative when resource constraints make neural machine translation inaccessible. The translation dictionaries, as well as the code used to generate and validate them, are available on GitHub.
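Word-level translation itself reduces to a deterministic token lookup. The mini French-English dictionary below is illustrative, not one of the paper's dictionaries:

```python
# Word-level translation as a deterministic token lookup; unknown tokens
# pass through unchanged. The mini dictionary is an invented example.
WORD_DICT = {"le": "the", "les": "the", "gouvernement": "government",
             "augmente": "raises", "impôts": "taxes"}

def word_level_translate(text: str) -> str:
    return " ".join(WORD_DICT.get(tok, tok) for tok in text.lower().split())

translated = word_level_translate("Le gouvernement augmente les impôts")
```

The output ignores syntax and inflection by design, but for bag-of-words methods such as dictionaries and topic models only token identity matters, which is why the approach can work.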
Back to the basics: Applying multilingual dictionary analysis to the Comparative Manifesto Project corpus
By Joshua Cova

For researchers interested in political communication and electoral politics, the Comparative Manifesto Project (CMP) is a widely used database containing an extensive repository of annotated electoral manifestos. However, because country-specific expert coders assign text excerpts to different policy areas, researchers frequently fail to engage with the multilingual nature of the corpus. This article uses a deductively defined multilingual dictionary analysis to extract information on a set of electorally salient policies in seven European countries. In this application, I show not only that inter-coder reliability tests are encouragingly high and can be conducted relatively rapidly, but also that multilingual policy-specific dictionary analysis achieves high precision and high recall. This approach can be helpful not only in examining electoral issue salience but also in uncovering the language and framing that political parties employ when discussing different policies.
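A deductive dictionary coding step with precision/recall validation can be sketched as follows. The per-language term lists, sentences, and gold labels are invented examples:

```python
# Sketch of deductive multilingual dictionary coding, validated against
# expert codes with precision and recall. All entries are toy examples.
DICTIONARY = {"de": ["mindestlohn", "rente"],
              "fr": ["salaire minimum", "retraite"]}

def codes_policy(sentence: str, lang: str) -> bool:
    """True if the sentence mentions any pension/wage term for its language."""
    s = sentence.lower()
    return any(term in s for term in DICTIONARY[lang])

# (gold label from expert coding, dictionary prediction)
pairs = [(True,  codes_policy("Wir erhöhen den Mindestlohn", "de")),
         (True,  codes_policy("La retraite à 60 ans", "fr")),
         (False, codes_policy("Une économie forte", "fr")),
         (True,  codes_policy("Die Löhne anheben", "de"))]  # a dictionary miss

tp = sum(g and p for g, p in pairs)
fp = sum(p and not g for g, p in pairs)
fn = sum(g and not p for g, p in pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Validation of this kind makes dictionary misses (here, a wage sentence without a listed term) visible as recall losses before the dictionary is applied at scale.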