2004
Volume 4, Issue 2
  • E-ISSN: 2665-9085

Abstract

Determining the sentiment of individual sentences in newspaper articles automatically is a major challenge. Manually created sentiment dictionaries often fail to meet the required standards, and while computer-generated dictionaries show promise, they are often limited by the availability of suitable linguistic resources. I propose and test a novel, language-agnostic and resource-efficient way of constructing sentiment dictionaries based on word embedding models. The dictionaries are constructed and evaluated on four corpora containing two decades of Danish, Dutch (Flanders and the Netherlands), English, and Norwegian newspaper articles, which are cleaned and parsed using Natural Language Processing. Concurrent validity is evaluated using a dataset of human-coded newspaper sentences and compared to the performance of the Polyglot sentiment dictionaries. Predictive validity is tested through two long-standing hypotheses on the negativity bias in political news. Results show that both concurrent validity and predictive validity are good. The dictionaries outperform their Polyglot counterparts and correctly detect a negativity bias, which is stronger for tabloids. The method is resource-efficient in terms of manual labor when compared to manually constructed dictionaries, and requires only a limited amount of computational power.
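The abstract does not spell out the construction procedure. Purely as an illustration, the sketch below shows one common way an embedding model can yield a sentiment dictionary. Each word is scored by its mean cosine similarity to positive seed words minus its mean similarity to negative seed words, and words beyond a cut-off are kept. The toy vectors, the seed lists, the 0.5 threshold, and the helper names (`build_dictionary`, `score_sentence`, `sign_agreement`) are hypothetical assumptions, not the author's implementation. The last helper is a much simplified stand-in for a concurrent-validity check against human-coded sentences.

```python
import math

# Hypothetical toy embeddings standing in for vectors learned from a news corpus
# (for example with word2vec or GloVe). Values are illustrative only.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "success": [0.7, 0.3, 0.0],
    "bad": [-0.9, 0.1, 0.0],
    "terrible": [-0.8, 0.2, 0.1],
    "crisis": [-0.7, 0.3, 0.1],
    "minister": [0.0, 0.1, 0.9],
    "said": [0.05, 0.9, 0.1],
}

# Hypothetical seed lists; a real application would likely use more seeds per language.
POSITIVE_SEEDS = ["good"]
NEGATIVE_SEEDS = ["bad"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity(word, embeddings, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    vec = embeddings[word]
    pos = sum(cosine(vec, embeddings[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, embeddings[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


def build_dictionary(embeddings, pos_seeds, neg_seeds, threshold=0.5):
    """Keep every word whose absolute polarity reaches the (assumed) threshold."""
    lexicon = {}
    for word in embeddings:
        score = polarity(word, embeddings, pos_seeds, neg_seeds)
        if abs(score) >= threshold:
            lexicon[word] = score
    return lexicon


def score_sentence(sentence, lexicon):
    """Average polarity of dictionary words in a sentence; 0.0 if none match."""
    hits = [lexicon[t] for t in sentence.lower().split() if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def sign_agreement(sentences, human_labels, lexicon):
    """Share of sentences whose score sign (-1, 0, 1) matches the human label."""
    matches = 0
    for sentence, label in zip(sentences, human_labels):
        score = score_sentence(sentence, lexicon)
        predicted = (score > 0) - (score < 0)
        matches += predicted == label
    return matches / len(sentences)


LEXICON = build_dictionary(EMBEDDINGS, POSITIVE_SEEDS, NEGATIVE_SEEDS)
print(sorted(LEXICON))  # ['bad', 'crisis', 'good', 'great', 'success', 'terrible']
print(score_sentence("The minister said the crisis was terrible", LEXICON))  # negative
```

Whitespace tokenization without punctuation stripping keeps the sketch short. A real pipeline would more plausibly score the cleaned, lemmatized tokens produced by the NLP preprocessing step, and would tune the seeds and threshold per language.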

DOI: 10.5117/CCR2022.2.003.VRIE
2022-10-01
2024-11-08