Extracting semantic relations using syntax: An R package for querying and reshaping dependency trees.

Kasper Welbers; Wouter van Atteveldt; Jan Kleinnijenhuis

doi:10.5117/CCR2021.2.003.WELB

E-ISSN: 2665-9085

Authors: Kasper Welbers¹, Wouter van Atteveldt² & Jan Kleinnijenhuis³

Affiliations: ¹ VU University Amsterdam ² VU University Amsterdam ³ VU University Amsterdam
Publisher: Amsterdam University Press
Source: Computational Communication Research, Volume 3, Issue 2, Oct. 2021, pp. 1-16
DOI: https://doi.org/10.5117/CCR2021.2.003.WELB
Language: English

Abstract

Most common methods for automatic text analysis in communication science ignore syntactic information and focus on the occurrence and co-occurrence of individual words, and sometimes n-grams. This is remarkably effective for some purposes, but it limits fine-grained analyses of semantic relations such as who does what to whom, and according to what source. One tested and effective method for moving beyond this bag-of-words assumption is a rule-based approach for labeling and extracting syntactic patterns in dependency trees. Although this method can be used for a variety of purposes, its application is hindered by the lack of dedicated and accessible tools. In this paper we introduce the rsyntax R package, which is designed to make working with dependency trees easier and more intuitive for R users, and which provides a framework for combining multiple rules to reliably extract useful semantic relations.
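
As a rough illustration of the rule-based approach described in the abstract, the following Python sketch matches one simple pattern (a verb with a subject child and an object child) in a hand-written dependency tree. It does not use rsyntax or its API. The token table, the helper names children and extract_svo, and the example sentence are all hypothetical; the relation labels nsubj and obj are assumed to follow Universal Dependencies conventions.

```python
from typing import Dict, List, Optional, Tuple

Token = Dict[str, Optional[object]]

# Hypothetical, hand-written parse of "Mary kicked the ball".
# Each token stores its own id, the id of its parent, and the
# dependency relation to that parent (Universal Dependencies labels assumed).
TOKENS: List[Token] = [
    {"id": 1, "token": "Mary", "parent": 2, "relation": "nsubj"},
    {"id": 2, "token": "kicked", "parent": None, "relation": "root"},
    {"id": 3, "token": "the", "parent": 4, "relation": "det"},
    {"id": 4, "token": "ball", "parent": 2, "relation": "obj"},
]


def children(tokens: List[Token], parent_id: object, relation: str) -> List[Token]:
    """Return the direct children of parent_id that have the given relation."""
    return [
        t for t in tokens
        if t["parent"] == parent_id and t["relation"] == relation
    ]


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Apply one rule: any token with both an nsubj and an obj child
    yields a (subject, predicate, object) triple."""
    triples = []
    for head in tokens:
        for subj in children(tokens, head["id"], "nsubj"):
            for obj in children(tokens, head["id"], "obj"):
                triples.append((subj["token"], head["token"], obj["token"]))
    return triples


print(extract_svo(TOKENS))  # [('Mary', 'kicked', 'ball')]
```

In practice the token table would come from a dependency parser, and a single rule like this one covers only simple active clauses, which is why several rules usually need to be combined.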

2021-10-01

2024-11-09
