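Schaalvergroting in het syntactische alternantieonderzoek: Een nieuwe analyse van het presentatieve er met automatisch gegenereerde predictoren [Upscaling in syntactic alternation research: A new analysis of presentative er with automatically generated predictors]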

Dirk Speelman; Stefan Grondelaers; Benedikt Szmrecsanyi; Kris Heylen

doi:10.5117/NEDTAA2020.1.005.SPEE

ISSN: 1384-5845
E-ISSN: 2352-1171

Publisher: Amsterdam University Press
Source: Nederlandse Taalkunde, Volume 25, Issue 1, Apr 2020, p. 101 - 123
Language: English
- Published online: 01 Apr 2020

Abstract

In this paper, we revisit earlier analyses of the distribution of er ‘there’ in adjunct-initial sentences to demonstrate the merits of computational upscaling in syntactic variation research. Unlike previous studies, in which the major semantic and pragmatic predictors (viz. adjunct type, adjunct concreteness, and verb specificity) had to be coded manually, the present study operationalizes these predictors by means of distributional analysis: instead of hand-coding for specific semantic classes, we determine the semantic class of the adjunct, verb, and subject automatically by clustering the lexemes in those slots according to their ‘semantic passport’ (as established from their distributional behaviour in a reference corpus). These clusters are subsequently interpreted as proxies for semantic classes. In addition, the pragmatic factor ‘subject predictability’ is operationalized automatically on the basis of collocational attraction measures, as well as distributional similarity between the other slots and the subject. We demonstrate that the distribution of er can be modelled as successfully with the automated approach as in manual annotation-based studies. Crucially, the new method replicates our earlier findings that the Netherlandic data are easier to model than the Belgian data, and that lexical collocations play a bigger role in the Netherlandic than in the Belgian data. On a methodological level, the proposed automation opens up a window of opportunities. Most important is its scalability: it allows a wider range of alternations to be investigated in one study, and much larger datasets to represent each alternation.
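The two automated steps named in the abstract, clustering slot fillers by distributional behaviour and measuring collocational attraction, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the co-occurrence counts are invented toy data (with English stand-ins for Dutch lexemes), and cosine similarity, average-link agglomerative clustering, and pointwise mutual information are common generic choices, not necessarily the measures used in the study.

```python
import math

# Hypothetical co-occurrence counts: lexeme -> {context word: frequency}.
# In practice these would be extracted from a large reference corpus.
VECTORS = {
    "garden": {"walk": 4, "green": 3, "sit": 2},
    "park":   {"walk": 5, "green": 2, "sit": 3},
    "monday": {"last": 4, "next": 3, "morning": 2},
    "friday": {"last": 3, "next": 4, "morning": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information (base 2) as a simple attraction measure."""
    return math.log2(pair_count * total / (w1_count * w2_count))


if __name__ == "__main__":
    # Each resulting cluster serves as a proxy for a semantic class.
    print(cluster(VECTORS, 2))
    # Attraction between two words, given hypothetical frequencies.
    print(pmi(pair_count=8, w1_count=16, w2_count=32, total=1024))
```

In a setup like this, the cluster label of each adjunct, verb, or subject lexeme would enter the regression model as a categorical predictor, and the attraction score as a numeric predictor for subject predictability.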

Article Type: Research Article

Keyword(s): Belgian vs. Netherlandic Dutch; computational methods; distributional analysis; existential sentences; syntactic variation
