Tekstgenres analyseren op lexicale complexiteit met T‑Scan

Henk Pander Maat; Nick Dekker

doi:10.5117/TVT2016.3.PAND

ISSN: 1573-9775
E-ISSN: 2352-1236


Affiliations: Henk Pander Maat is a senior researcher at the Utrecht Institute of Linguistics OTS, Utrecht University. Nick Dekker studied Dutch and Communication and Organisation in Utrecht; he is now a web editor at Vitens.
Publisher: Amsterdam University Press
Source: Tijdschrift voor Taalbeheersing, Volume 38, Issue 3, Dec 2016, pp. 263-304
DOI: https://doi.org/10.5117/TVT2016.3.PAND
Language: English

Abstract

Using T-Scan to analyse the lexical complexity of text genres

T-Scan is a tool for the automatic analysis of Dutch text. This paper presents the first large-scale corpus analysis with T-Scan, focusing on lexical complexity. A collection of nearly 1000 text specimens was assembled, covering ten genres: travel blogs, celebrity news features, novels, textbooks for vocational secondary schools, textbooks for general secondary schools, news reports, opinion pieces, political programs, medical advice texts and research articles. The lexical complexity features in the analysis include morphology, word frequency, various word concreteness indices, personal pronouns, names and verb tense. Systematic genre differences are found: a genre detection model comprising 18 T-Scan features correctly classifies 83 percent of the corpus texts. Most of the lexical features that differentiate genres are intuitively related to the complexity of the text topic. A closer analysis is offered of the contrast between the two textbook samples in the corpus, which differ only in the educational level they cater for. Here too, topic variation seems a more important factor than stylistic variation. We demonstrate a new method for examining stylistic variation, which consists of within-genre comparisons based on the genre predictions: ‘deviant’ texts are compared to ‘typical’ members of their genre.
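The within-genre comparison summarised above can be illustrated with a small sketch. The abstract does not say which classifier the genre detection model uses, so the code below swaps in a simple nearest-centroid classifier on z-scored features as a stand-in; it is not the authors' model. It also assumes that a text counts as "typical" when the model assigns it to its own genre and as "deviant" when it is assigned to another genre. The feature values, genre labels, and function names (`standardize`, `centroids`, `predict`, `typical_vs_deviant`) are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each feature column so features on different scales weigh equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero for constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def centroids(X, y):
    """Return the mean feature vector of each genre."""
    out = {}
    for genre in sorted(set(y)):
        members = [x for x, label in zip(X, y) if label == genre]
        out[genre] = [mean(col) for col in zip(*members)]
    return out

def predict(x, cents):
    """Assign the genre whose centroid is nearest (squared Euclidean distance)."""
    return min(cents, key=lambda g: sum((a - b) ** 2 for a, b in zip(x, cents[g])))

def typical_vs_deviant(X, y):
    """Classify every text, then split each genre into typical and deviant members."""
    cents = centroids(X, y)
    preds = [predict(x, cents) for x in X]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    split = {genre: {"typical": [], "deviant": []} for genre in cents}
    for i, (p, t) in enumerate(zip(preds, y)):
        # Hypothetical rule: predicted genre == actual genre -> typical, else deviant.
        split[t]["typical" if p == t else "deviant"].append(i)
    return accuracy, split

# Toy data: two hypothetical lexical features per text (not T-Scan output).
rows = [[10, 10], [11, 10], [2, 1],   # blog texts; the third looks atypical
        [1, 0], [2, 0], [0, 1]]       # research-article texts
labels = ["blog", "blog", "blog", "research", "research", "research"]

accuracy, split = typical_vs_deviant(standardize(rows), labels)
print(f"accuracy = {accuracy:.2f}")   # 5 of 6 texts assigned to their own genre
print(split)
```

Here the third blog text, whose toy values resemble the research articles, lands in the deviant group and could then be compared with the typical blog texts. Note that the centroids are computed on all texts, including the one being classified; a held-out or cross-validated evaluation would give a less optimistic accuracy.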


Article Type: Research Article

Keyword(s): automatic text analysis; corpus research; lexical complexity; readability; stylistic variation
