
How information density influences NP extraposition in early New High German

Sophia Voigtmann

Saarland University, Germany

This study proposes a new approach to explaining NP extraposition in New High German (1650–1900). Instead of length (Sapp 2014) or givenness (Light 2011), we claim that processing difficulty, measured as Shannon surprisal (S(word) = -log2 P(word | context); Shannon 1948), drives extraposition. NPs with high surprisal values are thus more likely to be extraposed.

To find evidence for this hypothesis, we manually built a corpus of extraposed and embedded NPs from medical and theological texts published between 1650 and 1900, taken from the Deutsches Textarchiv (BBAW 2019). For every annotated NP, we calculated the mean 2-skip-bigram surprisal (Guthrie et al. 2006) on lemmas. The other factors in the analysis are length, genre (medical vs. theological), Orality Score (Ortmann & Dipper 2022), and time of publication. To determine the most influential factor for extraposition, we fit a logistic regression model (R Core Team 2022).
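
The surprisal measure can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names, the pooling of counts over the three preceding lemmas, and the add-one smoothing (`alpha`) are all assumptions made for illustration. It only shows how skip-bigram counts could be turned into a mean surprisal value for an NP span.

```python
import math
from collections import Counter


def skip_bigrams(lemmas, k=2):
    """Yield ordered pairs (left, right) with at most k lemmas skipped between them."""
    for i, left in enumerate(lemmas):
        # j covers the next k + 1 positions, i.e. 0 to k skipped lemmas.
        for j in range(i + 1, min(i + k + 2, len(lemmas))):
            yield left, lemmas[j]


def train(sentences, k=2):
    """Count skip-bigrams in a lemmatized training corpus (list of lemma lists)."""
    pair_counts, left_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        vocab.update(sent)
        for left, right in skip_bigrams(sent, k):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return pair_counts, left_counts, len(vocab)


def surprisal(sent, i, model, k=2, alpha=1.0):
    """Surprisal in bits of sent[i] given the up to k + 1 preceding lemmas.

    Assumption: counts are pooled over all context lemmas and smoothed
    with add-alpha smoothing; other estimators are equally possible.
    """
    pair_counts, left_counts, vocab_size = model
    context = sent[max(0, i - k - 1):i]
    joint = sum(pair_counts[(c, sent[i])] for c in context)
    total = sum(left_counts[c] for c in context)
    prob = (joint + alpha) / (total + alpha * vocab_size)
    return -math.log2(prob)


def mean_np_surprisal(sent, start, end, model, k=2):
    """Mean surprisal over the NP span sent[start:end]."""
    values = [surprisal(sent, i, model, k) for i in range(start, end)]
    return sum(values) / len(values)


# Hypothetical toy corpus of lemmatized sentences, for illustration only.
toy_model = train([["a", "b"], ["a", "b"], ["a", "c"]])
frequent = surprisal(["a", "b"], 1, toy_model)  # "b" often follows "a"
rare = surprisal(["a", "c"], 1, toy_model)      # "c" rarely follows "a"
```

In this toy setting the rarer continuation receives the higher surprisal, which is the property the hypothesis relies on: NPs whose lemmas are less predictable from their context get higher mean values.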

Extraposition is indeed linked to high surprisal (z = -2.44, p < .05 *), while length is not significant (z = -0.48, p = .63). However, genre (z = -2.58, p < .01 **) and the interaction between Orality Score and period (z = -2.68, p < .01 **) are even stronger predictors. We conclude that more processing capacity is available after the right sentence bracket (Speyer 2015) because the valency of the main verb has finally been processed at this point. This capacity can therefore be spent on lexical rather than syntactic difficulties. Furthermore, we find indications of language change and an influence of genre, suggesting differences in writing style.

References
  • BBAW (2019). Deutsches Textarchiv. Grundlage für ein Referenzkorpus der neuhochdeutschen Sprache. Berlin-Brandenburgische Akademie der Wissenschaften; http://www.deutschestextarchiv.de/. [last accessed: 2023-01-19]
  • Eckart de Castilho, R., É. Mújdricza-Maydt, S. M. Yimam, S. Hartmann, I. Gurevych, A. Frank, and C. Biemann (2016). A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures. In Proceedings of the LT4DH workshop at COLING 2016, Osaka, Japan.
  • Guthrie, D., B. Allison, W. Liu, L. Guthrie, and Y. Wilks (2006). A closer look at skip-gram modelling. Proceedings of the Fifth International Conference on Language Resources and Evaluation.
  • Ortmann, K. and S. Dipper (2022). Coast (conceptual orality analysis and scoring tool). https://github.com/rubcompling/COASTcoast-conceptualorality-analysis-and-scoring-tool [last accessed: 2023-01-18]
  • R Core Team (2022). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  • Sapp, C. D. (2014). Extraposition in Middle and New High German. The Journal of Comparative Germanic Linguistics 17(2), 129–156.
  • Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal 27(3), 379–423.
  • Speyer, A. (2015). Auch früher wollte man informieren – Zum Einfluss der Informationsstruktur auf die Syntax in der Geschichte des Deutschen. Zeitschrift für germanistische Linguistik 43(3), 485–515.