Early evidence of the Pareto principle in grammatical distribution: Causative situations in Chinese conversational discourse
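Early evidence of a Pareto distribution in grammar: The distribution of grammatical constructions for causative situations in natural Modern Chinese conversation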

Abstract

This study is an initial report on the Pareto distribution (the 80/20 rule) of grammatical constructions: roughly 20% of the types of grammatical constructions used for causative situations account for roughly 80% of their uses in conversation. Using a data-driven approach, I exhaustively examine the grammatical constructions that L1 Chinese speakers choose to describe causative situations in spontaneous talk-show conversations. I identify two specific Pareto distributional patterns. 1) The distribution of all 22 constructions for causative situations forms a Pareto ABC diagram. The A-class, which contains, in descending order of frequency, the ba-construction, the unmarked passive, the rang-construction, the bei-construction, the resultative, and the gei-construction, makes up 27.3% of the types but accounts for 88.8% of all 1,497 uses. The B-class also makes up 27.3% of the types but accounts for only 8.9% of the uses, and the C-class makes up nearly half of the types (45.5%) but accounts for only 2.3% of the uses. 2) Most uses of a grammatical construction come from a small set of its subtypes: the full ba-construction accounts for 87.9% of all ba- uses, the reduced bei-construction accounts for 86.8% of all bei- uses, and 37.5% of the rang- subtypes account for 84.2% of all rang- uses. These patterns can be explained by the Lens concept. I conclude that a few constructions account for most of the grammatical choices L1 Chinese speakers make in conversation. Understanding these grammatical distributions in natural discourse can improve the efficiency and efficacy of language teaching and Natural Language Processing (NLP).
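The abstract reports a Pareto ABC classification but does not say how the class boundaries were drawn. The Python sketch below shows one common way to build such a classification from per-type use counts. The 80%/95% cut-offs, the function names `abc_classify` and `class_summary`, and the demo counts are all hypothetical assumptions for illustration, not the study's procedure or data.

```python
from typing import Dict, Tuple


def abc_classify(counts: Dict[str, int],
                 a_cut: float = 0.80,
                 b_cut: float = 0.95) -> Dict[str, str]:
    """Assign each construction type to class A, B or C (hypothetical helper).

    Types are ranked by number of uses (descending). A type goes into A while
    the cumulative share of uses *before* it is still below a_cut, so the type
    that crosses the threshold stays in A; B is filled the same way up to
    b_cut, and all remaining types fall into C. The cut-offs are assumptions.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one use")
    # sorted() is stable, so tied types keep their input order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    classes: Dict[str, str] = {}
    cumulative = 0
    for name, n in ranked:
        share_before = cumulative / total
        if share_before < a_cut:
            classes[name] = "A"
        elif share_before < b_cut:
            classes[name] = "B"
        else:
            classes[name] = "C"
        cumulative += n
    return classes


def class_summary(counts: Dict[str, int],
                  classes: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Return (share of types, share of uses) for each class (hypothetical helper)."""
    total_uses = sum(counts.values())
    n_types = len(counts)
    summary: Dict[str, Tuple[float, float]] = {}
    for label in "ABC":
        members = [name for name, c in classes.items() if c == label]
        uses = sum(counts[name] for name in members)
        summary[label] = (len(members) / n_types, uses / total_uses)
    return summary


if __name__ == "__main__":
    # Hypothetical counts for eight made-up construction types (not the study's data).
    demo = {"t1": 50, "t2": 25, "t3": 10, "t4": 6,
            "t5": 3, "t6": 3, "t7": 2, "t8": 1}
    labels = abc_classify(demo)
    for label, (type_share, use_share) in class_summary(demo, labels).items():
        print(f"{label}: {type_share:.1%} of types, {use_share:.1%} of uses")
```

With the demo counts, A holds 37.5% of the types and 85.0% of the uses, B 37.5% and 12.0%, and C 25.0% and 3.0%. For comparison, the class sizes reported in the abstract (27.3%, 27.3%, and 45.5% of 22 types) correspond to 6, 6, and 10 construction types, and 88.8% of 1,497 uses is roughly 1,330 uses in the A-class.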
