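Continuing from yesterday's entry [2012-10-10-1], this post again takes up the representativeness of The LAEME Corpus. This time I consider representativeness by dialect and period in terms of word counts, or more precisely, the number of words in the corpus that carry grammatical information (tagged words). Let me start with the table.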
Table 2: Dialectal and Diachronic Distribution of Linguistic Evidence by Number of Tagged Words
Dialect | C12b | C13a | C13b | C14a | Total |
---|---|---|---|---|---|
N | 0 (0.000%) | 362 (0.062) | 0 (0.000) | 52,883 (9.083) | 53,245 (9.146) |
NEM | 11,342 (1.948) | 0 (0.000) | 3,980 (0.684) | 2,344 (0.403) | 17,666 (3.034) |
NWM | 0 (0.000) | 58,332 (10.019) | 16,173 (2.778) | 0 (0.000) | 74,505 (12.797) |
SEM | 40,082 (6.885) | 26,722 (4.590) | 21,921 (3.765) | 31,408 (5.395) | 120,133 (20.634) |
SWM | 1,030 (0.177) | 90,400 (15.527) | 106,981 (18.375) | 108 (0.019) | 198,519 (34.098) |
SW | 1,168 (0.201) | 2,610 (0.448) | 46,032 (7.907) | 30,517 (5.242) | 80,327 (13.797) |
SE | 0 (0.000) | 4,043 (0.694) | 3,199 (0.549) | 30,561 (5.249) | 37,803 (6.493) |
Total | 53,622 (9.210) | 182,469 (31.341) | 198,286 (34.058) | 147,821 (25.390) | 582,198 (100.000) |
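
The percentages in parentheses appear to be each cell's share of the grand total of tagged words (582,198). As a rough check, the sketch below recomputes the marginal totals and shares from the counts in Table 2. The names `COUNTS`, `PERIODS`, and `share` are hypothetical and introduced only for this illustration; they are not part of LAEME or any of its tools.

```python
# Tagged-word counts copied from Table 2.
# Rows are dialect areas, columns are the periods listed in PERIODS.
PERIODS = ["C12b", "C13a", "C13b", "C14a"]
COUNTS = {
    "N":   [0, 362, 0, 52883],
    "NEM": [11342, 0, 3980, 2344],
    "NWM": [0, 58332, 16173, 0],
    "SEM": [40082, 26722, 21921, 31408],
    "SWM": [1030, 90400, 106981, 108],
    "SW":  [1168, 2610, 46032, 30517],
    "SE":  [0, 4043, 3199, 30561],
}


def share(n, total):
    """Return n as a percentage of total, rounded to three decimals."""
    return round(100 * n / total, 3)


# Marginal totals: per dialect (rows), per period (columns), and overall.
row_totals = {dialect: sum(row) for dialect, row in COUNTS.items()}
col_totals = {
    period: sum(row[i] for row in COUNTS.values())
    for i, period in enumerate(PERIODS)
}
grand_total = sum(row_totals.values())

if __name__ == "__main__":
    for dialect, total in row_totals.items():
        print(f"{dialect:4s} {total:7d} ({share(total, grand_total):.3f}%)")
    for period, total in col_totals.items():
        print(f"{period:4s} {total:7d} ({share(total, grand_total):.3f}%)")
    print(f"Total {grand_total}")
```

Sorting `row_totals` or `col_totals` by value is a quick way to see which dialect areas and periods carry most of the evidence.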