corpus / hellog～英語史ブログ (hellog: History of the English Language Blog)

Last updated: 2026-07-15 01:27

2013-04-06 Sat

■ #1440. Syllable frequency ranking [syllable][corpus][lexicon][phonetics][frequency][statistics]

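Wanting to try something with the huge database introduced in "#1424. CELEX2" ([2013-03-21-1]), I used the syllable frequency subdatabase (English Frequency, Syllables), newly added in Version 2, to obtain a ranking of the most frequent syllable types in present-day English.
These syllable frequencies are drawn from a spoken subcorpus of about 1.3 million words, which makes up 7.26% of the whole corpus underlying CELEX2, and they are token frequencies rather than type frequencies. In other words, the more frequent a word is in speech, the higher the frequency of the syllable types it contains. For example, the syllable written "Ov" (= /ɒv/), which makes up of, ranks fourth. Note that the counts ignore whether a syllable is stressed or not.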
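The difference between token frequency and type frequency can be made concrete with a small sketch. The lexicon below is a hypothetical toy example with invented numbers, not CELEX2 data, and the function name `syllable_frequencies` is my own. It only illustrates how a word's frequency is passed on to each of its syllables under token counting.

```python
from collections import Counter

# Hypothetical toy lexicon (invented numbers, not CELEX2 data):
# word -> (token frequency of the word, its syllables in CELEX notation)
TOY_LEXICON = {
    "of": (100, ["Ov"]),
    "in": (90, ["In"]),
    "into": (10, ["In", "tu:"]),
    "to": (80, ["tu:"]),
}


def syllable_frequencies(lexicon):
    """Return (token_freq, type_freq) Counters keyed by syllable."""
    token_freq = Counter()
    type_freq = Counter()
    for freq, syllables in lexicon.values():
        for syl in syllables:
            token_freq[syl] += freq  # weighted by how often the word occurs
            type_freq[syl] += 1      # each word counts once, however frequent
    return token_freq, type_freq


token_freq, type_freq = syllable_frequencies(TOY_LEXICON)
print(token_freq.most_common())
print(type_freq.most_common())
```

Under token counting a syllable that occurs in a single very frequent function word can outrank one that occurs in many rare words, which is why the top of the ranking is dominated by monosyllabic function words.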
The phonemic notation used in the list below is not IPA but CELEX's own idiosyncratic notation, so here is a correspondence table first.

CELEX2 Phonetic Character Set
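For readers who prefer IPA, the conversion can be sketched in a few lines. The mapping below is partial and is an assumption on my part: it covers only the symbols that occur in the top-50 list and follows my understanding of the CELEX character set, of which only "Ov" = /ɒv/ is confirmed in this post. The rendering of "r*" (linking r) as a superscript is purely a display choice, and the name `celex_to_ipa` is hypothetical. Check the character set table before relying on it.

```python
# Partial CELEX-notation -> IPA mapping (an assumption; covers only the
# symbols that appear in the top-50 list). Unlisted characters such as
# t, d, v, z, s, f, h, b, m, n, l, r, w, j are passed through unchanged.
CELEX_TO_IPA = {
    "eI": "eɪ", "aI": "aɪ", "@U": "əʊ",      # diphthongs
    "i:": "iː", "u:": "uː", "O:": "ɔː",      # long vowels
    "r*": "ʳ",                               # linking r (display choice)
    "n,": "n\u0329", "l,": "l\u0329",        # syllabic consonants
    "I": "ɪ", "E": "ɛ", "&": "æ", "O": "ɒ", "V": "ʌ", "@": "ə",
    "D": "ð", "S": "ʃ", "N": "ŋ",
}

# Try two-character symbols before one-character ones (longest match first).
_KEYS = sorted(CELEX_TO_IPA, key=len, reverse=True)


def celex_to_ipa(syllable):
    """Convert one syllable in CELEX notation to IPA by greedy matching."""
    out = []
    i = 0
    while i < len(syllable):
        for key in _KEYS:
            if syllable.startswith(key, i):
                out.append(CELEX_TO_IPA[key])
                i += len(key)
                break
        else:
            out.append(syllable[i])  # character is the same in IPA
            i += 1
    return "".join(out)


for s in ["Ov", "Di:", "&nd", "wOz", "s@U"]:
    print(s, celex_to_ipa(s))
```

Matching the longest symbol first matters because "O:" and "O", or "@U" and "@", would otherwise be confused.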

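The table below gives the top 50 of the ranking. The syllable types of high-frequency monosyllabic words are reflected directly at the top, so it is not a very exciting table, but it may come in handy some day.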

Rank	Syllable	Frequency
1	eI	72971
2	Di:	60967
3	tu:	31446
4	Ov	30108
5	In	29906
6	&nd	28709
7	aI	23822
8	lI	19728
9	@	19566
10	rI	14356
11	ju:	12598
12	dI	12465
13	D&t	12118
14	It	11504
15	wOz	10834
16	fO:r*	9778
17	Iz	9517
18	tI	9161
19	fO	9042
20	Sn,	8969
21	hi:	8928
22	r@n	8638
23	bi:	8505
24	bI	7936
25	nI	7068
26	wID	7046
27	On	7030
28	&z	6919
29	O:l	6569
30	h&d	6240
31	E	6165
32	bl,	6021
33	sI	5836
34	@U	5824
35	t@r*	5687
36	&t	5652
37	hIz	5564
38	bVt	5416
39	mI	5397
40	s@	5391
41	nOt	5357
42	D@r*	5339
43	I	5283
44	tId	5259
45	DeI	5162
46	IN	5063
47	t@	5053
48	s@U	4974
49	baI	4894
50	h&v	4769

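Those who want to see the full ranking can consult Syllable Frequency Rank Table by CELEX2, which is in tab-delimited format. Those who would rather view it in a browser can do so here. In all, 11492 distinct syllable types are registered, of which 7934 have a frequency of 1 or more. At the end of "#1023. Types and number of morae in Japanese" ([2012-02-14-1]) I noted that English syllable types are astonishingly varied compared with Japanese, and these figures make the point convincingly. See also the other posts under syllable.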
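The two counts mentioned above (all registered types, and types with a frequency of 1 or more) could be recomputed from the tab-delimited table along the following lines. This is a minimal sketch: it assumes the file has a header row with the columns Rank, Syllable, and Frequency as in the excerpt above, the function names are my own, and the last row of `SAMPLE` is a made-up zero-frequency placeholder rather than a real entry.

```python
import csv
import io

# First rows of the published table plus one hypothetical
# zero-frequency row ("zzz"), added only to exercise the filter.
SAMPLE = (
    "Rank\tSyllable\tFrequency\n"
    "1\teI\t72971\n"
    "2\tDi:\t60967\n"
    "3\ttu:\t31446\n"
    "9999\tzzz\t0\n"
)


def load_rank_table(text):
    """Parse tab-delimited text into a list of (syllable, frequency)."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [(row["Syllable"], int(row["Frequency"])) for row in reader]


def summarize(rows):
    """Return (number of types, number of types with frequency >= 1)."""
    total_types = len(rows)
    attested = sum(1 for _, freq in rows if freq >= 1)
    return total_types, attested


rows = load_rank_table(SAMPLE)
print(summarize(rows))
```

Quoting is switched off because CELEX notation uses punctuation characters as phonetic symbols, and they should not be interpreted as CSV quote marks.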
The CELEX2 manual carries the following caveat, which I reproduce here.

Please note that the English corpus used by CELEX for deriving these frequencies contains only 7.3% spoken material. This means there is a rather tenuous relationship between the full frequency figures, which are based on written forms, and the syllable frequencies, which merely refer to phonemic conversions of these graphemic transcriptions. Of course it could be argued that frequencies of syllables, as lexical sub-units, are less liable to get skewed from differences in medium than full words, but it has to be taken into account that NO FIRM EVIDENCE ABOUT SPOKEN FREQUENCIES can be derived from these data.

Period	Tokens	Wordcount
E1 (1500--1569)	13	567,795
E2 (1570--1639)	18	628,463
E3 (1640--1710)	21	541,595
Total	52	1,737,853

	shew forms	show forms	Total words
1700--1769	80	25	298,764
1770--1839	79	86	368,804
1840--1914	17	162	281,327

PERIOD	nn	n	ne	x	xe	xn	xte	hn	he	tn	tx	txn	txe	ths	s	e	yn	zn	Sum
C12b	18	1	2	7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	28
C13a	23	4	19	6	4	4	0	9	14	0	1	0	1	0	0	0	0	0	85
C13b	20	3	23	2	1	3	4	1	0	0	0	1	0	2	1	1	1	1	64
C14a	5	13	28	9	2	2	0	0	0	3	1	0	0	0	0	1	0	0	64
Sum	66	21	72	24	7	9	4	10	14	3	2	1	1	2	1	2	1	1	241

DIALECT	nn	n	ne	x	xe	xn	xte	hn	he	tn	tx	txn	txe	ths	s	e	yn	zn	Sum
N	0	0	1	9	2	2	0	0	0	0	1	0	0	0	0	0	0	0	15
NEM	14	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	14
NWM	7	0	6	0	0	0	0	8	14	0	0	0	0	2	0	0	0	0	37
SEM	14	20	9	5	0	0	0	0	0	3	0	1	0	0	0	0	0	0	52
SWM	31	1	26	7	5	7	0	2	0	0	1	0	1	0	1	0	1	1	84
SW	0	0	16	3	0	0	4	0	0	0	0	0	0	0	0	1	0	0	24
SE	0	0	14	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	15
Sum	66	21	72	24	7	9	4	10	14	3	2	1	1	2	1	2	1	1	241

	nm	nn	n	ne	x	xe	xn	xt	xte	xst	xts	h	hn	hnn	he	s	e	i	Sum
O2	14	1	0	0	13	0	1	0	0	0	0	31	4	0	0	0	0	0	64
O3	5	22	16	0	56	0	0	0	0	0	0	48	0	0	0	0	0	0	147
O4	1	15	3	0	22	0	0	0	0	0	0	3	0	0	0	0	0	0	44
M1	0	28	4	8	13	0	1	0	0	0	0	0	4	1	9	0	0	0	68
M2	0	1	5	32	1	1	1	0	0	0	0	0	0	0	0	1	0	0	42
M3	0	0	4	31	24	18	0	1	0	4	0	0	0	0	0	0	1	0	83
M4	0	0	4	11	25	6	2	1	6	0	0	0	0	0	0	0	0	1	56
E1	0	0	12	66	2	0	0	25	3	0	0	0	0	0	0	0	0	0	108
E2	0	0	23	44	0	0	0	31	6	0	0	0	0	0	0	0	0	0	104
E3	0	0	54	8	0	0	0	14	0	0	1	0	0	0	0	0	0	0	77
Sum	20	67	125	200	156	25	5	72	15	4	1	82	8	1	9	1	1	1	793

Year	British government		Non-British government
Year	Singular	Plural	Singular	Plural
1930	3	15	12	3
1935	2	13	1	12
1940	2	14	4	2
1945	2	7	2	2
1950	1	26	26	0
1955	2	2	8	0
1960	0	23	8	0
1965	1	13	4	1
Total	13	113	65	18

(* = 5%; ~ = less than 2.5%)		CONV	FICT	NEWS	ACAD
independent clause	wh-question	****	*******	*********	**********
	yes/no-question	*****	*****	*******	*******
	alternative question	~	~	~	~
	declarative question	**	*	~	~
fragments	wh-question	*	**	**	*
fragments	other	***	***	*	*
tag	positive	*	~	~	~
tag	negative	****	*	~	~

	not/n't	other negative forms
CONV	19500	2500
FICT	9500	4000
NEWS	4500	2000
ACAD	3500	1500

■ #1440. Syllable frequency ranking [syllable][corpus][lexicon][phonetics][frequency][statistics]

■ #1428. ye = the [palaeography][spelling][thorn][th][pub][alphabet][graphemics][ppcme2][ppceme][ppcmbe][corpus]

■ #1424. CELEX2 [corpus][dictionary][statistics][frequency][lexicology]

■ #1423. Third-person plural present -s in Early Modern English (2) [verb][conjugation][emode][corpus][ppceme][ppcbme][number][agreement][analogy][3pp]

■ #1417. The development of the group genitive [genitive][clitic][synthesis_to_analysis][metanalysis][corpus][ppcme2][syntax]

■ #1416. shew and show (2) [spelling][corpus][ppcmbe][johnson][pronunciation_spelling]

■ #1415. shew and show (1) [spelling][phonetics][corpus][hc][diphthong]

■ #1413. Third-person plural present -s in Early Modern English [verb][conjugation][emode][corpus][ppceme][number][agreement][analogy][3pp]

■ #1399. Distribution of the variant forms of between in Early Middle English [laeme][corpus][preposition][me_dialect][methodology]

■ #1394. Diachronic change in the distribution of the variant forms of between [hc][corpus][preposition]

■ #1356. Number agreement with government in twentieth-century British English [bre][number][agreement][noun][syntax][corpus]

■ #1355. Did singular agreement with collective nouns increase in twentieth-century British English? [bre][number][agreement][noun][syntax][corpus][americanisation]

■ #1346. How often are tag questions used? [interrogative][tag_question][ame_bre][corpus][frequency][statistics]

■ #1325. Why negative forms are frequent in conversation [corpus][negative][frequency]

■ #1323. COCOA search of the Helsinki Corpus [cgi][web_service][hc][corpus]

■ #1322. ANC Frequency Extractor [cgi][web_service][frequency][corpus][anc]

■ #1321. BNC Frequency Extractor [cgi][web_service][frequency][corpus][bnc]

■ #1307. most and mest [analogy][superlative][vowel][me_dialect][corpus][hc][ppcme2][comparison]

■ #1305. Google Books Ngram Corpus with syntactic tags [corpus][google_books][ame_bre]

■ #1283. How to calculate collocational strength [corpus][statistics][bnc][collocation][lltest]
