corpus / hellog～英語史ブログ

Last updated: 2026-02-06 10:29

2010-04-24 Sat

■ #362. EReK, an English example-sentence search site [corpus][kwic][web_service]

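Today, a quick introduction to a web-based concordancer. 英語例文検索 EReK ("English example-sentence search") is "a site that treats the text of web pages written in English as one huge collection of example sentences (a corpus) and searches it." It is built on the Yahoo! Web API. Results are displayed in KWIC (Key Word in Context) format, with somewhere between one and two hundred example sentences shown. Conveniently, each concordance line links to its source page with a single click. The site also offers sorting by the word before or after the keyword, as well as options to restrict the search to .edu domains or to news sites. It carries the caveat that "because these are documents on the web, there is no guarantee that the expressions are accurate," but it looks useful as a handy concordancer on the web.
Because it searches web resources that change from moment to moment, it can be regarded as a kind of monitor corpus, and its output can be expected to reflect current events. For example, as of 24 April 2010, a search for "volcano" restricted to news sites returns a large number of concordance lines in which the word co-occurs with Iceland or Icelandic. (See [2010-04-20-1].)
There is also a sister site for Japanese, JReK, which looks likely to be helpful when writing Japanese prose.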
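
The KWIC display and the sort-by-neighbouring-word feature described above can be illustrated with a minimal sketch. It does not reproduce how EReK itself works: the sample text, the function names (`kwic`, `first_word_right`, `last_word_left`, `show`), and the 30-character context width are all assumptions made for illustration.

```python
import re

# Hypothetical sample text standing in for retrieved web pages.
SAMPLE = (
    "The volcano in Iceland erupted again. "
    "Ash from the Icelandic volcano closed airports. "
    "A volcano erupted last week."
)

def kwic(text, keyword, width=30):
    """Return (left context, keyword, right context) for every whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def first_word_right(line):
    """Sort key: the first word after the keyword, lower-cased."""
    words = line[2].split()
    return words[0].lower() if words else ""

def last_word_left(line):
    """Sort key: the last word before the keyword, lower-cased."""
    words = line[0].split()
    return words[-1].lower() if words else ""

def show(lines, width=30):
    """Print concordance lines with the keyword aligned in one column."""
    for left, key, right in lines:
        print(f"{left:>{width}} | {key} | {right}")

if __name__ == "__main__":
    hits = kwic(SAMPLE, "volcano")
    show(sorted(hits, key=first_word_right))
```

Sorting on `last_word_left` instead groups the lines by the word preceding the keyword, which is the kind of view that makes collocates such as Icelandic easy to spot.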

Referrer (Inside): [2010-05-15-1]

item	LOB -ise: rate (freq)	LOB -ize: rate (freq)	FLOB -ise: rate (freq)	FLOB -ize: rate (freq)
recognise	59.6% (99)	40.4% (67)	71.8% (127)	28.2% (50)
realise	63.2% (134)	36.8% (78)	68.7% (125)	31.3% (57)
organise	65.6% (42)	34.4% (22)	67.2% (43)	32.8% (21)
emphasise	37.7% (20)	62.3% (33)	62.9% (39)	37.1% (23)
criticise	52.0% (13)	48.0% (12)	80.0% (24)	20.0% (6)
characterise	0.0% (0)	100.0% (4)	56.3% (18)	43.8% (14)
summarise	35.3% (6)	64.7% (11)	64.7% (11)	35.3% (6)
specialise	56.3% (18)	43.8% (14)	81.8% (27)	18.2% (6)
apologise	68.8% (11)	31.3% (5)	70.6% (12)	29.4% (5)
advertise	100.0% (41)	0.0% (0)	100.0% (55)	0.0% (0)
authorise	77.4% (24)	22.6% (7)	68.2% (15)	31.8% (7)
minimise	90.0% (9)	10.0% (1)	80.0% (16)	20.0% (4)
surprise	100.0% (182)	0.0% (0)	100.0% (173)	0.0% (0)
supervise	100.0% (10)	0.0% (0)	100.0% (9)	0.0% (0)
utilise	70.0% (7)	30.0% (3)	83.3% (5)	16.7% (1)
maximise	50.0% (2)	50.0% (2)	50.0% (9)	50.0% (9)
symbolise	50.0% (3)	50.0% (3)	40.0% (4)	60.0% (6)
mobilise	66.7% (2)	33.3% (1)	20.0% (1)	80.0% (4)
stabilise	58.3% (7)	41.7% (5)	33.3% (3)	66.7% (6)
publicise	81.8% (9)	18.2% (2)	84.6% (11)	15.4% (2)

Rank	raw frequency	observed/expected	t-score	z-score	log-likelihood	MI	MI3
1	little	15-year-old	little	little	little	15-year-old	little
2	young	16-year-old	young	young	young	16-year-old	young
3	that	dark-haired	good	15-year-old	good	dark-haired	good
4	this	13-year-old	that	dark-haired	clever	13-year-old	clever
5	good	nine-year-old	this	16-year-old	poor	nine-year-old	pretty
6	one	14-year-old	old	clever	pretty	14-year-old	that
7	old	four-year-old	poor	pretty	old	four-year-old	15-year-old
8	other	year-old	other	teenage	that	year-old	dark-haired
9	poor	clever	clever	13-year-old	beautiful	clever	poor
10	clever	teenage	one	nine-year-old	lovely	teenage	16-year-old
11	beautiful	blonde	pretty	four-year-old	golden	blonde	this
12	pretty	pretty	beautiful	head	nice	pretty	old
13	small	head	nice	14-year-old	15-year-old	head	beautiful
14	any	little	lovely	poor	teenage	little	teenage
15	nice	wee	big	blonde	dark-haired	wee	lovely
16	big	eldest	small	good	head	eldest	head
17	another	brave	golden	golden	16-year-old	brave	golden
18	lovely	golden	tall	beautiful	tall	golden	nice
19	new	silly	dear	lovely	this	silly	tall
20	golden	young	teenage	year-old	dear	young	blonde
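
The column headings of the table above name several association measures. As a rough sketch, the code below uses commonly cited textbook definitions for five of them. This is an assumption: the tool that produced the table may use different variants, and log-likelihood is left out for brevity. The function name `association_scores` and all counts in the example are hypothetical.

```python
import math

def association_scores(observed, f_node, f_collocate, corpus_size, span=1):
    """Compute simple association scores for one node/collocate pair.

    observed    : co-occurrence count of node and collocate
    f_node      : frequency of the node word
    f_collocate : frequency of the collocate
    corpus_size : total number of tokens
    span        : number of positions around the node that are inspected
    """
    # Expected co-occurrences if the two words were distributed independently.
    expected = f_node * f_collocate * span / corpus_size
    return {
        "observed/expected": observed / expected,
        "MI": math.log2(observed / expected),
        "MI3": math.log2(observed ** 3 / expected),
        "t-score": (observed - expected) / math.sqrt(observed),
        "z-score": (observed - expected) / math.sqrt(expected),
    }

if __name__ == "__main__":
    # Hypothetical counts, not taken from any corpus.
    for name, value in association_scores(8, 100, 10, 1000).items():
        print(f"{name:>18}: {value:.3f}")
```

Under these definitions MI is just the base-2 logarithm of observed/expected, so the two measures always rank collocates in the same order, which matches the identical "observed/expected" and "MI" columns in the table.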

item	-ise rate (freq)	-ize rate (freq)	-ise + -ize
recognise	61.1% (9143)	38.9% (5812)	14955
realise	63.2% (9442)	36.8% (5492)	14934
organise	62.3% (5540)	37.7% (3359)	8899
emphasise	60.0% (2998)	40.0% (1998)	4996
criticise	54.9% (2054)	45.1% (1688)	3742
characterise	52.2% (1398)	47.8% (1278)	2676
summarise	61.4% (1164)	38.6% (731)	1895
specialise	70.7% (1163)	29.3% (481)	1644
apologise	68.8% (1084)	31.2% (492)	1576
advertise	99.5% (1542)	0.5% (7)	1549
authorise	64.5% (987)	35.5% (543)	1530
minimise	65.4% (984)	34.6% (521)	1505
surprise	99.9% (1345)	0.1% (1)	1346
supervise	99.8% (1303)	0.2% (3)	1306
utilise	68.9% (798)	31.1% (360)	1158
maximise	63.2% (719)	36.8% (418)	1137
symbolise	49.2% (324)	50.8% (334)	658
mobilise	45.5% (286)	54.5% (342)	628
stabilise	53.5% (334)	46.5% (290)	624
publicise	69.4% (419)	30.6% (185)	604
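
The percentages in the table above follow directly from the raw frequencies. A minimal sketch of that arithmetic, using counts copied from the table (the function name `spelling_rates` is made up for illustration):

```python
def spelling_rates(ise_freq, ize_freq):
    """Return (-ise rate %, -ize rate %, total) for one verb, rounded to 0.1%."""
    total = ise_freq + ize_freq
    if total == 0:
        raise ValueError("no occurrences of either spelling")
    ise_rate = round(100 * ise_freq / total, 1)
    ize_rate = round(100 * ize_freq / total, 1)
    return ise_rate, ize_rate, total

if __name__ == "__main__":
    # (item, -ise frequency, -ize frequency) as listed in the table.
    rows = [("recognise", 9143, 5812), ("stabilise", 334, 290)]
    for item, ise, ize in rows:
        ise_rate, ize_rate, total = spelling_rates(ise, ize)
        print(f"{item}\t{ise_rate}% ({ise})\t{ize_rate}% ({ize})\t{total}")
```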

rhinoceroses	13
rhinos	100

octopuses	29
octopi	11
octopodes	4

■ #362. EReK, an English example-sentence search site [corpus][kwic][web_service]

■ #354. COLT: a corpus of London teenagers' spoken language [corpus][colt][lexicology][syllable]

■ #330. Cobuild Concordance and Collocations Sampler [corpus][bnc][cobuild][collocation]

■ #317. Mining myself through my own book (keyword edition) [text_tool][flob][corpus][keyword]

■ #314. -ise or -ize? (2) [spelling][bre][bnc][lob][flob][corpus][suffix][z]

■ #311. Which adjectives often collocate with girl? [corpus][collocation][bnc]

■ #310. PPCMBE and the widening scope of diachronic research on English syntax [corpus][ppcmbe][syntax]

■ #308. Lists of the most frequent words in Present-Day English [lexicology][corpus][link][academic_word_list][alphabet][frequency][statistics][letter_frequency]

■ #307. Points to keep in mind when using corpora [corpus][link]

■ #305. -ise or -ize? [spelling][ame_bre][bnc][corpus][suffix][z]

■ #271. Dictionaries and corpora as tools for lexical research [dictionary][corpus][methodology][lexicology]

■ #246. Men's wear is 「メンズ」 (menzu), but what about women's wear? [japanese_english][link][corpus]

■ #161. The plural of rhinoceros [plural][etymology][bnc][corpus][clipping][drift]

■ #121. The plural of octopus [plural][greek][bnc][corpus]

■ #78. Verbix and corpora [software][web_service][conjugation][inflection][oe][me][corpus][variation]