#568. The Definition of a Corpus and an Introduction to English Corpora

2010-11-16

　In linguistic research, the term corpus has been defined in various ways, but the definition given by McEnery et al. is concise.

. . . a corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.

　Researchers largely agree on (1) and (2), but opinions differ on what should count as "sampled" or "representative" in (3) and (4). Even so, the definition is acceptable in its broad outline.
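
　As one possible reading of criteria (3) and (4), the sketch below draws a proportional stratified sample, so that each genre appears in the corpus in a chosen target share. It is only an illustration under stated assumptions: the function name `stratified_sample`, the genre labels, and the target proportions are all hypothetical and do not come from McEnery et al. or from any real corpus design.

```python
import random
from collections import Counter


def stratified_sample(texts, proportions, total, seed=0):
    """Draw `total` texts so that each genre matches its target share.

    texts:       dict mapping genre -> list of candidate texts (hypothetical data)
    proportions: dict mapping genre -> target share (assumed to sum to 1)
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for genre, share in proportions.items():
        k = round(total * share)  # number of texts to take from this genre
        pool = texts[genre]
        if k > len(pool):
            raise ValueError(f"not enough texts in genre {genre!r}")
        # Random selection without replacement within the genre.
        sample.extend((genre, text) for text in rng.sample(pool, k))
    return sample


# Toy population: 50 placeholder "texts" per genre.
population = {
    "spoken": [f"spoken_{i}" for i in range(50)],
    "news": [f"news_{i}" for i in range(50)],
    "fiction": [f"fiction_{i}" for i in range(50)],
}
# Assumed target composition of the language variety to be represented.
target = {"spoken": 0.1, "news": 0.5, "fiction": 0.4}

corpus = stratified_sample(population, target, total=20)
counts = Counter(genre for genre, _ in corpus)
print(counts)
```

　The disagreement mentioned above shows up here as the choice of `target`: the sampling step itself is mechanical, but which genres to include and in what proportions is exactly what researchers debate.
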
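　The easiest way to try out an English corpus is to use one that is available online. The following English corpora can be used online with little effort (some require registration).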

　・ British National Corpus (several interfaces are available)

　　* BNC ( The British National Corpus )
　　* BNCweb (free registration required)
　　* BYU-BNC (free registration required)

　・ BYU Corpora (other online corpora provided by Mark Davies, Brigham Young University)

　　* COCA ( Corpus of Contemporary American English ) (free registration required)
　　* COHA ( Corpus of Historical American English ) (free registration required)
　　* TIME Magazine Corpus of American English (free registration required)

　・ Cobuild Concordance and Collocations Sampler

　This blog also carries a number of other articles on corpora, which may be useful for reference.

　・ Summary article of corpus information on hellog: [2010-09-15-1]
　・ Corpus-related articles on hellog: corpus
　・ BNC-related articles on hellog: bnc

　・ McEnery, Tony, Richard Xiao, and Yukio Tono. Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge, 2006.


