A Preliminary Study for Developing a Concordance Program of British National Corpus Using Category and Genre Information
Departmental Bulletin Paper
This paper begins with a brief survey of several landmark language corpora. The British National Corpus (BNC), which was completed in 1994 after three years of development, is one of the most representative and reliable corpora in terms of both quality and quantity. Distribution of the first version of the BNC was limited to EU countries, but the World Edition has been available on CD-ROM since 2000.
To develop an efficient concordance program for the BNC World Edition, it is essential to first analyze the structure of the corpus data. The BNC adopts CDIF (Corpus Document Interchange Format), which was strongly influenced by the TEI Guidelines. This format provides ample contextual information as well as grammatical information about the corpus data. My Web-based BNC concordance program will draw on this rich tag information, including the Text Classification Codes and David Lee's Genre Classification Scheme.
This paper concludes with a basic strategy for developing my BNC concordance program, named 'BNCfinder+'.
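As a rough illustration of how category and genre information in text headers could narrow a concordance search, the following Python sketch reads classification codes and a genre label from a header fragment and selects matching texts. It is not the BNCfinder+ implementation. The header fragment is simplified and hypothetical, and the element names (`catRef`, `classCode`), the `target` and `scheme` attributes, the code values, and the text identifiers are all assumptions that would need to be checked against the BNC World Edition documentation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified header fragment; real BNC headers carry far more
# information, and the exact tag and attribute spellings are assumed here.
SAMPLE_HEADER = '''
<catRef target="allTim1 allAva2 wriDom1">
<classCode scheme="DLee">W fict prose</classCode>
'''

# Assumed patterns for the classification codes and the genre label.
CATREF_RE = re.compile(r'<catRef\s+target="([^"]*)"', re.IGNORECASE)
GENRE_RE = re.compile(r'<classCode\s+scheme="DLee"\s*>([^<]*)', re.IGNORECASE)


@dataclass
class TextInfo:
    text_id: str                                  # illustrative identifier
    categories: set = field(default_factory=set)  # classification codes
    genre: str = ""                               # genre label, if present


def parse_header(text_id, header):
    """Extract classification codes and the genre label from a header string."""
    cat_match = CATREF_RE.search(header)
    genre_match = GENRE_RE.search(header)
    categories = set(cat_match.group(1).split()) if cat_match else set()
    genre = genre_match.group(1).strip() if genre_match else ""
    return TextInfo(text_id, categories, genre)


def select_texts(texts, category=None, genre_prefix=None):
    """Return IDs of texts matching an optional code and an optional genre prefix."""
    return [
        t.text_id
        for t in texts
        if (category is None or category in t.categories)
        and (genre_prefix is None or t.genre.startswith(genre_prefix))
    ]
```

Under these assumptions, a search would first call `select_texts` to restrict the set of texts, for example with `genre_prefix="W fict"`, and only then scan the bodies of the selected texts for the query word.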
Integrated Arts and Sciences