Minutes of BNC Task Group A Meeting, 5th June 1991
OUCS, 13 Banbury Road, Oxford

Gavin Burnage
7th June 1991

PRESENT: Michael Bryant, Gavin Burnage, Lou Burnard, Jeremy Clear, Steve Crowdy, Dominic Dunlop.

1. Matters Arising from BNCW02 (Planned Uses of the National Corpus)

It was agreed that the BNC is to be a language corpus, not a corpus of texts, and that it will consequently consist of samples extracted from texts rather than complete texts. It was further agreed that BNCW02 should be updated to include (i) an explicit specification of the main uses envisaged for the corpus (JC) and (ii) fuller references to the literature (suggestions to JC).

2. BNC CD

SC reported that he had contacted the CD manufacturer Nimbus, and that they were interested in producing a CD containing the spoken and written corpus as well as the sound recordings made for the spoken corpus. The meeting responded enthusiastically to this suggestion and decided (i) that the matter should be passed on to the Project Committee, who should set up the Exploitation Committee as soon as possible to arrange it, and (ii) that SC should take up the offer to visit Nimbus in Monmouth.

3. Discussion of corpus design documents

TGAW03 (Longman/Lancaster criteria and design -- Summers)
TGAW04 (Corpus design specification -- Clear)
TGAW05 (Corpus Design: OUCS comments -- Burnage)
TGAW06 (Lancaster comments -- Leech)
TGAW07 (Longman comments -- Crowdy)

General definitions (BNCW03 section 2)

-- Date of texts to be included. No consensus could be reached. SC reiterated that Longman wanted texts dating from 1950; the others said they wanted more recent texts. It was agreed that the matter should be referred to the Project Committee and that, in the interim, work should be carried out only on texts produced after 1985.

There was discussion of whether the BNC should be termed a synchronic or a diachronic corpus.
It was felt that, given the absence of specifically diachronic design features, the corpus could not satisfactorily be termed diachronic. Instead, the notion of the corpus as a synchronic `blurred snapshot' or `fuzzy snapshot' (BNCW05, BNCW06) was accepted.

LB and GB said they considered the commitment to covering regional varieties of English an important one.

Following MB's argument, it was agreed that the specification of `adult' language should be removed.

Production and reception (BNCW03 section 4)

SC suggested that the following three categories be added to the list of measures for selecting material: (i) books listed for GCSE study by examination boards, (ii) prize-winning books, and (iii) recommendations from subject specialists. After discussion it was agreed that many of the measures listed in section 4.1, together with those suggested by SC, would be useful for identifying texts within the various categories where appropriate.

A general strategy for identifying books was agreed as follows. First, a selection of titles is made from a list of books in print on the basis of subject matter, date, and level. Next, books are chosen from this selection using the measures in 4.1 and, as the corpus grows, by checking the classification features of the texts already included, with the aim of maintaining a balanced range. It was further agreed that the classification features (BNCW03 section 5.3) are a list of variables for which variability should be maximised, but which should not be set as formal targets.

Selection features (BNCW03 section 5.2)

-- Domain (BNCW03 section 5.2.1)

The meeting agreed the following points.
(i) The percentage figures extrapolated from Whitaker should be further rounded off, while still `approximating closely' to the original figures.
(ii) The figures for the proportion of informative to imaginative texts should be amended to read `informative 70--80%, imaginative 20--30%'.
(iii) The last category in the list of informative domains, Biography, should be dropped, and biographical texts should be classified under the eight remaining domains.

-- Medium (BNCW03 section 5.2.3)

SC said the proportion of miscellaneous texts seemed too high; JC replied that this was to ensure the corpus did not become too book-based. No resolution was reached.

-- Classification Features (BNCW03 section 5.3)

It was agreed that the year of composition or origin should be added to the list of classification features.

In the light of the written comments and the decisions of the meeting, JC agreed to revise BNCW03 and circulate it to those concerned.

The meeting closed at 17:55.