Online storage of BNC documents
All BNC documents received by OUCS in electronic form will be
stored in the directory [NATCORP.DOX], initially on the OUCS VAX,
to which all project members have READ access. Files will be
named by the document number, with an extension indicating the
format (e.g. .SGML for SGML, .TEX for LaTeX, .WP5 for
WordPerfect, .ASC for `plain ASCII'). Only the most recent
version of each document will be kept online. When a new document (or
a new version of an existing one) is acquired, a notice will be
sent to all project members by electronic mail.
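The naming rule above is simple enough to state as code. The Python
sketch below is illustrative only: FORMAT_EXTENSIONS, DOX_DIRECTORY
and online_filename are hypothetical names, not part of any OUCS
software, and the table holds just the four formats mentioned above.

```python
# Hypothetical sketch of the file-naming convention: document number
# plus an extension indicating the format, held in [NATCORP.DOX].
# None of these names belong to any existing OUCS tool.

FORMAT_EXTENSIONS = {
    "SGML": ".SGML",
    "LaTeX": ".TEX",
    "WordPerfect": ".WP5",
    "plain ASCII": ".ASC",
}

DOX_DIRECTORY = "[NATCORP.DOX]"


def online_filename(doc_number: str, fmt: str) -> str:
    """Return the VMS-style file specification for a document in a given format."""
    try:
        extension = FORMAT_EXTENSIONS[fmt]
    except KeyError:
        # Only formats with an agreed extension may be stored online.
        raise ValueError(f"no agreed extension for format {fmt!r}") from None
    # Document numbers are written in upper case, as in the register.
    return DOX_DIRECTORY + doc_number.upper() + extension


print(online_filename("BNCW01", "SGML"))  # [NATCORP.DOX]BNCW01.SGML
```

Keeping the extension table in one place means that agreeing a new
format only requires a new entry; an unknown format is rejected
rather than given an ad hoc extension.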
No special arrangements are envisaged for storage or
circulation of documents in paper form, but all documents
received by OUCS will be numbered and registered in the document
register. The register itself is held online in the file
[NATCORP.DOX]REGISTER, and a copy of its current state is appended to
this document.
Form of document numbers
Document Provenance
Documents are classified primarily by the part of the project
responsible for them, as follows:
- BNC - project wide
- TGA - Task Group A (corpus design)
- TGB - Task Group B (copyright clearance)
- TGC - Task Group C (corpus encoding and storage)
- TGD - Task Group D (corpus enrichment)
- PC - Project Committee
- EC - Exploitation Committee
- AB - Advisory Board
- EXT - external to the project
It may be desirable to add further codes for documents internal to
one of the participants. Thus documents circulating only within OUCS
might be given the prefix OUCS, those within Longman the prefix LONG,
and so on. By definition, however, such documents will not be
included in the project's document register and are therefore not
considered further here.
Type of document
The document number also indicates the kind of document
concerned, as follows:
- A -- meeting agendas
- N -- informal notes e.g. of meetings
- M -- minutes or other formal record of meetings
- P -- published or formally presented papers
- R -- short formal reports other than minutes
- W -- working drafts and proposals
- X -- formal letters, publicity material, newspaper stories etc.
A full document number thus consists of a 3 or 4 letter prefix (the
2 or 3 letter provenance code followed by the 1 letter type code)
followed by a two-digit sequential number. The current document is a
working draft relating to the project as a whole, and thus has the
prefix BNCW. The minutes of the Project Committee will have the
prefix PCM, and so on. The document number should be allocated by
the group responsible for the document at the time it is created
(i.e. requested or proposed). Documents are numbered sequentially
within the group, not within each combination of group and type.
Note that the earlier numbers may not be in chronological sequence,
because I have numbered things as they turned up rather than in
order of date of composition.
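The numbering rules lend themselves to a small validator. The
following is a minimal Python sketch under stated assumptions: the
regular expression is inferred from the prose above (a 2 or 3 letter
provenance code, a 1 letter type code, a two-digit number), and
parse_doc_number, next_number and DocNumber are hypothetical names.
next_number implements the rule that numbers run within the group
rather than within group and type.

```python
import re
from typing import Iterable, NamedTuple

# Codes as listed in this document. The pattern is an assumption
# inferred from the prose: 2-3 letter provenance code, 1 letter type
# code, two-digit sequence number (e.g. BNCW01, PCM01).
PROVENANCE = {"BNC", "TGA", "TGB", "TGC", "TGD", "PC", "EC", "AB", "EXT"}
DOC_TYPES = {"A", "N", "M", "P", "R", "W", "X"}
DOC_NUMBER_RE = re.compile(r"([A-Z]{2,3})([A-Z])([0-9]{2})")


class DocNumber(NamedTuple):
    group: str
    doc_type: str
    seq: int


def parse_doc_number(text: str) -> DocNumber:
    """Split a document number into provenance, type and sequence number."""
    match = DOC_NUMBER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed document number: {text!r}")
    group, doc_type, seq = match.groups()
    if group not in PROVENANCE:
        raise ValueError(f"unknown provenance code {group!r} in {text!r}")
    if doc_type not in DOC_TYPES:
        raise ValueError(f"unknown document type {doc_type!r} in {text!r}")
    return DocNumber(group, doc_type, int(seq))


def next_number(group: str, doc_type: str, existing: Iterable[str]) -> str:
    """Allocate the next number for a group.

    Numbering runs sequentially within the group, not within
    group+type, so every existing number for the group counts,
    whatever its type.
    """
    parsed = (parse_doc_number(n) for n in existing)
    highest = max((d.seq for d in parsed if d.group == group), default=0)
    if highest >= 99:
        raise ValueError(f"two-digit sequence exhausted for group {group!r}")
    candidate = f"{group}{doc_type}{highest + 1:02d}"
    parse_doc_number(candidate)  # rejects unknown group or type codes
    return candidate


register = ["BNCW01", "BNCW02", "BNCX04", "BNCP05", "BNCW06", "BNCW07", "PCM01"]
print(parse_doc_number("PCM01"))  # DocNumber(group='PC', doc_type='M', seq=1)
print(next_number("BNC", "R", register))  # BNCR08
```

Taking one more than the highest number already used, rather than
counting existing documents, keeps allocation safe when numbers have
been skipped; in the register below, for instance, no project-wide
document carries the number 03.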
Current Document Register
This is the state of the document register as of May 1st, 1991.
An updated version may be found in the file [NATCORP.DOX]REGISTER.
1. Project Wide Documents
BNCW01  Burnard  BNC Document management system (1 May 91)
BNCW02  Clear  Planned uses of the National Corpus (11 Apr 91)
BNCX04  Mission statement
BNCP05  Consortium Agreement
BNCW06  Summers  The Spoken Corpus, including draft transcription
        scheme for spoken texts (Apr 91)
BNCW07  Clear  Task Groups (11 Apr 91)
2. Project Committee documents
PCM01  Minutes of the National Corpus Initiative Consortium Meeting
       held 29 Aug 90
PCM02  Minutes of the Project Committee Meeting held 4 Feb 91
PCR03  Progress Reports tabled at PC meeting held 17 Apr 91
3. Advisory Council documents
ACM01  Minutes of Advisory Council Meeting held 7 Mar 91
4. Task Group specific documents
Task Group A: Corpus Design
TGAN01  Report on BNC TG A Meeting of 10 Apr 91
TGAW02  Atkins, Clear & Ostler  Corpus design criteria (Pisa paper)
TGAW03  Summers  Longman/Lancaster English Language Corpus - criteria
        and design (March 1991)
TGAW04  Clear  Corpus Design Specification (24 May 91)
TGAW04  Clear  Written Corpus Design Specification (2 September 91)
TGAW05  Burnage  Corpus Design: OUCS Comments (31 May 91)
TGAW06  Leech  Comments on the OUP Corpus Design Document and the
        OUCS Response (3 June 91)
TGAW07  Crowdy  Longman reply to BNC design specification (14 May 91)
TGAW08  Crowdy  BNC: Corpus Design Specification: Longman comments on
        draft paper (10 June 91)
TGAM01  Agenda for meeting of 5th June
TGAM02  Burnage  Minutes of meeting of 5th June (7 June 91)
TGAW09  Burnard  Queries for meeting of TGA on spoken texts
        (15 August 91)
TGAW10  Dunlop  Matters for discussion at BNC task group A meeting of
        29th August (28 August 91)
Task Group C: Encoding and Storage
TGCW01  Burnard  Markup scheme for the BNC (25 April 91)
TGCW02  Leech  Basic grammatical tagset initial proposal and Penn
        Treebank Tagset (summarises mapping of MM's reduced tagset
        onto the 66 Lancaster word class tag set)
TGCW03  CPH Appendix A, 13 Jan 89 (Longman's original proposals for
        encoding corpus materials)
TGCW04  Clear  Markup for the Oxford Pilot Corpus
TGCW05  Burnage  Database Design Specification (29 May 91)
TGCW06  Dunlop  Text Submission Guidelines (29 May 91)
TGCW07  Dunlop  Encoding the Oxford Milton (31 May 91)
TGCM01  Agenda for meeting of June 5th
TGCM02  Burnard  Minutes of BNC Task Group C Meeting, 5 June
        (7 June 91)
TGCW08  Dunlop  Text Submission Guidelines - Rekeyed or scanned
        materials from OUP (24 June 91)
TGCW09  Crowdy  Spoken Corpus: Selective side discourse categories and
        types; Draft spoken corpus transcription scheme
TGCN01  Dunlop  Notes on meeting held at OUCS, 23rd August
        (27 August 91)
TGCW10  Dunlop et al.  British National Corpus development DTD
        (22 August 91)
TGCW11  OUCS  Marked up spoken corpus sample
TGCW12  OUCS  Marked up written corpus sample (The Wimbledon Poisoner)
TGCW13  Dunlop  UNIX manual page for vm2 parser (20 August 91)
TGCW14  Du Bois  Transcription design principles for spoken discourse
        research (5 March 91)
Task Group D: Corpus Enrichment
TGDW01  Leech  British National Corpus - Basic Grammatical Tagset
        (3 September 91)
TGDW01  Leech  British National Corpus - Basic Grammatical Tagset
        (alternative tags, 5 September 91)
TGDW02  Langendoen  Proposal for TEI-Conformant Encoding of Basic
        Grammatical Tagset (3 September 91)
TGDW03  Dunlop  Mail to task group D (points for meeting of
        5 September 91)
TGDW04  Dunlop  Lexical tagging: Position following meeting of 5th
        September (6 September 91)
TGDW05  Clear  Notes on Basic Grammatical Tagset (4 September 91)
TGDA01  Agenda for task group D meeting of 5th September
TGDM01  Bryant  Task Group D - Corpus Processing - Minutes of First
        Meeting (5 September 91)
5. External Documents of related interest
EXTW01  Leech  Corpus annotation schemes (Pisa paper; includes Garside
        and Leech `Running a Grammar Factory', to be published in
        Johansson & Stenstrom)
EXTW02  Biber  Representativeness in corpus design (Pisa paper)
EXTW03  Sampson  Needed - a grammatical stocktaking (Pisa paper)
EXTW04  Johansson  Some thoughts on the encoding of spoken texts in
        machine-readable form (Pisa paper, including examples of
        spoken texts)
EXTP05  Leech & Johansson  LOB Coding Manual (an extract)
EXTX06  Doreen King  `Corpus dialecti. Oxford's oceanographers of
        language set sail', Oxford Times, 20 Jul 90
EXTX07  Brian Keaney  `The keeper of the living language', Guardian,
        25 Apr 91
EXTW08  And Rosta  System of preparation and annotation of I.C.E. texts
        (12 Dec 90)
EXTW09  Akiva Quinn and Gerry Nelson  I.C.E. Markup Manual for written
        texts (March 91)
EXTW10  TEI AI 1W2: List of common morphological features for
        inclusion in the TEI starter set of grammatical-annotation tags
        (14 June 91)