[bnc] Storage and Documentation of Texts - Designing and Creating the BNC

Storage and Documentation of Texts

The last stage in creating the corpus was to add detailed descriptive information to each text, in the form of a header, and to validate the SGML structure of the whole. Some hand editing was necessary to correct small SGML errors, but no more than 5-10% of the texts had to be altered.

A header was added to each text in the corpus from the BNC database, recording details specific to that text, such as the author's name or the location where a conversation was recorded. These headers are intended for use by computer programs rather than human beings, but their basic content is fairly comprehensible.
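As a rough illustration of how a program might read such a header, here is a minimal Python sketch. The markup and element names (`header`, `author`, `recordingPlace`) and the helper `read_header` are hypothetical and heavily simplified; they are not taken from the actual BNC header scheme, which is described in the Users Reference Guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified text with a header. The element names and
# values are invented for illustration, not the real BNC markup.
SAMPLE = """
<text id="X01">
  <header>
    <author>A. N. Author</author>
    <recordingPlace>Example Town</recordingPlace>
  </header>
  <body>...</body>
</text>
"""


def read_header(xml_string):
    """Return the header fields of one text as a dict of tag -> text."""
    root = ET.fromstring(xml_string.strip())
    header = root.find("header")
    if header is None:
        # A text without a header yields no fields.
        return {}
    return {child.tag: (child.text or "").strip() for child in header}


if __name__ == "__main__":
    for name, value in read_header(SAMPLE).items():
        print(f"{name}: {value}")
```

The point is only that header fields are structured enough for software to pick out individually, while remaining readable to a person who opens the file.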

You can see a sample BNC text here.

To understand every detail of the XML tagging in the sample, you need to refer to the Users Reference Guide, but the basic features should be fairly self-explanatory.
