
BNC XML Edition

The BNC XML Edition is the latest version of the British National Corpus (for a general presentation of the corpus, see the What is the BNC? page). With a few exceptions, the texts are the same as in the previous BNC World edition. The main differences between this version of the corpus and the BNC World are:
  • errors - known errors and inconsistencies have been corrected
  • lemma information - added to each word to allow searches for lemmas and for collocations with lemmas
  • simplified part-of-speech information added - allowing searches for 'all verbs', 'any noun' etc.
  • multi-word units - in response to popular demand, all items inside multi-word units have been assigned part-of-speech tags as well
  • format - the new version is in XML, which means it is easier to use with different tools and also makes viewing the texts easier.
More detailed information about BNC XML is available in the Reference Guide for the British National Corpus (XML Edition).
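As an illustration of how the lemma and simplified part-of-speech information might be read with a general-purpose XML library, here is a minimal sketch in Python. The fragment is invented for the example, and the element and attribute names (s, w, mw, c, hw, pos, c5) are assumptions that should be checked against the Reference Guide before use.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only. The element and attribute names
# (s, w, mw, c, hw, pos, c5) are assumptions, not taken from this page.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="text" pos="SUBST">texts </w>'
    '<w c5="VBB" hw="be" pos="VERB">are </w>'
    '<mw c5="AV0">'
    '<w c5="PRP" hw="of" pos="PREP">of </w>'
    '<w c5="NN1" hw="course" pos="SUBST">course </w>'
    '</mw>'
    '<w c5="AJ0" hw="correct" pos="ADJ">correct</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)

def words(sentence_xml):
    """Return (form, lemma, simple_pos, full_tag) for every word element."""
    root = ET.fromstring(sentence_xml)
    # iter() also descends into multi-word units, so their items are included.
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("pos"), w.get("c5"))
        for w in root.iter("w")
    ]

def by_pos(rows, pos):
    """Select word forms whose simplified part-of-speech equals pos."""
    return [form for form, _lemma, simple_pos, _tag in rows if simple_pos == pos]

rows = words(SAMPLE)
nouns = by_pos(rows, "SUBST")  # an 'any noun' style selection
```

Because the words inside a multi-word unit carry their own tags, the same selection works whether or not a word belongs to such a unit.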

The new XML format makes the corpus usable with many other software tools, including even simple web browsers. For searching and concordancing purposes, however, we recommend using a proper XML-aware search engine such as XAIRA (http://www.xaira.org), which was developed specifically for use with the BNC and similar language corpora.

Xaira is an enhanced version of the Sara program, originally produced for use with the BNC. In addition to the features included in the Sara program, you can use Xaira with BNC XML to
  • search by tag only, for example ‘all -ing forms of verbs’, ‘preposition + that’, etc.
  • search subcorpora defined by existing text categories (genre, written/spoken, year of publication, target audience, etc.)
  • define searchable subcorpora according to your own categorization
  • display search results as graphs
  • quickly see distribution across text categories (existing or user-defined)
  • retrieve collocations based on words, lemmas, or part-of-speech tags
For more information about Xaira, visit the Xaira webpage (http://www.xaira.org).

Space and system requirements

BNC XML Edition is delivered on two DVD-ROMs which contain
  • the corpus itself in compressed form
  • full reference documentation
  • pre-built indexes for use with a XAIRA server
  • a specially customised version of the XAIRA software for Windows

The corpus consists of more than 4,000 text files that occupy 4 GB of disk space (unpacked). Once unpacked, the corpus files can be used with any software that can handle XML files.
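A quick way to confirm that the corpus unpacked completely is to count the XML files and add up their sizes. The sketch below assumes only that the text files carry an .xml extension and sit somewhere beneath a single directory; the function name is hypothetical.

```python
from pathlib import Path

def summarise_corpus(root):
    """Return (number of .xml files, total size in bytes) found under root."""
    # rglob walks all subdirectories, so the directory layout does not matter.
    files = [p for p in Path(root).rglob("*.xml") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Calling summarise_corpus on the directory where the corpus was unpacked should report a little over 4,000 files and roughly 4 GB if the figures above apply to your copy.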

To install the corpus using the automatic installation procedure you need 9 GB of free disk space, of which 5 GB will be freed again at the end of the installation. The installation procedure must be run with Administrator privileges. It has been tested under Windows 2000, Windows XP, and Windows Vista.
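Before starting the installation it can be useful to check the free space on the target drive. The following is a minimal sketch, not part of the installer; it assumes that a gigabyte means 10^9 bytes, and the function names are hypothetical.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; the page does not say which unit is meant

def enough_space(free_bytes, required_gb=9):
    """True if free_bytes covers the space needed during installation."""
    return free_bytes >= required_gb * GB

def check_target(path="."):
    """Check the drive that holds path against the 9 GB requirement."""
    return enough_space(shutil.disk_usage(path).free)
```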

To use the corpus with Xaira, you need to install the Xaira client program on your desktop. It is currently available for Windows only. If you are using Xaira with a local copy of the corpus installed on your desktop, you also need to install the Xaira indexes, which occupy about 5 GB of disk space.

If you want to use BNC XML Edition and Xaira on a network, install the corpus and the Xaira server system on your server, and the Xaira client on the desktops from which you wish to access the corpus. The corpus itself and the Xaira server system can be installed on Windows, Linux, or Macintosh systems.
