BNC User Reference Guide

3 Written texts


3.1 Divisions of written texts

Written texts exhibit a rich variety of different structural forms. Some have very little organization at levels higher than the paragraph; others may have a complex hierarchy of parts, sections, chapters etc. Novels are divided into chapters, newspapers into sections, reference works into articles and so forth. In the BNC all such structural divisions are represented uniformly by means of the <div> element.
  • <div> (text division) contains a subdivision of the front, body, or back of a text.
    n supplies an additional name or number for this division, taken from the original source.
    level specifies the hierarchic level of this division as a number between 1 (outermost or largest division) and 4 (innermost or smallest).
    type identifies the type or function of the division (for a written text).
In written texts, the n attribute is sometimes used to supply an identifying name or number used within the text for a given division, for example, a chapter number, as in the following example:
 <div type="chapter" n="three" level="1">...</div>
More often, however, chapter names or numbers will appear within the text, tagged using the <head> element discussed in section 3.2.1 Headings and captions below.

The value of the attribute type is used to characterise the function of the textual division (see the reference documentation for the values used). If a value is supplied for one division at a given level, it may be assumed to apply to all subsequent divisions at the same level until the end of the enclosing element, although it is not always explicitly specified.
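The carry-forward rule for type can be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree, not part of any BNC tooling: the function name effective_div_types, the sample markup, and the type values "part" and "chapter" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample: the type values and n numbers are invented.
SAMPLE = (
    '<wtext type="FICTION">'
    '<div level="1" type="part" n="1">'
    '<div level="2" type="chapter" n="1"/>'
    '<div level="2" n="2"/>'
    '</div>'
    '<div level="1" n="2">'
    '<div level="2" n="3"/>'
    '</div>'
    '</wtext>'
)

def effective_div_types(parent):
    """Return (level, n, type) for every <div>, carrying type forward.

    A type seen on one <div> is applied to later sibling <div>s that lack
    one; the carried value is reset for each enclosing element.
    """
    resolved = []
    current = None  # nothing is inherited from outside the enclosing element
    for child in parent:
        if child.tag != "div":
            continue
        current = child.get("type", current)
        resolved.append((child.get("level"), child.get("n"), current))
        resolved.extend(effective_div_types(child))
    return resolved

div_types = effective_div_types(ET.fromstring(SAMPLE))
for row in div_types:
    print(row)
```

In this sample the second level-2 division inherits "chapter" from its preceding sibling, while the division with n="3" stays untyped because no earlier sibling inside its own parent supplies a value.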

Where <div> levels are nested, for example where the chapters of a novel are grouped into parts each of which may have its own title or number, the level attribute is used to indicate the depth of nesting. This is not strictly necessary (since an XML-aware processor retains this information) but has been added for the convenience of users of previous versions of the corpus, in which the level was explicitly coded into the name of the surrounding element (<div1>, <div2> etc.).

In text ANY, for example, each chapter of the original novel corresponds to a <div level="2">, because the work contains groups of chapters, each of which begins with a page containing just a date. The opening of the text is therefore encoded as follows:
 <wtext type="FICTION">
  <div level="1" n="1">
   <s n="1">
    <w c5="NP0" hw="monday" pos="SUBST">Monday</w>
    <c c5="PUN">, </c>
    <w c5="NP0" hw="january" pos="SUBST">January </w>
    <w c5="ORD" hw="13th" pos="ADJ">13th</w>
    <c c5="PUN">, </c>
    <w c5="CRD" hw="1986" pos="ADJ">1986</w>
    <c c5="PUN">.</c>
   </s>
   <div level="2" n="1">
    <p>
     <s n="2">
      <w c5="NP0" hw="victor" pos="SUBST">Victor </w>
      <w c5="NP0" hw="wilcox" pos="SUBST">Wilcox </w>
      <w c5="VVZ" hw="lie" pos="VERB">lies </w>
      <w c5="AJ0" hw="awake" pos="ADJ">awake</w>
     ...
      </s>
    </p>
   </div>
  </div>...</wtext>
<!-- ANY -->

Note however that in some texts initial sentences (like ‘Monday, January 13th, 1986’ above) may have been misplaced, so that they appear at the start of an inner <div> rather than the start of its parent.
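Since the level attribute duplicates what the nesting already expresses, the two can be compared mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name level_mismatches and the sample (including its deliberately wrong level value) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the opening of text ANY.
SAMPLE = (
    '<wtext type="FICTION">'
    '<div level="1" n="1">'
    '<s n="1"/>'
    '<div level="2" n="1"/>'
    '<div level="3" n="2"/>'  # deliberately mis-coded for the demonstration
    '</div>'
    '</wtext>'
)

def level_mismatches(parent, depth=0):
    """Return (n, level, actual_depth) for each <div> whose level attribute
    disagrees with its real nesting depth."""
    found = []
    for child in parent:
        if child.tag == "div":
            actual = depth + 1
            if child.get("level") != str(actual):
                found.append((child.get("n"), child.get("level"), actual))
            found.extend(level_mismatches(child, actual))
    return found

mismatches = level_mismatches(ET.fromstring(SAMPLE))
print(mismatches)
```

An empty result would indicate that every level value agrees with the element nesting.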

A sequence of paragraph-level elements of arbitrary length may precede the first structural subdivision at any level. A text may have no structural divisions within it at all. Note that any prefatory or appended matter not forming part of a text will not generally be captured: the TEI elements <front> and <back> are not used.

3.2 Paragraph-level elements and chunks

Written texts may be organized into structural units containing more than one <s> element and smaller than any of the divisions discussed in section 3.1 Divisions of written texts above. The most commonly found such element is the <p> (paragraph):
  • <p> (paragraph) marks paragraphs in prose.
Several other elements may however appear directly within <div> or within <text> elements, not nested within some other element such as a paragraph. A list of these elements follows:
  • <head> (heading) contains any type of heading, for example the title of a section or a poem.
    type describes the kind of heading.
    rend a code briefly characterising the way the element content was originally presented.
  • <quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
    rend a code briefly characterising the way the element content was originally presented.
  • <sp> (speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text.
    who indicates the person, or group of people, to whom the element content is ascribed.
  • <lg> (line group) contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
  • <list> contains any sequence of items organized as a list.
  • <note> contains a note or annotation.
    place specifies where the note is placed in the original source.
    n internal identifier.
  • <bibl> (bibliographic citation) contains any bibliographic reference, occurring either within the header of a written corpus text in which case it has a fixed substructure, or within the body of a corpus text, in which case it contains only s elements.
    rend a code briefly characterising the way the element content was originally presented.

Each of these elements contains one or more <s> elements, as discussed above, in some cases enclosed by an intermediate element. They are used chiefly to indicate the function of sections of the text, as described in the list above.

The following sections provide examples for the use of each of these elements.

3.2.1 Headings and captions

One or more <head> elements of specified types may appear in sequence at the start of any <div> element, or at the start of a <list> or <lg>, as in the following example:
 <div level="1" n="1">
  <head type="MAIN">
   <s n="1">
    <w c5="NN1" hw="ageism" pos="SUBST">AGEISM</w>
   </s>
  </head>
  <head type="SUB">
   <s n="2">
    <w c5="AT0" hw="the" pos="ART">THE </w>
    <w c5="NN1" hw="foundation" pos="SUBST">FOUNDATION </w>
    <w c5="PRF" hw="of" pos="PREP">OF </w>
    <w c5="NN1" hw="age" pos="SUBST">AGE </w>
    <w c5="NN1" hw="discrimination" pos="SUBST">DISCRIMINATION</w>
   </s>
  </head>
  <head type="BYLINE">
   <s n="3">
    <w c5="NP0" hw="steve" pos="SUBST">STEVE </w>
    <w c5="NP0-NN1" hw="scrutton" pos="SUBST">SCRUTTON</w>
   </s>
  </head>...</div>
<!-- B01 -->

As shown above, the type attribute is used to distinguish more exactly the function of a heading.

Note that, in the BNC, captions or headings which ‘float’ within the text, that is, which appear elsewhere than at the very beginning of the section which they name, are not encoded as <head> elements. A <head> element can appear only at the start of a text division and is logically associated with it (for example, chapter titles, newspaper headlines etc.). Paragraphs which provide heading or captioning information, but which are logically independent of their position within a textual division (for example, captions attached to pictures or figures, or ‘pull-quotes’ embedded within the text) are represented in the same way as any other paragraph of text, using the <p> element, but specifying the value caption in their rend attribute.

In the following example, the <head> element is followed by a number of captions introducing particular parts of a magazine story:
 <div level="1">
  <head>
   <s n="40">
    <w c5="NN2" hw="trousers" pos="SUBST">TROUSERS </w>
    <w c5="VVB-NN1" hw="suit" pos="VERB">SUIT</w>
   </s>
  </head>
  <p type="caption">
   <s n="41">
    <w c5="EX0" hw="there" pos="PRON">There </w>
    <w c5="VBZ" hw="be" pos="VERB">is </w>
    <w c5="PNI" hw="nothing" pos="PRON">nothing </w>
    <w c5="AJ0" hw="masculine" pos="ADJ">masculine </w>
    <w c5="PRP" hw="about" pos="PREP">about </w>
    <w c5="DT0" hw="these" pos="ADJ">these </w>
    <w c5="AJ0" hw="new" pos="ADJ">new </w>
    <w c5="NN1" hw="trousers" pos="SUBST">trouser </w>
    <w c5="VVZ-NN2" hw="suit" pos="VERB">suits </w>
    <w c5="PRP" hw="in" pos="PREP">in </w>
    <w c5="NN1" hw="summer" pos="SUBST">summer</w>
    <w c5="POS" hw="'s" pos="UNC">'s </w>
    <w c5="AJ0" hw="soft" pos="ADJ">soft </w>
    <w c5="NN2" hw="pastel" pos="SUBST">pastels</w>
    <c c5="PUN">.</c>
   </s>...</p>...</div>
<!-- C8B -->
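The distinction between division headings and floating captions can be applied when processing a text. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The function names are hypothetical, the sample is a cut-down version of the C8B extract with the c5, hw and pos attributes omitted, and the code accepts the value caption on either rend or type, since the prose above names rend while the extract carries it on type.

```python
import xml.etree.ElementTree as ET

# Cut-down sample modelled on the C8B extract; the last paragraph is invented.
SAMPLE = (
    '<div level="1">'
    '<head><s n="40"><w>TROUSERS </w><w>SUIT</w></s></head>'
    '<p type="caption"><s n="41"><w>soft </w><w>pastels</w><c>.</c></s></p>'
    '<p><s n="42"><w>Body </w><w>text</w><c>.</c></s></p>'
    '</div>'
)

def plain_text(element):
    """Concatenate the text of all <w> and <c> descendants."""
    parts = (t.text or "" for t in element.iter() if t.tag in ("w", "c"))
    return "".join(parts).strip()

def headings_and_captions(div):
    """Separate the <head> children of a division from caption paragraphs."""
    heads = [(h.get("type"), plain_text(h)) for h in div.findall("head")]
    captions = [
        plain_text(p)
        for p in div.iter("p")
        if "caption" in (p.get("rend"), p.get("type"))
    ]
    return heads, captions

heads, captions = headings_and_captions(ET.fromstring(SAMPLE))
print(heads)
print(captions)
```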

3.2.2 Quotations

A quotation is an extract from some other work than the text itself which is embedded within it, for example as an epigraph or illustration. It is marked up using the <quote> element. This may contain any combination of other chunks (for example paragraphs, poems, lists) but may not directly contain <w> or <s> elements. A reference for the citation may also be contained within it.

For example:
 <quote>
  <p>
   <s n="2080">
    <w c5="DT0" hw="this" pos="ADJ">This </w>
    <w c5="NN1" hw="way" pos="SUBST">way </w>
    <w c5="PRP" hw="for" pos="PREP">for </w>
    <w c5="AT0" hw="the" pos="ART">the </w>
    <w c5="AJ0" hw="sorrowful" pos="ADJ">sorrowful </w>
    <w c5="NN1" hw="city" pos="SUBST">city</w>
    <c c5="PUN">.</c>
   </s>
   ...
   <s n="2083">
    <w c5="VVB" hw="abandon" pos="VERB">Abandon </w>
    <w c5="DT0" hw="all" pos="ADJ">all </w>
    <w c5="NN1" hw="hope" pos="SUBST">hope</w>
    <c c5="PUN">, </c>
    <w c5="PNP" hw="you" pos="PRON">you </w>
    <w c5="PNQ" hw="who" pos="PRON">who </w>
    <w c5="VVB" hw="enter" pos="VERB">enter</w>
    <c c5="PUN"></c>
   </s>
   <bibl>
    <s n="2084">
     <w c5="NP0" hw="dante" pos="SUBST">Dante</w>
    </s>
   </bibl>
  </p>
 </quote>
<!-- C8L -->

3.2.3 Spoken paragraphs

As noted above, the <sp> element is used to mark parts of a written text which were or are intended to be spoken, for example the speeches in a dramatic text or a published interview. Such parts are generally readily identifiable by the use of such conventions as speaker prefixes (the label supplying the name of the speaker) and stage directions, for which the following additional elements are used:
  • <speaker> A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
  • <stage> (stage direction) contains any kind of stage direction within a dramatic text or fragment.
    rend a code briefly characterising the way the element content was originally presented.

The <sp> element is used only for speech which is presented as such in a written text, by contrast with the element <u> discussed in section 4.2 Utterances, which is used only for speaker turns identified in a spoken text, i.e. one which has been transcribed from audio tape.

If present, a <speaker> element will appear only at the start of the <sp> element, followed by one or more <p> elements containing the actual speech.

Here is an example of a stage direction occurring within a speech:
 <sp>
  <p>
   <s n="1115">
    <w c5="CRD" hw="seven" pos="ADJ">Seven </w>
    <w c5="NN2" hw="book" pos="SUBST">books </w>
    <w c5="AT0" hw="a" pos="ART">a </w>
    <w c5="NN1" hw="week" pos="SUBST">week</w>
    <c c5="PUN">.</c>
   </s>
  </p>
  <stage rend="it">
   <s n="1119">
    <w c5="PNP" hw="he" pos="PRON">He </w>
    <w c5="VVZ" hw="dance" pos="VERB">dances</w>
   </s>
  </stage>
  <p>
   <s n="1122">
    <w c5="NN1" hw="library" pos="SUBST">Library </w>
    <w c5="NN2" hw="book" pos="SUBST">books</w>
    <c c5="PUN">.</c>
   </s>
  </p>
 </sp>
<!-- A06 -->
These elements appear frequently in formal transcriptions of written proceedings, notably those parts of the BNC which are extracted from Hansard:
 <sp>
  <p>
   <s n="20468">
    <w c5="DT0" hw="that" pos="ADJ">That </w>
    <w c5="NN1" hw="millionaire" pos="SUBST">millionaire </w>
    <w c5="NN1" hw="mammy" pos="SUBST">mammy</w>
    <w c5="POS" hw="'s" pos="UNC">'s </w>
    <w c5="NN1" hw="boy" pos="SUBST">boy </w>
    <c c5="PUN"></c>
   </s>
   <stage>
    <s n="20469">
     <w c5="NN1" hw="interruption" pos="SUBST">Interruption</w>
    </s>
   </stage>
  </p>
 </sp>
 <sp>
  <speaker>
   <s n="20470">
    <w c5="NP0" hw="mr." pos="SUBST">Mr. </w>
    <w c5="NP0" hw="speaker" pos="SUBST">Speaker</w>
   </s>
  </speaker>
  <p>
   <s n="20471">
    <w c5="NN1-VVB" hw="order" pos="SUBST">Order</w>
    <c c5="PUN">.</c>
   </s>
   <s n="20472">
    <w c5="DT0" hw="that" pos="ADJ">That </w>
    <w c5="VBZ" hw="be" pos="VERB">is </w>
    <w c5="XX0" hw="not" pos="ADV">not </w>
    <w c5="AV0" hw="wholly" pos="ADV">wholly </w>
    <w c5="AJ0" hw="unparliamentary" pos="ADJ">unparliamentary</w>
    <c c5="PUN">.</c>
   </s>
  </p>
 </sp>
<!-- HHV -->
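A reader processing such material will often want the speaker's name and the words spoken, without the stage directions. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function names are hypothetical, the sample is the HHV extract above with the c5, hw and pos attributes omitted, and the wrapping <div> is added only to give the fragment a single root.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<div>'
    '<sp><p><s n="20468"><w>That </w><w>millionaire </w><w>mammy</w>'
    "<w>'s </w><w>boy </w><c></c></s>"
    '<stage><s n="20469"><w>Interruption</w></s></stage></p></sp>'
    '<sp><speaker><s n="20470"><w>Mr. </w><w>Speaker</w></s></speaker>'
    '<p><s n="20471"><w>Order</w><c>.</c></s>'
    '<s n="20472"><w>That </w><w>is </w><w>not </w><w>wholly </w>'
    '<w>unparliamentary</w><c>.</c></s></p></sp>'
    '</div>'
)

def sentence_texts(element, skip=()):
    """Return the text of each <s> under element, ignoring any subtree
    whose tag is listed in skip."""
    if element.tag in skip:
        return []
    if element.tag == "s":
        parts = (t.text or "" for t in element.iter() if t.tag in ("w", "c"))
        return ["".join(parts).strip()]
    return [text for child in element for text in sentence_texts(child, skip)]

def speeches(root):
    """Return (speaker, spoken_text) for each <sp>; speaker may be None."""
    result = []
    for sp in root.iter("sp"):
        speaker = sp.find("speaker")
        name = " ".join(sentence_texts(speaker)) if speaker is not None else None
        spoken = " ".join(sentence_texts(sp, skip=("speaker", "stage")))
        result.append((name, spoken))
    return result

turns = speeches(ET.fromstring(SAMPLE))
for turn in turns:
    print(turn)
```

The first speech has no <speaker> element, so its name is reported as None rather than guessed from context.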

3.2.4 Poetry

Poetry is distinguished from prose in the BNC where it is so presented in the original, for example as fragments of verse or song appearing within or between paragraphs of prose. The <l> (line) element is used to mark each verse line; where there are several such lines, perhaps with a heading, they are grouped together using the <lg> (line group) element, and any title or heading present is marked with a <head> element.

For example:
 <lg>
  <l>
   <s n="906">
    <w c5="PNP" hw="i" pos="PRON">I </w>
    <w c5="VVB" hw="send" pos="VERB">send </w>
    <w c5="DPS" hw="i" pos="PRON">my </w>
    <w c5="NN1" hw="soul" pos="SUBST">soul </w>
    <w c5="PRP" hw="through" pos="PREP">through </w>
    <w c5="NN1" hw="time" pos="SUBST">time </w>
    <w c5="CJC" hw="and" pos="CONJ">and </w>
    <w c5="NN1-VVB" hw="space" pos="SUBST">space </w>
    <w c5="TO0" hw="to" pos="PREP">to </w>
    <w c5="VVI" hw="greet" pos="VERB">greet </w>
    <w c5="PNP" hw="you" pos="PRON">you</w>
    <c c5="PUN">.</c>
   </s>
  </l>
  <l>
   <s n="907">
    <w c5="PNP" hw="you" pos="PRON">You </w>
    <w c5="VBD" hw="be" pos="VERB">were </w>
    <w c5="AT0" hw="a" pos="ART">a </w>
    <w c5="NN1" hw="poet" pos="SUBST">poet</w>
    <c c5="PUN">.</c>
   </s>
   <s n="908">
    <w c5="PNP" hw="you" pos="PRON">You </w>
    <w c5="VM0" hw="will" pos="VERB">will </w>
    <w c5="VVI" hw="understand" pos="VERB">understand</w>
    <c c5="PUN">.</c>
   </s>
  </l>
 </lg>
<!-- CCB -->

Note that the <l> element is not used to mark typographic lineation. Layout information is not, in general, preserved in the BNC.
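Because a verse line may hold more than one <s> element, recovering the lines of a poem means grouping sentences by their enclosing <l>. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function names are hypothetical and the sample is the CCB extract above with the c5, hw and pos attributes omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<lg>'
    '<l><s n="906"><w>I </w><w>send </w><w>my </w><w>soul </w><w>through </w>'
    '<w>time </w><w>and </w><w>space </w><w>to </w><w>greet </w><w>you</w>'
    '<c>.</c></s></l>'
    '<l><s n="907"><w>You </w><w>were </w><w>a </w><w>poet</w><c>.</c></s>'
    '<s n="908"><w>You </w><w>will </w><w>understand</w><c>.</c></s></l>'
    '</lg>'
)

def sentence_text(s):
    """Concatenate the <w> and <c> content of one <s> element."""
    parts = (t.text or "" for t in s.iter() if t.tag in ("w", "c"))
    return "".join(parts).strip()

def verse_lines(lg):
    """Return one string per <l>, joining the sentences it contains."""
    return [
        " ".join(sentence_text(s) for s in line.iter("s"))
        for line in lg.findall("l")
    ]

lines = verse_lines(ET.fromstring(SAMPLE))
for line in lines:
    print(line)
```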

3.2.5 Lists

A list is a collection of distinct items flagged as such by special layout in written texts, often functioning as a single syntactic unit. Lists may appear within or between paragraphs. Where marked, lists are tagged with the <list> element, which may contain the following subelements:
  • <head> (heading) contains any type of heading, for example the title of a section or a poem.
  • <label> contains the label associated with an item in a list; in glossaries, marks the term being defined.
    rend a code briefly characterising the way the element content was originally presented.
  • <item> contains one component of a list.
    rend a code briefly characterising the way the element content was originally presented.

A <list> element consists of an optional <head> element, followed by one or more <item> elements, each of which may optionally be preceded by a <label> element, used to hold the identifier or tag sometimes attached to a list item, for example ‘(a)’. The <label> may also contain a word or phrase used for a similar purpose.

The <item> element may appear only inside lists. It contains the same mixture of elements as a paragraph, and may thus contain one or more nested lists. It may also contain a series of paragraphs, each marked with a <p> element.

Here is an example of a simple list:
 <list>
  <item>
   <s n="87">
    <w c5="VBZ" hw="be" pos="VERB">Is </w>
    <w c5="DPS" hw="you" pos="PRON">your </w>
    <w c5="NN1" hw="nylon" pos="SUBST">nylon </w>
    <hi rend="it">
     <w c5="NN1" hw="nightie" pos="SUBST">nightie </w>
    </hi>
    <w c5="AJ0" hw="fireproof" pos="ADJ">fireproof</w>
    <c c5="PUN">?</c>
   </s>
  </item>
  <item>
   <s n="88">
    <w c5="AT0" hw="the" pos="ART">The </w>
    <w c5="NN1" hw="hurricane" pos="SUBST">hurricane </w>
    <w c5="VBD" hw="be" pos="VERB">was </w>
    <hi rend="it">
     <w c5="AV0" hw="mighty" pos="ADV">mighty </w>
    </hi>
    <w c5="AJ0" hw="fierce" pos="ADJ">fierce</w>
    <c c5="PUN">.</c>
   </s>
  </item>
  <item>
   <s n="89">
    <w c5="VM0" hw="will" pos="VERB">Will </w>
    <w c5="PNP" hw="you" pos="PRON">you </w>
    <hi rend="it">
     <w c5="VVI" hw="mow" pos="VERB">mow </w>
    </hi>
    <w c5="AT0" hw="the" pos="ART">the </w>
    <w c5="NN1" hw="lawn" pos="SUBST">lawn</w>
    <c c5="PUN">?</c>
   </s>
  </item>
  <item>
   <s n="90">
    <w c5="VDD" hw="do" pos="VERB">Did </w>
    <w c5="PNP" hw="you" pos="PRON">you </w>
    <hi rend="it">
     <w c5="VVI" hw="know" pos="VERB">know </w>
    </hi>
    <w c5="AT0" hw="the" pos="ART">the </w>
    <w c5="NN1" hw="time" pos="SUBST">time</w>
    <c c5="PUN">?</c>
   </s>
  </item>
 </list>
<!-- C9R -->
Here is an example of a labelled list:
 <list>
  <label>
   <s n="424">
    <w c5="CRD" hw="1" pos="ADJ">1</w>
    <c c5="PUN">.</c>
   </s>
  </label>
  <item>
   <s n="425">
    <w c5="NN1-NP0" hw="surya" pos="SUBST">Surya </w>
    <c c5="PUN"></c>
    <w c5="NN1" hw="sun" pos="SUBST">Sun </w>
    <c c5="PUN"></c>
    <w c5="AJ0" hw="creative" pos="ADJ">Creative </w>
    <w c5="NN1" hw="agent" pos="SUBST">agent</w>
   </s>
  </item>
  <label>
   <s n="426">
    <w c5="CRD" hw="2" pos="ADJ">2</w>
    <c c5="PUN">.</c>
   </s>
  </label>
  <item>
   <p>
    <s n="427">
     <w c5="NN1-NP0" hw="vayu" pos="SUBST">Vayu </w>
     <c c5="PUN"></c>
     <w c5="NN1" hw="air" pos="SUBST">Air </w>
     <c c5="PUN"></c>
     <w c5="VVG-AJ0" hw="preserve" pos="VERB">Preserving </w>
     <w c5="NN1" hw="agent" pos="SUBST">agent </w>
     <pb n="43"/>
    </s>
   </p>
  </item>
  <label>
   <s n="428">
    <w c5="CRD" hw="3" pos="ADJ">3</w>
    <c c5="PUN">.</c>
   </s>
  </label>
  <item>
   <p>
    <s n="429">
     <w c5="NN2" hw="agni" pos="SUBST">Agni </w>
     <c c5="PUN"></c>
     <w c5="NN1-VVB" hw="fire" pos="SUBST">Fire </w>
     <c c5="PUN"></c>
     <w c5="AJ0" hw="destructive" pos="ADJ">Destructive </w>
     <w c5="NN1" hw="agent" pos="SUBST">agent</w>
    </s>
   </p>
  </item>
 </list>
<!-- CB9 -->
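Since a <label> is a sibling of the <item> it introduces rather than its parent, associating the two requires walking the children of the <list> in order. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function names are hypothetical and the sample is the first two entries of the CB9 extract above with the c5, hw and pos attributes omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<list>'
    '<label><s n="424"><w>1</w><c>.</c></s></label>'
    '<item><s n="425"><w>Surya </w><c></c><w>Sun </w><c></c>'
    '<w>Creative </w><w>agent</w></s></item>'
    '<label><s n="426"><w>2</w><c>.</c></s></label>'
    '<item><p><s n="427"><w>Vayu </w><c></c><w>Air </w><c></c>'
    '<w>Preserving </w><w>agent </w><pb n="43"/></s></p></item>'
    '</list>'
)

def plain_text(element):
    """Concatenate the text of all <w> and <c> descendants."""
    parts = (t.text or "" for t in element.iter() if t.tag in ("w", "c"))
    return "".join(parts).strip()

def labelled_items(lst):
    """Pair each <item> with the <label> immediately preceding it, if any."""
    pairs, pending = [], None
    for child in lst:
        if child.tag == "label":
            pending = plain_text(child)
        elif child.tag == "item":
            pairs.append((pending, plain_text(child)))
            pending = None  # a label applies to one item only
    return pairs

entries = labelled_items(ET.fromstring(SAMPLE))
for entry in entries:
    print(entry)
```

Items in an unlabelled list, such as the C9R extract, would each be paired with None. The sketch treats an item's <s> children and its <p> children alike, since both forms occur in the extract.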

3.2.6 Notes and citations

Annotations occurring in written texts, and bibliographic citations or references, have been marked up in some texts, using the <note> element.

Original notes may contain any mixture of other chunks, and may also contain paragraphs: they appear in written texts only. They may be relocated to the end of the section in which they appear.

For example:
 <note place="SIDE">
  <s n="477">
   <w c5="AT0" hw="the" pos="ART">The </w>
   <w c5="AJ0-NN1" hw="short" pos="ADJ">short </w>
   <w c5="VBZ" hw="be" pos="VERB">is </w>
   <w c5="AT0" hw="a" pos="ART">a </w>
   <w c5="NN1" hw="film" pos="SUBST">film </w>
   <w c5="PRP" hw="about" pos="PREP">about </w>
   <w c5="NN1-VVG" hw="sailing" pos="SUBST">sailing</w>
   <c c5="PUN">.</c>
  </s>...</note>
<!-- A6C -->
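Note the use of the place attribute to record the position of the note in the original source. Where a note was numbered in the original, the n attribute may carry that number, although none is present in the above example.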

3.2.7 Bibliographic references

Bibliographic citations or references within running texts may also be marked, using the <bibl> element; in the present version of the corpus this is done in some texts only.

For example:
 <quote>
  <p rend="it">
   <s n="1943">
    <w c5="NN1" hw="zombie" pos="SUBST">Zombie </w>
    <w c5="AT0" hw="no" pos="ART">no </w>
    <w c5="NN1" hw="go" pos="SUBST">go </w>
    <w c5="CJS" hw="unless" pos="CONJ">unless </w>
    <w c5="PNP" hw="you" pos="PRON">you </w>
    <w c5="VVB" hw="tell" pos="VERB">tell </w>
    <w c5="VVB-NN1" hw="im" pos="VERB">im </w>
    <w c5="TO0" hw="to" pos="PREP">to </w>
    <w c5="VVI" hw="go" pos="VERB">go</w>
   </s>
   <bibl>
    <s n="1944">
     <w c5="AT0" hw="the" pos="ART">The </w>
     <w c5="NP0" hw="communards" pos="SUBST">Communards</w>
     <c c5="PUN">.</c>
    </s>
   </bibl>
  </p>
 </quote>
<!-- A0L -->

Note that the <bibl> element used within corpus texts has none of the more detailed sub-elements described for it in 5.1.5.2 Structured bibliographic record. Like all the other elements described in the present subsection, the <bibl> element appearing within corpus texts contains only <s> elements.
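Because an embedded <bibl> holds only <s> elements, the attribution of a quotation can be separated from the quoted words with a simple tree walk. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function names are hypothetical and the sample is the A0L extract above with the c5, hw and pos attributes omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<quote><p rend="it">'
    '<s n="1943"><w>Zombie </w><w>no </w><w>go </w><w>unless </w><w>you </w>'
    '<w>tell </w><w>im </w><w>to </w><w>go</w></s>'
    '<bibl><s n="1944"><w>The </w><w>Communards</w><c>.</c></s></bibl>'
    '</p></quote>'
)

def sentence_texts(element, skip=()):
    """Return the text of each <s> under element, ignoring any subtree
    whose tag is listed in skip."""
    if element.tag in skip:
        return []
    if element.tag == "s":
        parts = (t.text or "" for t in element.iter() if t.tag in ("w", "c"))
        return ["".join(parts).strip()]
    return [text for child in element for text in sentence_texts(child, skip)]

def quoted_passages(root):
    """Return (quoted_text, attribution) for each <quote>; attribution is
    None when the quotation contains no <bibl>."""
    out = []
    for quote in root.iter("quote"):
        bibl = quote.find(".//bibl")
        source = " ".join(sentence_texts(bibl)) if bibl is not None else None
        body = " ".join(sentence_texts(quote, skip=("bibl",)))
        out.append((body, source))
    return out

passages = quoted_passages(ET.fromstring(SAMPLE))
print(passages)
```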

3.3 Phrase-level elements

Phrase-level elements are elements which cannot appear directly within a textual division, but must be contained by some other element. In practice, this means they will be contained within an <s> element. In addition to the <w>, <mw>, and <c> elements already discussed, only the following phrase-level elements appear within <s> elements in written texts:
  • <pb> (page break) marks the boundary between one page of a text and the next in a standard reference system.
    n gives the number of the page beginning here.
  • <hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
    rend a code briefly characterising the way the element content was originally presented.

3.3.1 Page breaks

Wherever possible, the original pagination and page numbering of the source text has been preserved. The <pb> element is used to mark the approximate position in the text at which each new page starts, and its n attribute supplies the number of the page.
 <l>
  <s n="1403">
   <c c5="PUN"></c>
   <w c5="CJC" hw="and" pos="CONJ">and </w>
   <w c5="NN2" hw="creditor" pos="SUBST">creditors </w>
   <w c5="VVB" hw="grow" pos="VERB">grow </w>
   <w c5="AJ0" hw="cruel" pos="ADJ">cruel</w>
   <c c5="PUN">,</c>
  </s>
 </l>
 <l>
  <s n="1404">
   <pb n="75"/>
   <w c5="AV0" hw="so" pos="ADV">so </w>
   <w c5="PNP" hw="he" pos="PRON">he </w>
   <w c5="VVZ" hw="bow" pos="VERB">bows </w>
   <w c5="CJC" hw="and" pos="CONJ">and </w>
   <w c5="NN2-VVZ" hw="scrape" pos="SUBST">scrapes</w>
   <c c5="PUN">,</c>
  </s>
 </l>
<!-- HNU -->
Where several pages have been left out of a transcription, for example because they are blank or contain illustrations only, a <pb> element may be given for each, as in this example:
 <s n="1323">
  <w c5="PNP" hw="i" pos="PRON">I </w>
  <w c5="VHB" hw="have" pos="VERB">have</w>
  <w c5="XX0" hw="not" pos="ADV">n't </w>
  <w c5="VBN" hw="be" pos="VERB">been </w>
  <w c5="PRP" hw="to" pos="PREP">to </w>
  <w c5="AT0" hw="an" pos="ART">an </w>
  <w c5="AJ0" hw="organized" pos="ADJ">organized </w>
  <w c5="NN1" hw="campsite" pos="SUBST">campsite </w>
  <w c5="PRP" hw="for" pos="PREP">for </w>
  <pb n="64"/>
  <pb n="65"/>
  <pb n="66"/>
  <w c5="AV0" hw="perhaps" pos="ADV">perhaps </w>
  <w c5="CRD" hw="fifteen" pos="ADJ">fifteen </w>
  <w c5="NN2" hw="year" pos="SUBST">years</w>
  <c c5="PUN">, </c>
  <w c5="AV0" hw="so" pos="ADV">so </w>
  <w c5="DT0" hw="all" pos="ADJ">all </w>
  <w c5="DT0" hw="this" pos="ADJ">this </w>
  <w c5="VBZ" hw="be" pos="VERB">is </w>
  <w c5="AJ0" hw="new" pos="ADJ">new </w>
  <w c5="PRP" hw="to" pos="PREP">to </w>
  <w c5="PNP" hw="i" pos="PRON">me</w>
  <c c5="PUN">.</c>
 </s>
<!-- A6T -->
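Since <pb> is an empty milestone rather than a container, the page on which a word falls has to be derived by reading the text in document order and remembering the most recent break. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name words_by_page is hypothetical and the sample is an abridged form of the A6T extract above with the c5, hw and pos attributes omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<s n="1323"><w>campsite </w><w>for </w>'
    '<pb n="64"/><pb n="65"/><pb n="66"/>'
    '<w>perhaps </w><w>fifteen </w><w>years</w></s>'
)

def words_by_page(root, start_page=None):
    """Return (page, word) pairs, where page is the n value of the most
    recent <pb>, or start_page if no <pb> has been seen yet."""
    page, out = start_page, []
    for element in root.iter():  # iter() visits elements in document order
        if element.tag == "pb":
            page = element.get("n")
        elif element.tag == "w":
            out.append((page, (element.text or "").strip()))
    return out

paged = words_by_page(ET.fromstring(SAMPLE))
for pair in paged:
    print(pair)
```

Words before the first <pb> in a fragment are reported with the caller-supplied start_page, because the fragment alone does not say which page they belong to.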

3.3.2 Highlighted phrases

Typographic changes or highlighting in the original may not be marked in the transcript at all. Alternatively, highlighted phrases, and the kind of highlighting used, may be recorded in one of two ways:
  • using the rend (rendition) attribute on elements for which this is defined
  • using the <hi> (highlighted) element

The former is used where the whole of the content of one of the elements <bibl>, <corr>, <div>, <head>, <item>, <l>, <label>, <list>, <p>, <quote> or <stage> is highlighted. The latter is used on all other occasions. In either case, the values available for the rend attribute and their significance are listed in the reference documentation.

It should be noted that the purpose of the rend attribute is not to provide information adequate to the needs of a typesetter, but simply to record some qualitative information about the original.

Like all other phrase-level elements, each <hi> element must be entirely contained by an <s> element. This implies that where, for example, a bolded passage contains more than one sentence, or an italicised phrase begins in one verse line and ends in another, the <hi> element must be closed at the end of the enclosing element, and then re-opened within the next.

 <s n="2211">
  <hi rend="it">
   <w c5="NN1" hw="apple" pos="SUBST">Apple </w>
  </hi>
  <w c5="VBZ" hw="be" pos="VERB">is </w>
  <w c5="PRP" hw="to" pos="PREP">to </w>
  <hi rend="it">
   <w c5="NN0" hw="fruit" pos="SUBST">fruit </w>
  </hi>
  <w c5="CJS-PRP" hw="as" pos="CONJ">as </w>
  <hi rend="it">
   <w c5="NN1" hw="dog" pos="SUBST">dog </w>
  </hi>
  <w c5="VBZ" hw="be" pos="VERB">is </w>
  <w c5="PRP" hw="to" pos="PREP">to </w>
  <hi rend="it">
   <w c5="ZZ0" hw="x" pos="SUBST">X </w>
  </hi>
  <c c5="PUN">.</c>
 </s>
<!-- FAC -->
For example, in the following four lines of verse, the first three are rendered in italics, and the rend attribute is therefore specified for each <l> element. In the fourth line, only the first few words are in italics: a <hi> element must be used within the <l> to carry this information.
 <l rend="it">
  <s n="394">
   <w c5="PNP" hw="it" pos="PRON">It </w>
   <w c5="VBD" hw="be" pos="VERB">was </w>
   <w c5="CRD" hw="one" pos="ADJ">one </w>
   <w c5="PRF" hw="of" pos="PREP">of </w>
   <w c5="AT0" hw="a" pos="ART">a </w>
   <w c5="NN0" hw="pair" pos="SUBST">pair</w>
   <c c5="PUN">.</c>
  </s>
  <s n="395">
   <w c5="DPS" hw="it" pos="PRON">Its </w>
   <w c5="AJ0" hw="precious" pos="ADJ">precious </w>
   <w c5="NN1" hw="twin" pos="SUBST">twin</w>
  </s>
 </l>
 <l rend="it">
  <s n="396">
   <w c5="VBD" hw="be" pos="VERB">was </w>
   <w c5="VVN" hw="steal" pos="VERB">stolen </w>
   <w c5="PRP" hw="by" pos="PREP">by </w>
   <w c5="AT0" hw="the" pos="ART">the </w>
   <w c5="NN2" hw="soldier" pos="SUBST">soldiers</w>
   <c c5="PUN">.</c>
  </s>
  <s n="397">
   <w c5="DT0" hw="all" pos="ADJ">All </w>
   <w c5="AT0" hw="the" pos="ART">the </w>
   <w c5="NN1" hw="time" pos="SUBST">time</w>
  </s>
 </l>
 <l rend="it">
  <s n="398">
   <w c5="DPS" hw="she" pos="PRON">her </w>
   <w c5="NN1" hw="uncle" pos="SUBST">uncle </w>
   <w c5="VVD" hw="stand" pos="VERB">stood </w>
   <w c5="AV0" hw="there" pos="ADV">there </w>
   <w c5="VVG" hw="clutch" pos="VERB">clutching </w>
   <w c5="DT0" hw="this" pos="ADJ">this </w>
   <w c5="PNI" hw="one" pos="PRON">one </w>
   <w c5="AVP-PRP" hw="in" pos="ADV">in</w>
  </s>
 </l>
 <l>
  <s n="399">
   <hi rend="it">
    <w c5="DPS" hw="he" pos="PRON">his </w>
    <w c5="AJ0" hw="big" pos="ADJ">big </w>
    <w c5="NN1" hw="fist" pos="SUBST">fist </w>
   </hi>
   <c c5="PUN"></c>
   <w c5="AV0" hw="so" pos="ADV">so</w>
   <c c5="PUN">!</c>
  </s>
  <s n="400">
   <w c5="PNP" hw="she" pos="PRON">She </w>
   <w c5="VDZ" hw="do" pos="VERB">does </w>
   <w c5="AT0" hw="a" pos="ART">a </w>
   <w c5="AJ0" hw="little" pos="ADJ">little </w>
   <w c5="NN1" hw="mime" pos="SUBST">mime</w>
   <c c5="PUN">.</c>
  </s>
 </l>
<!-- C8X -->
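Since highlighting may be recorded either on an enclosing element or on a <hi>, finding the rendition of a given word means looking at its nearest ancestor that carries a rend attribute. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name word_renditions is hypothetical, the sample is an abridged form of the last two verse lines of the C8X extract with the c5, hw and pos attributes omitted, and the wrapping <lg> is added only to give the fragment a single root.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<lg>'
    '<l rend="it"><s n="398"><w>this </w><w>one </w><w>in</w></s></l>'
    '<l><s n="399"><hi rend="it"><w>his </w><w>big </w><w>fist </w></hi>'
    '<c></c><w>so</w><c>!</c></s></l>'
    '</lg>'
)

def word_renditions(element, inherited=None):
    """Return (word, rend) pairs, where rend comes from the nearest
    enclosing element with a rend attribute, or None if there is none."""
    rend = element.get("rend", inherited)
    if element.tag == "w":
        return [((element.text or "").strip(), rend)]
    return [pair for child in element for pair in word_renditions(child, rend)]

renditions = word_renditions(ET.fromstring(SAMPLE))
for pair in renditions:
    print(pair)
```

Both encodings yield the same per-word result, so a consumer need not care whether the italics were recorded on the <l> or on a <hi> inside it.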




edited by Lou Burnard. Date: January 2007
This page is copyrighted