6 The Query menu - Reference Guide to the SARA Windows Client

6.1 Editing a query

Selecting the Edit command from the Query menu redisplays the dialogue box that launched the query whose results are currently displayed. The command is greyed out and unavailable if no results are being displayed.

The dialogue box reappears exactly as it was when you pressed the OK button to send the query to the server. You can change any part of it and resubmit the query by pressing OK again, or press Cancel to close the dialogue box and start again.

6.2 Sorting results

By default, the results of a query are displayed in their order of appearance within the corpus, that is, ordered alphabetically by the three-character filenames of the texts in which they occur. This order is rarely of any particular significance, except that it groups solutions from the same text, so it is generally desirable to reorder a line mode (concordance) display. To do this, select the Sort command from the Query menu, which displays the Sort dialogue box.

You can use the radio buttons in this dialogue box to specify either one or two keys for the sort, and a single collating sequence, applicable to both keys. The keys determine which part of each hit is to be used to sort the results; the collating sequence determines how these keys are to be compared when deciding on their relative order.

The Primary keys for all the context lines are compared first, according to the collating sequence indicated. If two or more lines have identical Primary keys, their Secondary keys are used to order them. Note that the same collating method must be used for both keys.

The Span box indicates how many words make up the key in each case. The Left, Centre or Right radio buttons indicate the position of the key relative to the query focus (i.e. the hit word in the context). If the Left radio button is selected, and the Span is 1, the key will be the word to the left of each query focus. If the Centre radio button is selected, and the Span is 1, the key will be the first word of the query focus itself. If the Right radio button is selected and the Span is 1, the key will be the first word following the query focus.

The Ascending and Descending radio buttons indicate whether the keys are to be sorted into ascending or descending alphabetical order.
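The way keys, positions and spans interact can be sketched in Python. This is a minimal illustration under stated assumptions, not SARA's own implementation: the `Hit` class, the `key_words` and `sort_hits` functions and the `(position, span)` tuple are all hypothetical. The guide does not say in what order the words of a multi-word Left key are compared; this sketch takes them in text order, and a single `descending` flag applies to both keys.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical representation of one concordance line (not SARA's data model).
@dataclass
class Hit:
    left: List[str]   # words before the query focus, in text order
    focus: List[str]  # words of the query focus
    right: List[str]  # words after the query focus

# A key specification: (position, span), position being "left", "centre" or "right".
KeySpec = Tuple[str, int]

def key_words(hit: Hit, spec: KeySpec) -> List[str]:
    """Return the words that make up one sort key for a hit."""
    position, span = spec
    if span < 1:
        raise ValueError("span must be at least 1")
    if position == "left":
        # Assumption: the `span` words immediately before the focus, in text order.
        return hit.left[-span:]
    if position == "centre":
        return hit.focus[:span]   # starts at the first word of the focus
    if position == "right":
        return hit.right[:span]   # starts at the first word after the focus
    raise ValueError(f"unknown key position: {position!r}")

def sort_hits(
    hits: List[Hit],
    primary: KeySpec,
    secondary: Optional[KeySpec] = None,
    collate: Callable[[str], str] = lambda w: w,
    descending: bool = False,
) -> List[Hit]:
    """Sort on the primary key; ties are broken by the secondary key."""
    def sort_key(hit: Hit):
        keys = [[collate(w) for w in key_words(hit, primary)]]
        if secondary is not None:
            keys.append([collate(w) for w in key_words(hit, secondary)])
        return keys
    # sorted() is stable, so hits with identical keys keep their corpus order.
    return sorted(hits, key=sort_key, reverse=descending)
```

For example, `sort_hits(hits, ("left", 1), ("right", 1))` orders lines by the word before the focus and, among lines sharing that word, by the word after it.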

The collating method used for both keys is indicated by the radio buttons to the right of the dialogue box. With the ASCII radio button selected, keys are compared according to the ASCII character sequence, in which all uppercase letters precede all lowercase ones: `Zebra' precedes `antelope'. With the Ignore case button selected, case distinctions are ignored, so that `Zebra' and `zebra' are regarded as the same key. With the Ignore accents button selected, accented letters are treated as if they were unaccented, so that `élève' and `élevé' are regarded as the same key.
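The three collating methods can be approximated as key functions for Python's `sorted`. This is a sketch, not SARA's code: the function names are hypothetical, and accent stripping relies on Unicode decomposition from the standard `unicodedata` module. Python compares strings by code point, which for unaccented letters coincides with the ASCII sequence. The guide does not say whether Ignore accents also ignores case, so this sketch leaves case alone for that method.

```python
import unicodedata

def ascii_key(word: str) -> str:
    # Code-point order: uppercase ASCII letters (65-90) precede lowercase ones (97-122).
    return word

def ignore_case_key(word: str) -> str:
    # `Zebra' and `zebra' map to the same key.
    return word.casefold()

def ignore_accents_key(word: str) -> str:
    # NFD splits "é" into "e" plus a combining accent; the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(sorted(["antelope", "Zebra"], key=ascii_key))        # ['Zebra', 'antelope']
print(sorted(["antelope", "Zebra"], key=ignore_case_key))  # ['antelope', 'Zebra']
print(ignore_accents_key("élève") == ignore_accents_key("élevé"))  # True
```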

If the results being sorted are being displayed in either POS or SGML format (see further section 6.4.2), then the POS Label button is available for selection. Selecting it causes keys to be sorted not by their orthographic form but by the alphabetical order of their part-of-speech code. This has the effect of grouping together keys with the same POS code. You can use it, for example, to sort a set of results by the POS code of the word following the query focus.

6.3 Thinning results

Selecting the Thin command from the Query menu opens up a sub-menu from which four selections are available, each of which allows you to reduce the number of displayed solutions in the current result set. The commands available are:

Selection discards from the result set all the solutions which have not previously been selected (i.e. all those solutions which do not appear on the screen in reverse video are discarded);
Reverse Selection discards from the result set all the solutions which have previously been selected (i.e. all those solutions which appear on the screen in reverse video are discarded);
Random solutions are discarded at random until the number of solutions matches the number you specify in a subwindow;
One per text discards from the result set all but the first solution from any one text.

The current item in a displayed list can be selected either by double-clicking on it or by pressing the space bar.

Each time you request a random selection from a given set of results, you will get a different random sequence. The only way to get the same random selection more than once is to save the query after thinning it: when a thinned query is saved, its thinning is saved with it.
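The four thinning operations can be sketched as list filters. Everything here is hypothetical: solutions are modelled as `(text identifier, sentence number)` pairs, and the function names are illustrative only. The sketch assumes survivors keep their original order, which the guide does not state. The `seed` parameter has no counterpart in SARA; it only illustrates why a random thinning cannot be reproduced unless its outcome is saved.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical representation: (text identifier, sentence number).
Solution = Tuple[str, int]

def thin_selection(solutions: List[Solution], selected: Set[int], reverse: bool = False) -> List[Solution]:
    """Keep the selected solutions (Selection) or discard them (Reverse Selection)."""
    return [s for i, s in enumerate(solutions) if (i in selected) != reverse]

def thin_random(solutions: List[Solution], n: int, seed: Optional[int] = None) -> List[Solution]:
    """Discard solutions at random until n remain, keeping survivors in original order."""
    if n >= len(solutions):
        return list(solutions)
    rng = random.Random(seed)  # the same seed reproduces the same selection
    keep = set(rng.sample(range(len(solutions)), n))
    return [s for i, s in enumerate(solutions) if i in keep]

def thin_one_per_text(solutions: List[Solution]) -> List[Solution]:
    """Keep only the first solution from each text."""
    seen = set()
    result = []
    for text_id, n in solutions:
        if text_id not in seen:
            seen.add(text_id)
            result.append((text_id, n))
    return result
```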

6.4 Options for displaying results

The results of a query can be displayed in one of two modes and in one of four different formats. You can also vary the amount of context, or scope, displayed for each result. The options in effect for a particular query depend on the initial settings specified in the User Preferences dialogue box (see 7.5). The display mode can be changed by using the Concordance command on the Query menu or by toggling the Concordance button; the format can be changed for a particular set of results by selecting Options from the Query menu, which also determines the amount of context displayed for each result.

6.4.1 Display mode

In line mode, each occurrence of the item searched for is displayed as a single line on the screen; in page mode, each occurrence is displayed in full on the screen, taking as many lines as necessary.

The Concordance button is used to switch between the two modes. The initial mode is set by the Concordance checkbox in the User Preferences dialogue box (see 7.5): if this is checked, line mode is used; otherwise page mode is used. Selecting the Concordance command from the Query menu, or clicking on the Concordance button, switches modes for a particular set of results.

The usual Windows controls are available to enable you to display different parts of a large set of results. In line mode, you can use the vertical scroll bar to the right of the window to scroll up and down the results; in either mode, you can use the arrow buttons in the tool bar to step through the solutions one at a time. You can also use the cursor keys, PgUp and PgDn, Home and End, to move through the result set in the usual way.

6.4.2 Display format

Select the Options command from the Query menu to display the Options dialogue box. The radio buttons selected here determine the format used to display the current results and the amount of context (or scope) visible to either side of the query focus, as further discussed in section 6.4.3.

The following four display formats are available:

plain only the words and punctuation of each hit are displayed, optionally with the query focus in a different colour or typeface (as determined by the fonts and colour selected: see 7.4);
POS part-of-speech information for any word on the screen can be displayed by clicking on it with the right mouse button; in addition, words may be displayed in different colours depending on their part of speech, as determined by the colour scheme in use (see 7.4); selecting this format also makes it possible to sort the solutions by their part-of-speech code (see 6.2);
SGML each hit is displayed with its full SGML markup;
Custom each hit is displayed according to a user-defined format, as further discussed in section 6.4.4; the Configure button can be used to change this format if the default is inappropriate.

Changing any of these options will affect the display of the current query only. To change the display of all subsequent queries, changes must be made in the User Preferences dialogue box (see 7.5 ). Note also that changing the format of the display will usually require that the results be downloaded again.

6.4.3 Display scope

The maximum amount of context which can be displayed for each hit is set by the Max download length specified in the User Preferences dialogue box (see 7.5). This is an upper limit, expressed as a number of characters. Setting it very high will result in long download times; setting it too low will limit the usefulness of what can be displayed on the screen.

Within this overall limit, there are four options for determining the amount of context displayed on the screen by default:

automatic the whole of the smallest unit (larger than a <w>) within which the query focus appears;
sentence the whole of the <s> element within which the query focus appears;
paragraph the whole of the <p> or <u> element within which the query focus appears;
maximum as many <s> elements as possible on either side of the query focus, up to the limit imposed by the maximum download length.

If the scope setting results in less than the maximum download length being displayed, you can always expand what is displayed up to that maximum by double-clicking on the display with the right mouse button. This will expand the context up to what would have been obtained if the Maximum scope setting were in force, for the current hit only.
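The maximum scope setting can be pictured as growing the context outward one whole sentence at a time. This is a hypothetical sketch, not the server's algorithm: the guide does not say whether sentences are added alternately on each side, how markup counts towards the character limit, or what happens when the focus sentence alone exceeds it. Here sentences are plain strings, only their text is counted, sides alternate, and the focus sentence is always kept.

```python
from typing import List

def maximum_scope(sentences: List[str], focus_index: int, max_chars: int) -> List[str]:
    """Add whole sentences on either side of the focus sentence while the total fits."""
    lo = hi = focus_index
    total = len(sentences[focus_index])  # the focus sentence is always kept
    grew = True
    while grew:
        grew = False
        # Alternate: try one sentence to the left, then one to the right.
        if lo > 0 and total + len(sentences[lo - 1]) <= max_chars:
            lo -= 1
            total += len(sentences[lo])
            grew = True
        if hi < len(sentences) - 1 and total + len(sentences[hi + 1]) <= max_chars:
            hi += 1
            total += len(sentences[hi])
            grew = True
    return sentences[lo:hi + 1]
```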

The query focus is that part of a downloaded hit which is normally highlighted within the display. In a simple word, pattern, or phrase query, it is the whole of the word or phrase that matched the query. In an SGML query, it is the SGML start- or end-tag which matched the query. In a Query Builder query, it is the part of the text which was matched by the last content node, i.e. the one nearest the bottom of the screen.

6.4.4 Custom display mode

In custom mode, hits are displayed according to a format which you can tailor to your own liking. You can specify whether or not particular SGML elements should be displayed starting on a new line, whether or not their associated attributes should be displayed, and also specify additional characters to be displayed in association with them.

Two such specifications can be supplied: one, held in a file called linefmt.txt, determines how hits should be displayed in line mode displays; the other, held in a file called pagefmt.txt, determines how hits should be displayed in page mode displays. These are ordinary ASCII files which can be edited and displayed with any text editor (such as Notepad), or by pressing the Configure button on the Options dialogue box. The files must be held in the working directory used by the SARA client on your system, and must be writable. (See further section 10.)

The syntax of these files is fairly self-explanatory. Each line specifies how a particular element type is to be displayed; if no line is supplied for an element, no special action is taken for it. A line begins with the name of an element (optionally followed by an attribute name) or entity. This is followed by a quoted string, which gives the replacement value for the named entity or for the element's start-tag. A second quoted string can also be supplied to provide a replacement for the element's end-tag. Within replacement strings, the string %s represents the value of the attribute whose name was specified. Formats intended for use in page mode displays can also use the string \n to indicate a new line and \t to indicate a tab indent.

For example, the default page format file contains the following lines:

    div1 "\n"
    pause "..."
    event desc "[%s]"
    u who "\n%s>: "

The first line indicates that the display should start a new line at the start of each new <div1> element. The second line indicates that any <pause> element should be displayed as three dots. The third line indicates that any <event> element should be displayed as the value supplied for its desc attribute, enclosed in square brackets. Finally, the last line indicates that the content of every <u> element should be prefixed by a new line, the value of its who attribute, and the string `>: ' (i.e. angle bracket, colon, space).
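Because the client performs no syntax checking, it can help to check a format file before using it. The following is a hypothetical Python sketch of a reader for the syntax described above; the `FormatRule` class and both function names are illustrative, and the real client may treat whitespace, unknown escapes or end-tag strings differently. Only the `\n` and `\t` escapes mentioned in this guide are translated.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FormatRule:
    name: str                 # element or entity name
    attribute: Optional[str]  # attribute whose value replaces %s, if any
    start: str                # replacement for the start-tag (or the entity)
    end: Optional[str]        # replacement for the end-tag, if one was given

def parse_format_file(text: str) -> Dict[str, FormatRule]:
    rules: Dict[str, FormatRule] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        quote = line.find('"')
        head = line[:quote].split() if quote >= 0 else []
        strings = re.findall(r'"([^"]*)"', line)
        if not head or not strings:
            # Unlike SARA, which does no checking, this sketch rejects malformed lines.
            raise ValueError(f"malformed format line: {raw!r}")
        # Translate the two escapes described in the guide.
        strings = [s.replace("\\n", "\n").replace("\\t", "\t") for s in strings]
        rules[head[0]] = FormatRule(
            name=head[0],
            attribute=head[1] if len(head) > 1 else None,
            start=strings[0],
            end=strings[1] if len(strings) > 1 else None,
        )
    return rules

def render_start_tag(rule: FormatRule, attributes: Dict[str, str]) -> str:
    """Expand %s with the named attribute's value (empty if absent)."""
    value = attributes.get(rule.attribute, "") if rule.attribute else ""
    return rule.start.replace("%s", value)
```

With the default page format above, a `<u who=PS001>` start-tag (a made-up speaker code) would render as a new line followed by `PS001>: `.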

Care should be taken in preparing custom format files, as no syntax checking is currently performed.

6.5 Additional components of the Query window

In addition to the display of results, the query window can contain two other components, each in a separate pane.

Query Text The Query Text command from the Query menu opens or closes a pane in which the CQL text of the current query appears. This text cannot be edited, but is useful for documentary purposes. For the syntax of CQL, see section 3.8. Any thinning options applied to the query are also displayed.
Annotation The Annotation command from the Query menu opens or closes a pane in which you can write any comment or annotation you wish. Such documentation may be useful for future reference when re-running a query.

Both query text and annotation are saved together with the query, along with any valid bookmarks you defined for it.

6.6 Saving results to a file

Selecting the Listing command from the Query menu opens a standard file dialogue box in which you can specify a name for the file in which the current result set is to be saved. The result set is saved in SGML format in a file with the same name as the query itself, with the suffix SGM.

Here is the start of a sample listing file, showing the results of a search for the word `corpuses'.

 <bncXtract>
 <hdr date='10-Nov-1996 00:03:29' user=lou server='163.32.247' format=untagged>
 <source>This data is extracted from the British National Corpus. All rights in
 the texts cited are reserved. This data may not be reproduced or redistributed
 in any form, other than as provided for by the Fair Use provisions of the
 Copyright Act</source>
 <query><![CDATA["corpuses"]]></query>
 </hdr>
 <hit text=EWA n=531><left> Where an absolute norm for English cannot be
 relied on, the next best thing is to compare the corpus whose style is under
 scrutiny with one or more comparable <focus>corpuses<right>, thus establishing
 a relative norm. </hit>
 <hit text=FRG n=1222><left> These methodological difficulties are associated
 with a more general problem of deriving generalizations from
 <focus>corpuses<right>. </hit>
 </bncXtract>

Housekeeping information about the query itself is saved in a <hdr> element at the start of the file, giving the date the query was solved, the name of the user, and the server machine, as well as the actual text of the query. Each result in the query result set is saved as a separate <hit> element. The text attribute gives the three-character identifier of the text in which the hit was found; the n attribute gives its sentence number. The query focus of the hit is represented as a <focus> element; its left context is represented as a <left> element, and its right context as a <right> element.
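For processing outside SARA, the <hit> elements of a listing file can be pulled out with a regular expression. This is a hypothetical sketch, not a full SGML parser: it assumes the attribute order and the omitted end-tags of the sample listing above, and it relies on angle brackets inside a hit's content having been converted to square brackets, so that the only real tags inside a <hit> are <left>, <focus> and <right>. The `ListedHit` type and `parse_listing` function are illustrative names.

```python
import re
from typing import List, NamedTuple

class ListedHit(NamedTuple):
    text: str   # three-character text identifier
    n: int      # sentence number
    left: str
    focus: str
    right: str

# Assumes text= precedes n= and that <left>, <focus> and <right> appear in that
# order with their end-tags omitted, as in the sample listing.
HIT_RE = re.compile(
    r"<hit\s+text=['\"]?(\w+)['\"]?\s+n=['\"]?(\d+)['\"]?>"
    r"\s*<left>(.*?)<focus>(.*?)<right>(.*?)</hit>",
    re.DOTALL,
)

def _squash(s: str) -> str:
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(s.split())

def parse_listing(sgml: str) -> List[ListedHit]:
    return [
        ListedHit(text_id, int(n), _squash(left), _squash(focus), _squash(right))
        for text_id, n, left, focus, right in HIT_RE.findall(sgml)
    ]
```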

Results are saved in a listing file in the format in which they are displayed. Thus, if the result set included SGML tags (i.e. results are being displayed in POS or SGML format), these tags would also appear in the listing file, which would make it difficult to process by other SGML-aware software. To make this less problematic, any angle brackets appearing as content of a <hit> element are converted to square brackets before the listing file is produced. For example, the second <hit> element above would appear as follows if the same query were saved in SGML mode:

 <hit text=FRG n=1222><left>[s n=1222] [w DT0]These
 [w AJ0]methodological [w NN2]difficulties [w VBB]are [w VVN]associated [w PRP]with
 [w AT0]a [w AV0]more [w AJ0]general [w NN1]problem [w PRF]of [w AJ0-VVG]deriving
 [w NN2]generalizations [w PRP]from [w NN2]<focus>corpuses<right>[c PUN].
 </hit>

Note that both the above examples have been reformatted to fit on the printed page: in an actual listing file, no extra line breaks are introduced within the body of a <hit> element. A full specification of the listing file format is included in .
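Once angle brackets have been converted, the part-of-speech codes in a POS- or SGML-mode listing can be recovered from the bracketed tags. The following hypothetical sketch assumes, as in the example above, that word tags appear as `[w CODE]` and punctuation tags as `[c CODE]`, and that the only real tags left are <hit>, <left>, <focus> and <right>. One limitation of the conversion is that a literal square bracket in the original text cannot be told apart from a converted tag, so this approach may misread such text.

```python
import re
from typing import List, Tuple

# Real SGML tags that survive inside a listing file hit (assumed from the example).
_STRUCTURE = re.compile(r"</?(?:hit|left|focus|right)\b[^>]*>")
# A converted word or punctuation tag, e.g. "[w NN2]" or "[c PUN]", then its text.
_TOKEN = re.compile(r"\[[wc] ([A-Z0-9-]+)\]([^\[]*)")

def tagged_tokens(hit_element: str) -> List[Tuple[str, str]]:
    """Return (token, POS code) pairs from a <hit> element saved in POS or SGML mode."""
    text = _STRUCTURE.sub("", hit_element)
    return [(token.strip(), code) for code, token in _TOKEN.findall(text)]
```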

6.7 Displaying bibliographic information and browsing

Selecting the Source command from the Query menu or clicking on the Source button on the tool bar will display a Bibliographic data window containing information about the text in which the currently selected result appears. It also gives an indication of the size of the text in words and s-units. The information presented is the same as that available from the reference list included in the BNC Users' Reference Guide. Further information about a text, for example its classification, is available only by inspecting elements in its header.

The Bibliographic data for a written text will generally specify its author, title and publisher. The Bibliographic data for a spoken text will identify the situation in which it was recorded, and will also supply demographic or descriptive details for each speaker in a lower window, which can be scrolled horizontally or vertically as needed.

Click on the OK button to close the Bibliographic data window. Click on the Browse button to switch to Browse mode, enabling you to browse the whole of this text, as discussed above in section 5.

6.8 The Collocation command

The Collocation command allows you to calculate how frequently words collocate (i.e. appear together) within the current results. For example, if your current query results show occurrences of the word `death', you might wish to see how often the word `die' appears within a certain number of words of the focus.

Selecting the Collocation command from the Query menu opens the Collocation dialogue box. The name of the current query is displayed, together with the number of hits. Enter the word, punctuation mark, or SGML start-tag for which a collocation score is required (the collocate) in the box labelled Collocate and press the Calculate button. Two counts appear in the box below, indicating how often this collocate appears within a specified span, and what proportion of the hits this represents. You can repeat this process as often as you like, with each new collocate appearing in the same results box. If the collocate appears more frequently than the query focus itself, it is displayed in the highlight colour.

Collocation scores are calculated within the span (i.e. number of L-words) indicated in the box at the bottom left of the dialogue box, by default one word to either side of the first word in the query focus. The span is always counted from the leftmost end of the query focus. Changing the span causes the scores for all words to be recalculated.

You can calculate collocation scores with respect either to the number of hits actually downloaded, or with respect to the number of hits present in the corpus, depending on the setting of the Use downloaded hits only checkbox in the top left corner of the dialogue box.
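A span-based collocation count over downloaded hits can be sketched as follows. This is a hypothetical illustration, not SARA's scoring code. Each hit is modelled as its list of words plus the index of the first focus word; the window runs `span` words to either side of that word; and the guide's "how often" is read here as the number of hits containing the collocate at least once, with exact, case-sensitive matching. Counting against all hits in the corpus would need server-side data that this sketch does not model.

```python
from typing import List, Sequence, Tuple

# Hypothetical hit representation: the hit's words plus the index of the first focus word.
Hit = Tuple[Sequence[str], int]

def collocation_score(hits: List[Hit], collocate: str, span: int = 1) -> Tuple[int, float]:
    """Count hits with `collocate` within `span` words of the leftmost focus word."""
    count = 0
    for words, start in hits:
        left = list(words[max(0, start - span):start])
        right = list(words[start + 1:start + 1 + span])
        if collocate in left + right:
            count += 1
    proportion = count / len(hits) if hits else 0.0
    return count, proportion
```

Widening the span from 1 to 2 can only add words to each window, which is why changing the span forces every score to be recalculated.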

Note that it is not possible to find out which words collocate strongly with a given word other than by trial and error: you must specify the words for which a collocation score is required. It is also impossible to specify a pattern as a collocate.

You can print the contents of the Collocation dialogue box at any time, by clicking on the Print button. This is the only way of saving the results of a Collocation analysis in the current release of the software.