British National Corpus
Formatting SARA hitlists
You can print the results of a SARA search, and you can cut and paste individual solutions to the clipboard. You can also save all the current solutions into a file by clicking on the "List solutions" button (or selecting Listing from the Query menu). The file is saved in XML format.
A Frequently Asked Question is: ‘Ah, but how do I format this XML file? It looks like gobbledegook when I open it in Word!’ This document explains why, and what you can do to improve on the situation.
The first thing to understand is that the XML format used contains useful information as well as annoying pointy brackets. The start of the file tells you when the query was run, by whom, on which SARA server, and what query syntax was actually used. If you added a note to your query, that also appears at the start of the file. Each result in the query file is separately tagged as a <hit> element, with attributes giving the text identifier and the sentence number where it was found. Within each hit, the stretch of text that SARA decided matched your query is further tagged as a <kw> element.
(If you see something like <foo burble="wibble">drone drone</foo>, what you are looking at is an instance of the <foo> element, whose content is ‘drone drone’ and which carries a burble attribute whose value is wibble. All clear now?)
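If you have Python to hand, its standard xml.etree.ElementTree module pulls exactly those three pieces out of the <foo> example: the element name, the attribute value and the content. This is only an illustration of how XML tools read the markup; nothing in SARA depends on Python.

```python
import xml.etree.ElementTree as ET

# Parse the <foo> example element.
foo = ET.fromstring('<foo burble="wibble">drone drone</foo>')

print(foo.tag)            # the element name: foo
print(foo.get("burble"))  # the value of the burble attribute: wibble
print(foo.text)           # the element's content: drone drone
```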
Why don't we just save the files as plain text? Because then you would have to reformat them manually to separate out additional information such as text numbers, or to highlight the hit words (or else do without that information). That's not what computers are for. Why don't we just save them in RTF, or Word, or [insert name-of-favourite-word-processor here] format? Because we want you to be able to use these files on any computer platform, not just the ones that people in Redmond think you should buy.
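To make the point concrete: because each hit is tagged, a few lines of code can produce the plain-text layout you would otherwise build by hand. The sketch below uses Python's standard ElementTree module on a made-up fragment. The root element name hitList and the attribute names text (text identifier) and n (sentence number) are assumptions for illustration, not SARA's documented format, so check a real listing file and adjust them. The sketch also assumes each <hit> holds plain text around a single <kw>, and it ignores the header information at the start of the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing fragment: "hitList", "text" and "n" are assumed
# names, not SARA's documented ones. Check a real file and adjust.
SAMPLE = """<hitList>
<hit text="A00" n="123">the cat <kw>sat</kw> on the mat</hit>
<hit text="B01" n="45">dogs never <kw>sat</kw> still</hit>
</hitList>"""


def hits_to_tsv(xml_text):
    """Return one tab-separated line per <hit>: text id, sentence number,
    left context, keyword, right context.

    Assumes each <hit> contains plain text around a single <kw>.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for hit in root.iter("hit"):
        kw = hit.find("kw")
        left = (hit.text or "").strip()
        if kw is not None:
            keyword, right = kw.text or "", (kw.tail or "").strip()
        else:
            keyword, right = "", ""
        fields = [hit.get("text", ""), hit.get("n", ""), left, keyword, right]
        lines.append("\t".join(fields))
    return "\n".join(lines)


print(hits_to_tsv(SAMPLE))
```

Redirecting the printed output to a file gives tab-separated text that a word processor or spreadsheet will open with the columns already separated.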
We use XML because it was designed as a language for interchanging information on the web. Dozens of programs already use XML in various ways, and there will be plenty more. The recipes below describe how you can get your XML query files into a nicely formatted shape at no cost, using some tools I happen to know about. Feel free to experiment with others: check out the TEI Software page for links to some I've found useful.
Since release of the BNC World edition, we've produced a new version of the client software which provides a couple of new features to improve handling of listing files. You'll need the new version to carry out the recipes described in this note. If you don't know which version you're using, carry out the following:
You should see a dialogue box that looks like the figure below; if you do, you can skip the next step. If, however, your dialogue box lacks the button labelled XML setup in the bottom left corner, you have an old version of the client and need to update it. Close SARA down.
Listing dialogue box
XML Setup dialogue
CSS is short for ‘Cascading Style Sheets’. It is a W3C-defined language for specifying how an XML or HTML document should be formatted. It allows you to state formatting properties (such as font, colour and size) for any XML element, and also, within some limits, to attach additional text to an element. Unfortunately, web browsers of the current generation vary greatly in their ability to handle CSS, but most of them can make a reasonable stab at displaying a SARA listing file.
In this example, I'm using a CSS stylesheet named bnchits.css, but you can use any filename you like. You can download the text of my stylesheet from this web site (click on this link using the right mouse button): feel free to tinker with it if you don't like my choice of layout properties.
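A stylesheet for a listing file need not be elaborate. The sketch below is a minimal stand-in of my own devising, not a copy of bnchits.css; it is written out from Python only so that the examples in this note share one language. It makes each <hit> a separate block, shows <kw> in bold, and uses the CSS attr() function to print two attribute values before each hit. The attribute names text and n are assumptions about how the listing file labels the text identifier and sentence number, and the function name write_stylesheet is hypothetical. Header information at the start of a real listing file is left unstyled here.

```python
from pathlib import Path

# A minimal stand-in for bnchits.css, not the original file.
# "hit" and "kw" are the listing file's element names; the attribute
# names "text" and "n" used with attr() are assumptions.
MINIMAL_CSS = """\
hit { display: block; margin-bottom: 0.5em; }
hit:before { content: attr(text) " " attr(n) ": "; color: gray; }
kw { font-weight: bold; color: maroon; }
"""


def write_stylesheet(directory, name="bnchits.css"):
    """Write MINIMAL_CSS into `directory` (for example, the folder that
    holds your listing files) and return the path of the new file."""
    path = Path(directory) / name
    path.write_text(MINIMAL_CSS, encoding="utf-8")
    return path
```

Calling write_stylesheet(".") from the folder holding your listings puts the file alongside them. The hit:before rule relies on generated content, which is exactly the kind of attribute handling that some browsers manage better than others.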
If you copy this stylesheet into the same directory as any XML listing file produced by SARA, you should be able to view the listing file directly in a web browser such as Opera. Here's a screen shot of Opera viewing this sample listing file, using this stylesheet:
<?xml-stylesheet type="text/css" href="bnchits.css"?>

Make sure you get the syntax right (question marks and all). Press OK.
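If you have listing files that were saved without this line, you can add it yourself with any text editor, or with a small script. Here is a minimal Python sketch; add_stylesheet_pi is a hypothetical helper, not part of SARA. It places the processing instruction straight after the XML declaration, since nothing may precede that declaration in an XML file, and it leaves text that already names a stylesheet untouched. Passing kind="text/xsl" gives the XSLT form of the same instruction.

```python
import re

# Matches an XML declaration such as <?xml version="1.0"?> plus any
# whitespace after it. The \s after "xml" stops it matching <?xml-stylesheet.
_XML_DECL = re.compile(r"<\?xml\s[^?]*\?>\s*")


def add_stylesheet_pi(xml_text, href, kind="text/css"):
    """Return xml_text with an xml-stylesheet processing instruction added.

    The instruction goes straight after the XML declaration, because
    nothing may precede that declaration; with no declaration it goes
    at the very start. Text that already names a stylesheet is returned
    unchanged. (Hypothetical helper, not part of SARA.)
    """
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet type="{kind}" href="{href}"?>\n'
    decl = _XML_DECL.match(xml_text)
    if decl:
        return xml_text[: decl.end()] + pi + xml_text[decl.end():]
    return pi + xml_text


print(add_stylesheet_pi('<?xml version="1.0"?>\n<hitList/>', "bnchits.css"))
```

For a file on disk, read it using the encoding named in its XML declaration, pass the text through the function, and write the result back with the same encoding.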
(In my opinion, IE5's support for CSS is at present not brilliant. In particular, it handles attribute values poorly, so it cannot display the text and sentence numbers. I have not tried Netscape 6 or Amaya, but I am told they work better. Opera, on the other hand, does a very good job of displaying CSS.)
XSLT is a powerful general-purpose stylesheet language, also defined by the W3C, which can do rather more with an XML file than CSS can. You can carry out just about any kind of transformation imaginable with it: converting an XML listing file to HTML with XSLT is rather like using an Arabian scimitar to clip your nails, but none the less effective for that.
You can use an XSLT stylesheet in the same way as the CSS stylesheet in the previous recipe, by supplying its name to a web browser and letting the browser reformat the XML itself. However, web browsers vary very greatly in how they do this, and the results are unreliable.
An alternative approach is to use one of the many XSLT engines available to translate the XML file into another format, such as HTML or plain text, which we can then load into a web browser or word processor with more predictable results. Suitable engines include xt, xalan, and saxon, but there are many others, each of them working in more or less the same way. We will use saxon in this example.
As with CSS, you could name the stylesheet inside the listing file itself:

<?xml-stylesheet type="text/xsl" href="bnchits.xsl"?>

but it's not necessary to do so. As we will be experimenting with several different stylesheets, it is more convenient to specify the stylesheet on the command line:
saxon results.xml bnchits.xsl > results.html
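This command applies bnchits.xsl to results.xml and redirects the output into results.html. For comparison, here is a sketch of a similar conversion done without an XSLT engine, using Python's standard ElementTree and html modules. This is a swapped-in technique, not XSLT, and not a copy of bnchits.xsl. The element and attribute names hitList, text and n are assumptions about the listing format, and the sketch assumes each <hit> holds plain text around a single <kw>. It writes a simple keyword-in-context table with the keyword in bold, escaping any & or < in the corpus text.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical listing fragment: "hitList", "text" and "n" are assumed
# names, not SARA's documented ones.
SAMPLE = """<hitList>
<hit text="A00" n="123">the cat <kw>sat</kw> on the mat</hit>
<hit text="B01" n="45">dogs never <kw>sat</kw> still</hit>
</hitList>"""


def hits_to_html(xml_text, title="SARA hits"):
    """Build a simple keyword-in-context HTML table, one row per <hit>.

    Assumes each <hit> holds plain text around a single <kw>. All corpus
    text is escaped, so & and < in the data cannot break the HTML.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("hit"):
        kw = hit.find("kw")
        left = (hit.text or "").strip()
        keyword = (kw.text or "") if kw is not None else ""
        right = (kw.tail or "").strip() if kw is not None else ""
        cells = [
            html.escape(hit.get("text", "")),
            html.escape(hit.get("n", "")),
            html.escape(left),
            "<b>" + html.escape(keyword) + "</b>",
            html.escape(right),
        ]
        rows.append("<tr><td>" + "</td><td>".join(cells) + "</td></tr>")
    return (
        "<html><head><title>" + html.escape(title) + "</title></head><body>\n"
        "<table>\n" + "\n".join(rows) + "\n</table>\n</body></html>\n"
    )


print(hits_to_html(SAMPLE))
```

Writing the returned string to a file such as results.html gives a page you can open in a browser. The XSLT route remains the more flexible one, since changing the layout means editing a stylesheet rather than a program.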
As you have probably realised, you could use XSLT for many things. Here are a couple more examples, each of which was made by running this sample XML listing file through the XSLT stylesheet specified: