tag if they
are not.
\(\(]*>\)\|[_^J]\)*\(_s. The regular expression detects
s
immediately followed by another
or by a
tag, possibly with intervening
elements.
[_^J]*[a-z]
Spurious <p>s? This is a simple check for paragraphs
starting with a lower-case letter, a circumstance
which may be legal in some cases.
[_^J]*[^<]
<p>s missing after ? is used in
CDIF only for block quotations. As such, a new
paragraph may generally be expected to start
immediately a has finished. This need not
always be true, however, and must be checked.
Sun Release 4.1Last change: TGCW54: 10 August, 1993 5
OUPCONV(1l) MISC. REFERENCE MANUAL PAGES OUPCONV(1l)
<[^>]+_r=\w\w\w
Unrecognized rendition (r=) values present. The
test fails to catch rendition values with bad
two-character names.
-s preceded by hyphens present. As a point of
style, check-out is
preferable to check-out as, in
the latter, the content of the element is
something that would generally be regarded as a
whole word, whereas, in the former, it is not.
However, fix-ups of this type are generally too
time-consuming to be worthwhile.
ele-
ments, as part of the transduction process. Those
which escape this net should be examined to see if
they too should be replaced with s.
s. This warning appears for any text
which, after transduction, contains ele-
ments. Some of these may be spurious, in that the
enclosed word represents a valid British spelling;
and some may be candidates for rewriting as
s, where, for example, some intentionally-
used variant form of a word appears in the origi-
nal text.
A number of warnings such as Check date have been omitted
from the list - see CAVEATS below.
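As a rough illustration of how checks of this kind operate, the
following Python sketch applies two of the tests listed above to a
string of transduced text. It is not the program's own code, which
uses perl-format regular expressions held in its internal tables. The
table name CHECKS, the function name examine, and the exact patterns
are assumptions made for the example; in particular, the paragraph tag
is taken to be <p>, and the r= attribute is assumed to be preceded by
white space.

```python
import re

# Hypothetical table of (warning, pattern) pairs modelled on the list above.
CHECKS = [
    # A paragraph tag, optional spaces or newlines, then a lower-case letter.
    ("Spurious <p>s?", re.compile(r"<p>[ \n]*[a-z]")),
    # Inside a tag, an r= value of three or more word characters.
    ("Unrecognized rendition (r=) values present",
     re.compile(r"<[^>]+\sr=\w\w\w")),
]


def examine(text):
    """Return the warnings whose pattern matches somewhere in text."""
    return [warning for warning, pattern in CHECKS if pattern.search(text)]
```

As the description of the rendition test notes, a check of this shape
cannot catch a bad two-character value: examine("<hi r=zz>") returns an
empty list.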
OPTIONS
-v The -v option causes each file name to be printed on
standard error immediately before the file is pro-
cessed.
-z The -z option results in the documentation (Z_) file
corresponding to each input file being created. The
first line gives the text name; the second is blank;
and the third gives the date, the login name of the
user running oupconv, and the command line used. The
fourth and subsequent lines carry the warnings produced
by the examination phase.
If the file exists, it is overwritten.
In the absence of the -z option, warnings about the
contents of the file are sent to standard error.
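The layout of the documentation file described under -z can be
sketched in Python as follows. This is an illustration only: the
function name documentation_lines and its parameters are hypothetical,
and the single space separating the three items on the third line is
an assumption, since the separator is not specified above.

```python
def documentation_lines(text_name, date, login, command_line, warnings):
    """Lay out a documentation file as described for the -z option."""
    header = [
        text_name,                         # line 1: the text name
        "",                                # line 2: blank
        f"{date} {login} {command_line}",  # line 3: date, login name, command line
    ]
    # Line 4 onwards: warnings produced by the examination phase.
    return header + list(warnings)
```

Writing the result with open(path, "w") would match the overwrite
behaviour noted above, since that mode truncates an existing file.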
DIAGNOSTICS
Various error conditions associated with non-existent,
badly-named, or unreadable input files, with input files
which are not valid OUP transfer format documents, and with
output files which cannot be created, result in ``file
skipped'' messages on standard error.
Under the -z option, a warning is given on standard error if
the documentation (Z_) file cannot be created. The main
output file is still written under these circumstances.
More serious error conditions result in immediate termina-
tion with a diagnostic message.
The return status of the program is zero only if no file
warning or error conditions were encountered. Warnings
about file contents generated during the examination phase
do not affect the return status.
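The rule for the return status can be stated compactly in Python. The
function name and the use of 1 as the non-zero value are assumptions
made for illustration; the text above says only when the status is
zero.

```python
def return_status(file_problems, content_warnings):
    """Zero only if no file warning or error condition was encountered.

    content_warnings (from the examination phase) is accepted but
    deliberately ignored, since such warnings do not affect the status.
    """
    return 0 if file_problems == 0 else 1
```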
CAVEATS
The program guesses that a file represents a book if its BNC
name is identical to its OUP name. If the names differ only
in their last character, the file is presumed to represent a
periodical or other non-book material. If the names differ
in more characters than the last, the program does not
express an opinion on the type of the source material. The
guesses determine the contents of the dummy CDIF header, the
values of the attributes of the tag, and whether a
word count greater than 40,000 elicits a comment. The
guesses are not always correct, and should be checked.
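The guessing rule can be sketched in Python as below. This is an
illustration, not the program's own code: the function name
guess_source_type is hypothetical, and it is assumed that names
differing only in their last character are of equal length.

```python
def guess_source_type(bnc_name, oup_name):
    """Guess the type of source material from the two file names.

    Returns None when the names differ in more than the last
    character, in which case no opinion is expressed.
    """
    if bnc_name == oup_name:
        return "book"
    if (bnc_name
            and len(bnc_name) == len(oup_name)
            and bnc_name[:-1] == oup_name[:-1]):
        # Same length, same leading characters, so only the last differs.
        return "periodical or other non-book material"
    return None
```

For the made-up names "A1B" and "A1C" the sketch reports a periodical
or other non-book material; for "A1B" and "X1C" it returns None.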
The program is unable to pick issue dates or author names
out of OUP prologues because of the inconsistent manner in
which this information is presented.
FILES
/home/natcorp/bin/oupconv
The program itself. It contains both this manual page
(use nroff -man /home/natcorp/bin/oupconv or similar to
print it), and the tables used to define the various
transductions and tests applied to the text.
AUTHOR
Dominic Dunlop
SEE ALSO
perl(1), TGCW04: Encoding and markup for the Oxford Pilot
Corpus; TGCW25: Markup for non-ISO 646 invariant part
characters; TGCW30: Corpus Document Interchange Format, v1.2;
TGCW33: BNC data capture: OUP format definition for text
handover to OUCS; TGCW35: Corpus text processing: directory
structure and filenames.
BUGS
The transductions and subsequent examinations applied to the
input may not be correct in every case, nor do they cover
every possibility. Let Dominic know if you encounter a cir-
cumstance applying to more than one input file which is han-
dled incorrectly, or not handled at all.
It would be nice if the regular expressions used in the
examination phase were easily available for use in editors,
in order that they could subsequently be used to locate the
source of a problem. Sadly, they are perl-format regular
expressions, which differ subtly but annoyingly from
emacs- and vi-format regular expressions, making manual
conversion necessary. The emacs versions shown above under
DESCRIPTION are untested approximations.
If confronted with very large individual files (several hun-
dred thousand words), the program may grow very large, and
ultimately run out of memory. In such cases, it will fail
with a message stating that it is out of memory, or that it
is unable to fork. If the failure occurs after oupconv has
processed several files, it may be possible to overcome it
by processing the offending file on its own; if this fails,
the only solution is to break the file into a number of
files with valid OUP prologues by hand, process these
separately, then reconstitute. oupconv could circumvent the
problem by breaking up the file itself, but the problem is
that there is no safe place to break. For example, breaking
ahead of s may prevent spurious s from being
detected. So the work is left to a human.
Even in the normal course of events, oupconv grows rather
large: it seems that perl's regular expression evaluator is
memory-hungry.