MINIMIZE(1l) MISC. REFERENCE MANUAL PAGES MINIMIZE(1l)
NAME
minimize - minimize CDIF tagging and rearrange problematic
elements
SYNOPSIS
minimize [-cEedhlmnrs] [[-fz] -o suffix] [-v cdif_version]
[CDIF_file...]
DESCRIPTION
minimize takes as its input one or more valid CDIF files, or
a valid CDIF text from standard input, and sends to its
standard output, or to new files, reformatted versions of
its input with minimized tagging (that is, with end tags
supplied even where the DTD says that they may be omitted).
Minimization is achieved by passing the input files through
the sgmls program, and post-processing the output to re-
create a valid CDIF document.
Optionally, certain ``problematic'' elements are moved to
positions where they are not likely to cause problems during
subsequent marking of segments. Such elements are
those which may themselves contain one or more segments, but
which are allowed by the DTD to occur at positions which may
turn out to be in the middle of a segment. The set of such
elements currently consists of
, and
. An empty tag is left at the original site of
each moved element. The t attribute of the has the
same value as the id attribute of the moved element, which
is repositioned at a ``safe'' point, such as after the end
of the paragraph originally containing the element.
The program uses a conservative heuristic to decide whether
to move an element: if a ``problematic'' element appears
directly inside an element which may not itself contain
segments (except as elements of subordinate elements), it is
not moved. If it appears before any other content (apart
from white-space not considered to be content) in an element
which may contain segments, it is not moved. If it appears
anywhere else - including inside an element which is itself
being moved - it is moved. The program is also conservative
about where to position moved elements: typical sites are
after the end of a
, or before the end of a .
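The decision rule described above can be summarised in a short sketch. This is an illustration only, not the program's actual code: the function name and its two boolean parameters are hypothetical, and the caller is assumed to have already worked out, from the DTD and the parsed input, whether the parent element may directly contain segments and whether any real content precedes the element.

```python
def should_move(parent_may_contain_segments: bool,
                preceded_by_content: bool) -> bool:
    """Hypothetical restatement of the conservative move heuristic.

    parent_may_contain_segments: True if the enclosing element may itself
        directly contain segments (not merely via subordinate elements).
    preceded_by_content: True if something other than insignificant
        white-space appears before the element inside its parent.
    """
    # Directly inside an element which may not itself contain segments:
    # leave the element where it is.
    if not parent_may_contain_segments:
        return False
    # First real content of an element which may contain segments:
    # it cannot fall in the middle of a segment, so leave it.
    if not preceded_by_content:
        return False
    # Anywhere else (including inside an element being moved): move it.
    return True
```

Under these assumptions, only the combination of a segment-bearing parent and preceding content results in a move.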
Each text is reformatted so as to contain no blank lines, no
runs of spaces, and no leading or trailing space on any
line. Optionally, entity references can be modified so that
they are always terminated by a semi-colon, even where SGML
syntax does not require an explicit terminator; or so that
they are only terminated by a semi-colon where SGML requires
an explicit terminator. Blocks of text are not re-wrapped,
however. Rules stating which start and end tags should be
preceded or followed by a new-line are also applied.
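As a rough illustration of the white-space rules only (no blank lines, no runs of spaces, no leading or trailing space, no re-wrapping), the following sketch shows one way they might be applied. The function name is hypothetical, tabs are assumed to count as spaces, and the entity-terminator and new-line rules are deliberately not covered.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Apply the white-space rules line by line, without re-wrapping."""
    kept = []
    for line in text.splitlines():
        # Drop leading and trailing space, then collapse runs of
        # spaces (tabs are treated as spaces here, by assumption).
        line = re.sub(r"[ \t]+", " ", line.strip())
        # Blank lines are discarded; other line breaks are preserved.
        if line:
            kept.append(line)
    return "\n".join(kept)
```

Because line breaks other than blank lines are kept where they are, blocks of text retain their original wrapping.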
Sun Release 4.Last change: TGCW41: 2 February, 1993 1
Under the -r option, an attempt is made to eliminate as many
... pairs as possible by assigning to the attribute
list of a parent element the r (rendition) attribute of any
element which is the only content of that parent element.
Thus, for example,
Some text.
would become
Some text.
Apparently un-highlighted punctuation is also folded. Thus
&bquo;He's got it coming to him!&equo;
becomes
&bquo;He's got it coming to him!&equo;
This is a consequence of an apparent inability on the part
of OCR systems to recognise highlighted punctuation.
In a similar manner, the -l option results in the rewriting
of