MINIMIZE(1l)          MISC. REFERENCE MANUAL PAGES          MINIMIZE(1l)

NAME
minimize - minimize CDIF tagging and rearrange problematic elements

SYNOPSIS
minimize [-cEedhlmnrs] [[-fz] -o suffix] [-v cdif_version] [CDIF_file...]

DESCRIPTION
minimize takes as its input one or more valid CDIF files, or a valid CDIF text from standard input, and sends to its standard output, or to new files, reformatted versions of its input with minimized tagging (that is, with end tags supplied even where the DTD says that they may be omitted). Minimization is achieved by passing the input files through the sgmls program, and post-processing the output to re-create a valid CDIF document.

Optionally, certain ``problematic'' elements are moved to positions where they are not likely to cause problems during subsequent marking of segments. Such elements are those which may themselves contain one or more segments, but which are allowed by the DTD to occur at positions which may turn out to be in the middle of a segment. The set of such elements currently consists of , and . An empty tag is left at the original site of each moved element. The t attribute of the has the same value as the id attribute of the moved element, which is repositioned at a ``safe'' point, such as after the end of the paragraph originally containing the element.

The program uses a conservative heuristic to decide whether to move an element:

if a ``problematic'' element appears directly inside an element which may not itself contain segments (except as elements of subordinate elements), it is not moved.

If it appears before any other content (apart from white-space not considered to be content) in an element which may contain segments, it is not moved.

If it appears anywhere else - including inside an element which is itself being moved - it is moved.

The program is also conservative about where to position moved elements: typical sites are after the end of a

, or before the end of a .

Each text is reformatted so as to contain no blank lines, no runs of spaces, and no leading or trailing space on any line. Optionally, entity references can be modified so that they are always terminated by a semi-colon, even where SGML syntax does not require an explicit terminator; or so that they are only terminated by a semi-colon where SGML requires an explicit terminator. Blocks of text are not re-wrapped, however. Rules stating which start and end tags should be preceded or followed by a new-line are also applied.

Sun Release 4.Last change: TGCW41: 2 February, 1993

Under the -r option, an attempt is made to eliminate as many ... pairs as possible by assigning to the attribute list of a parent element the r (rendition) attribute of any element which is the only content of that parent element. Thus, for example,

Some text.

would become

Some text.
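
The folding rule described for -r can be sketched roughly as follows. This is only an illustration, not the program's actual implementation: the tag names were lost from this copy of the page, so the `p` and `hi` element names and the `it` rendition value below are hypothetical stand-ins, and the sketch works on a small XML fragment with Python's standard library rather than on sgmls output. It does not attempt the punctuation folding.

```python
import xml.etree.ElementTree as ET


def fold_rendition(parent, hi_tag="hi"):
    """Fold a sole highlighting child into its parent, bottom-up.

    hi_tag is a hypothetical name for the highlighting element;
    the real CDIF element name is not given in this copy of the page.
    """
    for child in list(parent):
        fold_rendition(child, hi_tag)

    children = list(parent)
    foldable = (
        len(children) == 1
        and children[0].tag == hi_tag
        and "r" in children[0].attrib
        and "r" not in parent.attrib
        # The child must be the ONLY content: no text before or after it.
        and not (parent.text or "").strip()
        and not (children[0].tail or "").strip()
    )
    if foldable:
        hi = children[0]
        parent.set("r", hi.get("r"))   # parent inherits the rendition
        parent.text = hi.text          # child's text becomes parent's text
        parent.remove(hi)
        for grandchild in list(hi):    # keep any nested elements
            parent.append(grandchild)
    return parent


folded = fold_rendition(ET.fromstring('<p><hi r="it">Some text.</hi></p>'))
print(ET.tostring(folded, encoding="unicode"))
# prints: <p r="it">Some text.</p>
```

A parent that holds any text of its own beside the highlighted element is left unchanged, which matches the "only content" condition stated above.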

Apparently un-highlighted punctuation is also folded. Thus

&bquo;He's got it coming to him!&equo;

becomes

&bquo;He's got it coming to him!&equo;

This is a consequence of an apparent inability on the part of OCR systems to recognise highlighted punctuation. In a similar manner, the -l option results in the rewriting of