OVERNIGHT(8l)         MISC. REFERENCE MANUAL PAGES         OVERNIGHT(8l)

NAME
     overnight - performs overnight housekeeping for corpus files

SYNOPSIS
     overnight [-aclnu] [-D database] [-d directory] [-m user]

DESCRIPTION
     Overnight is intended to be run automatically by the user
     natcorp towards the end of each working day (say, at 23:00),
     although it can be run interactively by other users at other
     times.

     The program reviews the files found in corpus-related
     directories, checking

     - Whether they should be there at all;

     - Whether their ownership and permissions are correct;

     - Whether they should be archived;

     - Whether they supersede other files, which can consequently
       be compressed or deleted; and

     - Whether they should be forwarded to Lancaster for further
       processing.

     The actual state of the corpus directories is compared with
     the status of corpus texts as recorded in the BNC database,
     and arrangements are made to update the database so that it
     correctly reflects disk contents.

     Five output items are produced. Four are perl programs which,
     when run by users with adequate permissions:

     - Amend permissions and ownerships of valid corpus files;
       delete files which are clearly of no further use (old editor
       back-ups, core dumps, and corpus work files superseded by
       later versions); and touch files identified as needing
       further work;

     - Queue corpus files for archiving on the VAX, while observing
       the disk space limitations of the natcorp account on that
       system; and compress superseded corpus files;

     - Copy minimized versions of new B_ files and related Z_
       (documentation) files to Lancaster; and

     - Update the database.

     These programs embody lists of affected files, which can be
     examined if necessary.

     The fifth output goes to the user's terminal if the program is
     being run interactively, or is mailed to the user if not. (See
     also the -m option.) It lists files requiring user intervention.
     These include files with names which do not look as though
     they belong in the corpus directories; and newly-bounced
     texts.

     In most cases, no action is taken by the program on any file
     until a grace period (usually of two working days) has elapsed
     since the file was last changed. Thus, work files and
     directories with arbitrary names can exist indefinitely,
     provided that their contents change frequently: it is only
     when they stop changing that the program takes an interest.

     Similarly, newly-created B_, C_ and E_ files are not acted
     upon by the program until after the grace period has elapsed.
     This allows the creator of a file to have second thoughts and
     amend its contents before its presence is recorded in the
     database and other appropriate follow-up actions (such as
     archiving or forwarding to Lancaster) are taken.

     B_ files are forwarded only if they can successfully be
     minimized - that is, if they parse as valid CDIF, and satisfy
     a number of additional checks on character set and (lack of)
     line-end hyphenation. The documentation for minimize gives
     further details. If minimize fails, mail listing the errors is
     sent to the person responsible for the file in question.
     Arrangements are made to touch the file, so as to give that
     person a grace period in which to correct the problem before
     overnight next attempts to forward it.

     An attempt is made to minimize each C_ file on receipt. If
     this succeeds, a minimized D_ working file is created; if it
     does not, mail is sent to the person responsible for the file.
     No further attempt is made to create a D_ file: this must be
     done by hand if the automatic procedure fails.

     In order to conserve disk space on the BNC Suns, overnight
     arranges that no more than two ``archiveable'' versions of a
     text (that is, dot, B_, C_, E_ and F_ files), possibly
     together with a ``working'' version (A_ or D_), are kept
     on-line. Of these files, the earliest is compressed. (Or,
     rather, zipped using the gzip utility.)
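
     The grace-period rule described above can be illustrated with
     a short sketch. This is not taken from the overnight source
     (which is written in perl); it is a minimal Python
     illustration under stated assumptions: that a "working day" is
     any Monday to Friday, that the period is counted in whole
     calendar days after the day of the last change, and that the
     names working_days_between and grace_period_elapsed are
     hypothetical.

```python
from datetime import datetime, timedelta

# "usually of two working days" per the description; the real value
# lives in the program's control tables and may differ.
GRACE_WORKING_DAYS = 2


def working_days_between(last_changed: datetime, now: datetime) -> int:
    """Count weekdays (Mon-Fri) falling after the day of `last_changed`
    up to and including the day of `now`.

    Assumption: public holidays are ignored.
    """
    count = 0
    day = last_changed.date() + timedelta(days=1)
    while day <= now.date():
        if day.weekday() < 5:  # 0..4 are Monday..Friday
            count += 1
        day += timedelta(days=1)
    return count


def grace_period_elapsed(last_changed: datetime, now: datetime,
                         grace: int = GRACE_WORKING_DAYS) -> bool:
    """True once at least `grace` working days have passed since the
    file was last changed, i.e. once the file may be acted upon."""
    return working_days_between(last_changed, now) >= grace


if __name__ == "__main__":
    changed = datetime(1993, 5, 7, 17, 0)   # a Friday afternoon
    monday_run = datetime(1993, 5, 10, 23, 0)
    tuesday_run = datetime(1993, 5, 11, 23, 0)
    print(grace_period_elapsed(changed, monday_run))
    print(grace_period_elapsed(changed, tuesday_run))
```

     Under these assumptions a file last changed on a Friday is
     left alone by the Monday night run and becomes eligible on the
     Tuesday night run. A file that is modified again restarts the
     count, which is why frequently changing work files are never
     acted upon.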

OPTIONS
     -a   Send files to VAX for archiving at once, rather than
          producing a script which will do the job later.

     -c   Clean up the corpus directories at once, rather than
          producing a script that will do the job later.

     -Ddatabase
          Use database instead of the default database, bnc.

     -ddirectory
          Deposit the generated scripts in directory, rather than
          the default directory. (See FILES below.)

     -l   Send files to Lancaster at once, rather than producing a
          script which will do the job later.

     -muser
          Send report(s) produced as mail to user, rather than to
          standard output (if standard output is a terminal) or as
          mail to the user running the program (if standard output
          is not a terminal).

     -n   Arrange that generated scripts do not carry out any
          actions when run. (For debugging.)

     -u   Update the database at once, rather than producing a
          script that will do the job later.

ENVIRONMENT
     CORPPATH
          If defined, over-rides the value given in
          ~natcorp/.fileids. (See FILES below.)

FILES
     ~natcorp/.fileids
          Used to set up corppath, a list of top-level directories
          under which corpus files may be found without traversing
          symbolic links.

     ~natcorp/overnight
          The directory in which generated files are deposited.
          May be over-ridden with the -d switch.

     archiveyymmdd
          Perl script which, when run, will copy files to the VAX
          in preparation for archiving, and will compress files
          identified as having been superseded.

     cleanupyymmdd
          Perl script which, when run with root privileges, will
          correct permissions and ownerships of corpus files,
          remove old files from corpus directories, and touch B_
          files which have failed to minimize.

     forwardyymmdd
          Perl script which, when run by natcorp, sends files to
          Lancaster.

     updatesyymmdd
          Perl script which, when run against the BNC database,
          will update it according to status changes found by the
          program.

     ~natcorp/bin/overnight
          The program itself.

     ~natcorp/perl/overnightArchive
          Program which does the work involved in archiving corpus
          files.

     ~natcorp/perl/overnightCleanup
          Program which does the work involved in cleaning up
          corpus directories.

     ~natcorp/perl/overnightForward
          Program which does the work involved in forwarding files
          to Lancaster.

     ~natcorp/perl/overnightUpdate
     ~natcorp/perl/updateSubs
     ~natcorp/perl/updateData
          Program which does the work involved in updating the
          database, plus associated control data.

AUTHOR
     Dominic Dunlop

SEE ALSO
     perl(1), TGCW35 - Corpus Directory Structure and File Names,
     TGCW36 - The new BNC database, TCGW41 - minimize(1l),
     gzip(1l), touch(1).

DIAGNOSTICS
     Diagnostics, which are intended to be self-explanatory, are
     sent to standard error. (They are never explicitly redirected
     to a mailer, even if standard output is so redirected.
     However, if the program is run by cron, that program will
     arrange that diagnostics are mailed to the user.)

NOTES
     The program is table-driven. The control tables may be found
     at the end of the source file.

     The algorithms used to decide which corpus files to compress
     or delete presume that a file will have been archived if
     necessary before it becomes eligible for compression or
     deletion. If files are processed very rapidly, it is
     conceivable that a file scheduled for archiving may not exist.
     In such a case, a warning will be produced when the archiving
     script is run.

BUGS
     The program does not know when X_ and Y_ files with names that
     do not correspond to those of any text should be archived.

     The program has a voracious appetite for memory and processor
     resources.

Sun Release 4.1        Last change: TGCW50 - 12 May, 1993