OVERNIGHT(8l)          MISC. REFERENCE MANUAL PAGES          OVERNIGHT(8l)

NAME
     overnight - performs overnight housekeeping for corpus files

SYNOPSIS
     overnight [-aclnu] [-D database] [-d directory] [-m user]

DESCRIPTION
     Overnight is intended to be run automatically by the user natcorp
     towards the end of each working day (say, at 23:00), although it
     can be run interactively by other users at other times.

     The program reviews the files found in corpus-related directories,
     checking whether they should be there at all; whether their
     ownership and permissions are correct; whether they should be
     archived; and whether they should be forwarded to Lancaster for
     further processing. The actual state of the corpus directories is
     compared with the status of corpus texts as recorded in the BNC
     database, and arrangements are made to update the database so that
     it correctly reflects disk contents.

     Five output items are produced. Four are perl programs which, when
     run by users with adequate permissions:

     -  Amend permissions and ownerships of valid corpus files, and
        delete files which are clearly of no further use (old editor
        back-ups and core dumps);

     -  Queue corpus files for archiving on the VAX, while observing
        the disk space limitations of the natcorp account on that
        system;

     -  Copy new B_ files and related Z_ (documentation) files to
        Lancaster; and

     -  Update the database.

     These programs embody lists of affected files, which can be
     examined if necessary.

     The fifth output goes to the user's terminal if the program is
     being run interactively, or is mailed to the user if not. (See
     also the -m option.) It lists files requiring user intervention.
     These include files with names which do not look as though they
     belong in the corpus directories, but which are not old editor
     back-ups; and newly-bounced texts.

     In most cases, no action is taken by the program on any file until
     a grace period (usually of two working days) has elapsed since the
     file was last changed.
     Thus, work files and directories with arbitrary names can exist
     indefinitely, provided that their contents change frequently: it
     is only when they stop changing that the program takes an
     interest. Similarly, newly-created B_ and E_ files are not acted
     upon by the program until after the grace period has elapsed. This
     allows the creator of a file to have second thoughts and amend its
     contents before its presence is recorded in the database and other
     appropriate follow-up actions (such as archiving or forwarding to
     Lancaster) are taken.

OPTIONS
     -a   Send files to the VAX for archiving at once, rather than
          producing a script which will do the job later.

     -c   Clean up the corpus directories at once, rather than
          producing a script that will do the job later.

     -D database
          Use database instead of the default database, bnc.

     -d directory
          Deposit the generated scripts in directory, rather than the
          default directory. (See FILES below.)

     -l   Send files to Lancaster at once, rather than producing a
          script which will do the job later.

     -m user
          Send report(s) produced as mail to user, rather than to
          standard output (if standard output is a terminal) or as
          mail to the user running the program (if standard output is
          not a terminal).

     -n   Arrange that generated scripts do not carry out any actions
          when run. (For debugging.)

     -u   Update the database at once, rather than producing a script
          that will do the job later.

ENVIRONMENT
     CORPPATH
          If defined, over-rides the value given in ~natcorp/.fileids.
          (See FILES below.)

FILES
     ~natcorp/.fileids
          Used to set up corppath, a list of top-level directories
          under which corpus files may be found without traversing
          symbolic links.

     ~natcorp/filenames, ~natcorp/names_and_dirs
          Checked against database contents for omissions and
          extraneous matter.

     ~natcorp/overnight
          The directory in which generated files are deposited.
          May be over-ridden with the -d switch.

          archiveyymmdd
               Perl script which, when run, will copy files to the VAX
               in preparation for archiving.

          cleanupyymmdd
               Perl script which, when run with root privileges, will
               correct permissions and ownerships of corpus files, and
               remove old editor back-up files from corpus directories.

          forwardyymmdd
               Perl script which, when run by natcorp, sends files to
               Lancaster.

          updatesyymmdd
               Perl script which, when run against the BNC database,
               will update it according to status changes found by the
               program.

     ~natcorp/bin/overnight
          The program itself.

     ~natcorp/perl/overnightArchive
          Program which does the work involved in archiving corpus
          files.

     ~natcorp/perl/overnightCleanup
          Program which does the work involved in cleaning up corpus
          directories.

     ~natcorp/perl/overnightForward
          Program which does the work involved in forwarding files to
          Lancaster.

     ~natcorp/perl/overnightUpdate
          Program which does the work involved in updating the
          database.

AUTHOR
     Dominic Dunlop

SEE ALSO
     perl(1), TGCW35 - Corpus Directory Structure and File Names,
     TGCW36 - The new BNC database.

DIAGNOSTICS
     Diagnostics, which are intended to be self-explanatory, are sent
     to standard error. (They are never explicitly redirected to a
     mailer, even if standard output is so redirected. However, if the
     program is run by cron, that program will arrange that diagnostics
     are mailed to the user.)

BUGS
     The current version does not preen corpus directories or take any
     action on OUP classification files.

     X_ and Y_ files with names that do not correspond to those of any
     text are currently reported as suspicious, and the program does
     not know if or when they should be archived.

     This program has a voracious appetite for memory and processor
     resources.

Sun Release 4.1          Last change: TGCW39 - 3 November, 1992
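
The grace period described under DESCRIPTION can be illustrated with a short sketch. This is not taken from the program itself, which is written in perl; it is a Python illustration under stated assumptions. The names working_days_between and grace_period_elapsed are hypothetical, working days are assumed to be Monday to Friday with no allowance for holidays, and the default of two days follows the "usually of two working days" wording above.

```python
from datetime import date, timedelta


def working_days_between(last_changed: date, today: date) -> int:
    """Count the Monday-to-Friday days after last_changed, up to and including today.

    Assumption: weekends are the only non-working days (no holiday calendar).
    """
    count = 0
    day = last_changed
    while day < today:
        day += timedelta(days=1)
        if day.weekday() < 5:  # weekday() is 0-4 for Monday to Friday
            count += 1
    return count


def grace_period_elapsed(last_changed: date, today: date, grace_days: int = 2) -> bool:
    """Return True once at least grace_days working days have passed since the last change."""
    return working_days_between(last_changed, today) >= grace_days
```

Under these assumptions, a file last changed on Friday 30 October 1992 is still within its grace period on Monday 2 November, and becomes eligible for action on Tuesday 3 November. A file that keeps changing resets the count each time, which is why frequently modified work files are left alone.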