Date: Mon, 20 Jan 92 16:24:24 MST
From: Terry Langendoen
Organization: Dept Linguistics, U Arizona, Tucson AZ 85721
Subject: possible analysis of new BNC tagset
To: Lou Burnard
cc: Steven Zepp, Shaun O'Connor
Sender: LANGENDT@edu.arizona.ccit.arizvm1

[This is BNC document TGDW09 -- DD]

Here is a straight alphabetical listing of the proposed non-compound BNC form-class tags, including the punctuation tags, followed by a listing of the compound (disjunctive) tags. After that comes a proposed analysis of the tagset using the current TEI version of linguistic markup. It is written in a form that should make it straightforward both to work out an appropriate feature-structure declaration (FSD) and to construct the actual feature structures themselves.

AJ0  adjective (unmarked) (e.g. GOOD, OLD)
AJC  comparative adjective (e.g. BETTER, OLDER)
AJS  superlative adjective (e.g. BEST, OLDEST)
AT0  article (e.g. THE, A, AN)
AV0  adverb (unmarked) (e.g. OFTEN, WELL, LONGER, FURTHEST)
AVP  adverb particle (e.g. UP, OFF, OUT)
AVQ  wh-adverb (e.g. WHEN, HOW, WHY)
CJC  coordinating conjunction (e.g. AND, OR)
CJS  subordinating conjunction (e.g. ALTHOUGH, WHEN)
CJT  the conjunction THAT
CRD  cardinal numeral (e.g. 3, FIFTY-FIVE, 6609) (excluding ONE)
DPO  possessive determiner form (e.g. YOUR, THEIR)
DT0  general determiner (e.g. THESE, SOME)
DTQ  wh-determiner (e.g. WHOSE, WHICH)
EX0  existential THERE
ITJ  interjection or other isolate (e.g. OH, YES, MHM)
NN0  noun (neutral for number) (e.g. AIRCRAFT, DATA)
NN1  singular noun (e.g. PENCIL, GOOSE)
NN2  plural noun (e.g. PENCILS, GEESE)
NP1  proper noun (e.g. LONDON, MICHAEL, MARS)
ONE  the word ONE (including numeral and non-numeral uses)
ORD  ordinal (e.g. SIXTH, 77TH, LAST)
PNI  indefinite pronoun (e.g. NONE, EVERYTHING)
PNP  personal pronoun (e.g. YOU, THEM, OURS)
PNQ  wh-pronoun (e.g. WHO, WHOEVER)
PNX  reflexive pronoun (e.g.
     ITSELF, OURSELVES)
POS  the possessive (genitive) morpheme 'S or '
PRO  the preposition OF
PRP  preposition (except OF) (e.g. FOR, ABOVE, TO)
PUL  left bracket (i.e. ( or [ )
PUN  normal punctuation mark (i.e. . ! , : ; - ? ... )
PUQ  quotation mark (i.e. `' " " )
PUR  right bracket (i.e. ) or ] )
TO0  infinitive marker (i.e. TO)
UNC  "unclassified" items that are not words of the English lexicon or do not belong to any recognized category, e.g. formulae such as XX61, foreign words, BOTH when correlative with AND, etc.
VBB  base forms of the verb "BE", i.e. BE, AM, ARE (excluding infinitive)
VBD  past form of the verb "BE", i.e. WAS, WERE
VBG  -ing form of the verb "BE", i.e. BEING
VBI  infinitive of the verb "BE"
VBN  past participle of the verb "BE", i.e. BEEN
VBZ  -s form of the verb "BE", i.e. IS, 'S
VDB  base form of the verb "DO", i.e. DO (excluding infinitive)
VDD  past form of the verb "DO", i.e. DID
VDG  -ing form of the verb "DO", i.e. DOING
VDI  infinitive of the verb "DO"
VDN  past participle of the verb "DO", i.e. DONE
VDZ  -s form of the verb "DO", i.e. DOES
VHB  base form of the verb "HAVE", i.e. HAVE (excluding infinitive)
VHD  past tense form of the verb "HAVE", i.e. HAD, 'D
VHG  -ing form of the verb "HAVE", i.e. HAVING
VHI  infinitive of the verb "HAVE"
VHN  past participle of the verb "HAVE", i.e. HAD
VHZ  -s form of the verb "HAVE", i.e. HAS, 'S
VM0  modal auxiliary verb (e.g. CAN, COULD, WILL, 'LL)
VVB  base form of lexical verb (e.g. TAKE, LIVE) (excluding infinitive)
VVD  past tense form of lexical verb (e.g. TOOK, LIVED)
VVG  -ing form of lexical verb (e.g. TAKING, LIVING)
VVI  infinitive of lexical verb
VVN  past participle form of lexical verb (e.g. TAKEN, LIVED)
VVZ  -s form of lexical verb (e.g. TAKES, LIVES)
XX0  the negative NOT or N'T
ZZ0  alphabetical symbol (e.g.
     A, B, c, d)

AJ0-AV0 & AV0-AJ0  adjective or adverb
AJ0-NN1 & NN1-AJ0  adjective or singular common noun
AJ0-VVD & VVD-AJ0  adjective or past tense verb
AJ0-VVG & VVG-AJ0  adjective or -ing form of the verb
AJ0-VVN & VVN-AJ0  adjective or past participle
AVP-PRP & PRP-AVP  adverb particle or preposition
CJS-PRP & PRP-CJS  subordinating conjunction or preposition
NN1-NP1 & NP1-NN1  singular common noun or proper noun
NN1-VVG & VVG-NN1  singular common noun or -ing form of the verb
VVD-VVN & VVN-VVD  past tense verb or past participle

We can use one feature as the main basis for dividing the simple tags into classes. The possible values for this feature are listed below. Where forms with a given value are not further subdivided in the BNC tagging scheme, the corresponding tag is given in parentheses.

    adjective
    adverb
    alphabetic-symbol (ZZ0)
    article (AT0)
    cardinal (CRD)
    conjunction
    determiner
    existential (EX0)
    infinitive-marker (TO0)
    interjection (ITJ)
    negative (XX0)
    noun
    one (ONE)
    ordinal (ORD)
    possessive (POS)
    preposition
    pronoun
    punctuation
    verb
    no.claim (UNC)

If the value is adjective, a further feature is defined, with the following values:
    comparative (AJC)
    superlative (AJS)
    no.claim (AJ0)

If the value is adverb, a further feature is defined, with the following values:
    plus (AVQ)
    minus
If this feature has the value minus, another feature is defined, with the following values:
    plus (AVP)
    minus (AV0)

If the value is conjunction, a further feature is defined, with the following values:
    plus (CJT)
    minus
If this feature has the value minus, another feature is defined, with the following values:
    coordinating (CJC)
    subordinating (CJS)

If the value is determiner, a further feature is defined, with the following values:
    plus (DTQ)
    minus
If this feature has the value minus, another feature is defined, with the following values:
    plus (DPO)
    minus (DT0)

If the value is noun, a further feature is defined, with the following values:
    plus (NP1)
    minus
If this feature has the value minus, another feature is defined, with the following values:
    singular (NN1)
    plural (NN2)
    no.claim (NN0)

If the value is preposition, a further feature is defined, with the following values:
    plus (PRO)
    minus (PRP)

If the value is pronoun, a further feature is defined, with the following values:
    plus (PNQ)
    minus
If this feature has the value minus, another feature is
defined, with the following values:
    indefinite (PNI)
    personal (PNP)
    reflexive (PNX)

If the value is punctuation, a further feature is defined, with the following values:
    left-bracket (PUL)
    right-bracket (PUR)
    quotation (PUQ)
    normal (PUN)

If the value is verb, a further feature is defined, with the following values:
    be
    do
    have
    lexical
    modal (VM0)

If this feature has the value be, another feature is defined, with the following values:
    base (VBB)
    infinitive (VBI)
    s (VBZ)
    past (VBD)
    past-participle (VBN)
    ing (VBG)

If this feature has the value do, another feature is defined, with the following values:
    base (VDB)
    infinitive (VDI)
    s (VDZ)
    past (VDD)
    past-participle (VDN)
    ing (VDG)

If this feature has the value have, another feature is defined, with the following values:
    base (VHB)
    infinitive (VHI)
    s (VHZ)
    past (VHD)
    past-participle (VHN)
    ing (VHG)

If this feature has the value lexical, another feature is defined, with the following values:
    base (VVB)
    infinitive (VVI)
    s (VVZ)
    past (VVD)
    past-participle (VVN)
    ing (VVG)

For example, we can consider the tag VVD to be equivalent to the feature structure with the values verb, lexical and past.

Regarding the compound tags, if we put aside the matter of "likelihood", they can be defined as alternations of feature structures. For example, VVD-VVN can be defined as an alternation of the feature structures for VVD and VVN. The VVN-VVD tag can be defined by switching the order of the contained feature structures, but something else is needed to make the notion of "likelihood" explicit.

Postscript on ONE. It seems to me preferable not to have a special tag for the form "one", with a corresponding exclusion from the CRD tag (and from the other classes that "one" can belong to, e.g. PNI and NN1). Rather, if one is unsure of the analysis of ONE, one can assign it a compound tag reflecting the uncertainty involved.
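As an illustration only (this is not TEI markup, and not part of the proposal itself), the decomposition above can be checked mechanically. Since the lower-level features are referred to above only by their values, this Python sketch identifies each simple tag by the sequence of values it takes, read from the main feature downward, so VVD becomes (verb, lexical, past). The names `LEAF_CLASSES`, `SUBDIVIDED`, `build_table` and `analyse` are hypothetical. The verb tags are generated from the naming pattern visible in the tag list (V, then a letter for be/do/have/lexical, then a letter for the form), which is a convenience of the sketch. A compound tag is read as an ordered list of alternatives; as noted above, the order is kept but "likelihood" itself is not represented.

```python
from typing import Dict, List, Tuple

# A tag's analysis: the sequence of feature values from the main feature down.
Path = Tuple[str, ...]

# Main-feature values that are not subdivided further (hypothetical table name).
LEAF_CLASSES: Dict[str, str] = {
    "alphabetic-symbol": "ZZ0", "article": "AT0", "cardinal": "CRD",
    "existential": "EX0", "infinitive-marker": "TO0", "interjection": "ITJ",
    "negative": "XX0", "one": "ONE", "ordinal": "ORD", "possessive": "POS",
    "no.claim": "UNC",
}

# Subdivided classes: remaining values below the main feature -> tag.
SUBDIVIDED: Dict[str, Dict[Path, str]] = {
    "adjective": {("comparative",): "AJC", ("superlative",): "AJS", ("no.claim",): "AJ0"},
    "adverb": {("plus",): "AVQ", ("minus", "plus"): "AVP", ("minus", "minus"): "AV0"},
    "conjunction": {("plus",): "CJT", ("minus", "coordinating"): "CJC",
                    ("minus", "subordinating"): "CJS"},
    "determiner": {("plus",): "DTQ", ("minus", "plus"): "DPO", ("minus", "minus"): "DT0"},
    "noun": {("plus",): "NP1", ("minus", "singular"): "NN1",
             ("minus", "plural"): "NN2", ("minus", "no.claim"): "NN0"},
    "preposition": {("plus",): "PRO", ("minus",): "PRP"},
    "pronoun": {("plus",): "PNQ", ("minus", "indefinite"): "PNI",
                ("minus", "personal"): "PNP", ("minus", "reflexive"): "PNX"},
    "punctuation": {("left-bracket",): "PUL", ("right-bracket",): "PUR",
                    ("quotation",): "PUQ", ("normal",): "PUN"},
}

# Verb tags follow the pattern V + type letter + form letter (e.g. VVD).
VERB_TYPES = {"be": "B", "do": "D", "have": "H", "lexical": "V"}
VERB_FORMS = {"base": "B", "infinitive": "I", "s": "Z",
              "past": "D", "past-participle": "N", "ing": "G"}


def build_table() -> Dict[str, Path]:
    """Map every simple tag to its sequence of feature values."""
    table: Dict[str, Path] = {}
    for value, tag in LEAF_CLASSES.items():
        table[tag] = (value,)
    for value, sub in SUBDIVIDED.items():
        for rest, tag in sub.items():
            table[tag] = (value,) + rest
    table["VM0"] = ("verb", "modal")  # modal is not subdivided
    for vtype, v in VERB_TYPES.items():
        for form, f in VERB_FORMS.items():
            table["V" + v + f] = ("verb", vtype, form)
    return table


TABLE = build_table()


def analyse(tag: str) -> List[Path]:
    """Analyse a simple or compound tag (raises KeyError for unknown tags).

    A compound tag such as VVD-VVN becomes an ordered list of alternatives;
    the order is preserved, but likelihood is not modelled.
    """
    return [TABLE[part] for part in tag.split("-")]


print(analyse("VVD"))      # [('verb', 'lexical', 'past')]
print(analyse("VVD-VVN"))  # [('verb', 'lexical', 'past'), ('verb', 'lexical', 'past-participle')]
```

Keeping the tags as data makes it easy to confirm that the 62 simple tags in the listing each receive a distinct analysis, which is presumably one property an FSD for this tagset would need to guarantee.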