From: OXVAX::LOU "Lou Burnard" 20-AUG-1991 12:53:37.57
To: NATCORP
CC: LOU
Subj: Queries for meeting of TGA on Spoken texts 15 Aug 91

1. Situational classification might need some more thought. What about semi-formal but unscripted situations such as seminars or discussions, interviews, phone-ins, press conferences etc.? Are these excluded from this sample but gathered elsewhere?

2. Presumably the genre "on bus/train" is meant to include conversations at, say, bus stops and airport terminals too. What about conversations recorded in cars or taxis? With other passengers? With the driver?

3. Should monologue and dialogue be distinguished?

4. A sense of general uneasiness came over me as I scanned the questions by which the context of each conversation is to be described. This is *not* a market research exercise! Wouldn't a simple narrative in which people could briefly set the conversation in its context be likely to elicit more useful information with substantially less effort? CGA suggests as an example:

"I was walking along the street coming back from the shops and I met my friend Mary who lives on the other side of town pushing her twins in the pram. I hadn't seen her since she came out of hospital, though I had met her husband at a party."

This tells us a lot more about the shared knowledge of participants and what sort of relevancies they have in starting talk. In the example provided, a narrative description might (for instance) tell us something of who Derek, Christine and Graham are - it won't tell us everything, but it may make the talk slightly less opaque.

Moreover, the sort of sociological description proposed (class/education etc., not to mention occupation or full name) is just not going to be available for talk with strangers in the street or shop assistants. About all that people will be able to reliably state in such cases will be age, sex and accent (and that using rather impressionistic categories).
Of course, in some cases people who have been tape-recorded will willingly supply such information when their consent is being obtained. But it seems rather a lot to expect of those carrying the microphone to get such detail out of everyone. My guess is that they won't. Will we then discard the recorded material? What about conversations overheard or barely participated in?

5. Some way must be found of identifying each different participant uniquely, so that conversations with well-known individuals (partners, colleagues, milkmen etc.) can be identified. The kind of 'cast list' proposed by the TEI WG AI2 for inclusion in the document header seems relevant here.

6. Several people have questioned the wisdom of timing pauses etc. in seconds rather than some metric related to the individual's speech rhythm (some people do speak v-e-r-y sl-ow-ly!). If this is felt to be too difficult or impressionistic, it does seem that 5 seconds is really quite a long time for the shortest pause, in most cases. There is a lot of difference between a pause of 1-2 seconds and one of 5 seconds (in the latter, we effectively have a suspension of the talk, while in the former, a speaker can be "hanging on"), just as there is a lot of difference between 1-2 seconds of unclear speech and 5 seconds - the latter is long enough for almost anything to happen (try blanking out chunks with these lengths at intervals and then trying to follow the gist!). CGA proposes: count seconds from 2 up to 5, then have bands of 5-10, 10-20, 20-60 sec, then do the same with minutes (count up to 5, bands 5-10, 10-20, 20-60).

7. Overlap. How will the proposed method cater for overlaps involving 3 or more speakers? Strongly recommend attention to the TEI proposals in this context, or the simpler version of them proposed by Jane Edwards.

8. How is uncertainty as to speaker to be described? What if the speaker is either A or B but definitely not C? What if the speaker is completely unknown?

9. For purposes of anonymity, we will probably need to replace surnames as well as addresses or phone numbers, and maybe even phrases such as "who works in the dictionary department at Longman".

10. Will it be possible to distinguish participants who were aware of the recording at the time it was made from those who were not?

11. The example is inconsistent as to the significance of the hyphen. It seems to indicate restart or interruption, irrespective of whether or not this occurs in the middle of a word, though the text claims that only the latter case applies. Some way of indicating truncated words which is independent of whether or not they indicate a restart or interruption might be useful.

12. There is a handy check list of features of spoken text in the current AI2 working paper. Many of these are addressed by the current proposals, but some are not. Someone should check these systematically.
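
To make the banding that CGA proposes in query 6 concrete, here is a rough sketch in Python. It is only one possible reading of the proposal, not an agreed scheme. The function name `pause_band` and the label strings are hypothetical, and three points are assumptions because the proposal leaves them open: durations under 2 seconds get the label "<2s", fractional durations are truncated to whole seconds or minutes, and a duration exactly on a boundary (10 seconds, say) falls in the lower band.

```python
# Upper limits and labels for the banded ranges in the proposal.
SECOND_BANDS = [(10, "5-10s"), (20, "10-20s"), (60, "20-60s")]
MINUTE_BANDS = [(10, "5-10min"), (20, "10-20min"), (60, "20-60min")]


def pause_band(seconds):
    """Map the duration of a pause (or unclear stretch) to a band label."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds < 2:
        return "<2s"  # assumption: the proposal starts counting at 2 seconds
    if seconds <= 5:
        return f"{int(seconds)}s"  # counted individually: 2s, 3s, 4s, 5s
    for upper, label in SECOND_BANDS:
        if seconds <= upper:  # assumption: boundaries fall in the lower band
            return label
    minutes = seconds / 60
    if minutes <= 5:
        return f"{int(minutes)}min"  # counted individually: 1min to 5min
    for upper, label in MINUTE_BANDS:
        if minutes <= upper:
            return label
    return ">60min"  # assumption: the proposal stops at 60 minutes


for duration in (1.5, 3, 7, 45, 90, 420):
    print(duration, pause_band(duration))
```

With these assumptions a 7-second pause is labelled "5-10s" and a 90-second one "1min". Whether the shortest pauses should be marked at all is exactly the question raised in query 6, so the "<2s" label is a placeholder only.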