TALC 96

Lancaster University 9-12 August 1996

Claire Warwick, Oxford University Computing Services


Introduction

Following the very successful conference on Teaching and Language Corpora held at Lancaster University in 1994, a second TALC was also held at Lancaster, from the ninth to the twelfth of August 1996. The conference's chief focus was on practical applications of language corpora, particularly in the field of English language teaching. Papers given at the conference discussed not only the creation and use of corpora to devise and inform teaching materials, but also the direct exploration of corpora by students.

Participants

There were about a hundred delegates at the conference, about half of whom were from the UK. The majority of the other delegates came from Europe, with Italy being particularly well represented. There were also five delegates from Hong Kong, but only a very few from the USA.

Those attending the conference can be divided into two broad areas of specialism and interest: teachers of English as a Foreign Language, or Translation Studies, and academics working on corpus linguistics within university Linguistics Departments. (There was, of course, a certain amount of overlap between the two categories.) A few of the participants were also researchers into natural language processing (NLP).

Papers Presented

Papers given at the conference covered many different aspects of the use of corpora in teaching, and the sessions were organised around a number of broad themes.

The following is a brief summary of papers which we attended.

Stephen Magee and Michael Rundell (St. Andrews) gave a paper entitled The Role of the Corpus-based 'Phrasicon' in English Language Teaching. They used the spoken part of the BNC to investigate the way in which native speakers express indirectness and understatement, concentrating on the phrase 'not exactly'. They investigated ways in which this can be used to teach foreign students an otherwise complex form of idiomatic speech, one which presents both cultural and linguistic difficulties for non-native speakers. For example, non-native speakers may not understand why native speakers think it unacceptably rude to say to the boss 'You were drunk last night' as opposed to 'You were not exactly sober'.

Lynne Flowerdew (Hong Kong University of Science and Technology) gave a paper titled CALL Materials Derived from Integrating 'Expert' and 'Interlanguage' Corpora Findings, which described how she had compared one of the mini corpora supplied with the Microconcord package with a corpus she had created of English written by her students. She then used this comparison to detect common problems they had encountered in learning written English.

Francine Roussel (University of Nancy) discussed Multilingual concordance-based exercise types. She had constructed exercises to teach French to foreign learners using Multiconcord software to create parallel corpora of English and French written language.

In his paper Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests, David Coniam (Chinese University of Hong Kong) described how he had used word frequency data derived from the Bank of English corpus to generate multiple-choice tests for students of English automatically.

In Parallel Texts in Language Teaching, Michael Barlow (Rice) demonstrated his parallel text analysis software, Paraconc, and its use as a resource for teaching EFL.

In their paper Corpora and Terminology: Software for the Translation Programme at Goteborg University, Pernilla Danielson and Daniel Ridings (Goteborg) discussed the Pedant project, and how they had developed a parallel concordance as part of their department's involvement in the PAROLE project. This concordance will be used to teach translation studies and will concentrate primarily on business and technical language.

In her paper Parallel and Comparable Bilingual Corpora in Language Teaching and Learning, Carol Peters (CNR, Pisa) described a new project designed to align parallel corpora. Unlike other alignment software, this programme uses a lexicon to make linguistic judgements about how to match words. Although the project is still at an early stage, this appeared to be an exciting development from an already very well-respected team.

Tony McEnery (Lancaster) gave a paper about Cybertutor, a programme to teach English grammar to first year students of linguistics. One group of students had been taught by traditional methods, and their test results were compared with those of another group which had been taught solely by computer using Cybertutor. The students taught by computer achieved significantly better results. Lancaster therefore hope that those who provide funding will not conclude that human teachers are no longer needed.

Glyn Holmes (Western Ontario) described An Experiment in the Learning of French through Corpus Linguistics. He had used various corpora of written and spoken French to teach grammar to his students. Each student had been asked to research a specific grammatical problem using the corpus, and had then presented their findings to the rest of the students. This had been a very valuable and enjoyable experience for all concerned, although the results in examinations have not yet improved.

A. Berber Sardinha (Liverpool) described A Corpus for Teaching Portuguese. He has constructed his own corpus of Brazilian Media Portuguese, which contains data from both written and spoken media. He uses this to teach not only business, but also colloquial Portuguese to adult learners.

In her paper Research into the Functions of Particles in a Corpus, Marta Fernandez-Villaneuva (Barcelona) described how a corpus of German language had been used to teach advanced students how to deal with particles.

In Encouraging Students to Explore Language and Culture in Early Modern English Pamphlets, Josef Schmied (Chemnitz) discussed the Lampeter Corpus of seventeenth and eighteenth century pamphlets and its use in teaching historical linguistics. This is a particularly important period in the development of the English language, and the collection, comprising 120 texts (12 from each decade), allows students to investigate language change using the corpus to provide examples.

In his paper The Ideology of Science as a Collocation: how Corpus Linguistics can Expand the Boundaries of Genre Analysis, Chris Gledhill (Aston) described how he has created a corpus of scientific articles about cancer research. He has used this to investigate the nature of scientific language, both in terms of its linguistic content and the nature of the ideology encoded by its language.

In his paper Corpora, Genre Analysis and Dissertation Writing: An Evaluation of the Potential of Corpus-Based Techniques in the Study of Academic Writing, Chris Carne (Reading) discussed the use of the Reading Academic Corpus in the study of academic writing. This should enable students to improve their dissertation writing by studying features of good academic texts in the corpus.

In his paper Investigating Grounding Across Narrative and Oral Discourse with Students, Tony Jappy (Perpignan) discussed ways in which French students could investigate the relatively complex concept of 'grounding', that is, the way in which a narrative text tends to contain a contrast between background and foreground events expressed in different verbal forms. The corpus will consist of different types of narrative, including pure and mixed narrative and dialogue.

Roberta Facchinetti (Verona) discussed The exploration of English diachronic corpora by foreign language students. She described how she has used diachronic corpora to teach students of the history of the English Language who are not native speakers. She compared the relative usefulness to students of the Helsinki Corpus of English Texts and The Dream of the Rood.

Paul Bowden and Mark Edwards (Nottingham Trent) presented a paper on Knowledge extraction from corpora for pedagogical applications, which was a highly technical discussion of natural language processing and seemed to have been presented to the wrong conference.

In her presentation Using Authentic Corpora and Language Tools for Adult-Centered Learning, Mary-Ellen Okurowski (US Defense Department) described the use of Oleada, a computer-based system to teach adult learners of ten different languages. Her presentation was impressive in style, but told us very little about what the software could actually do.

Sylviane Granger (Université Catholique de Louvain) gave a paper on Exploiting Learner Corpus Data in the Classroom: Form Focused Instruction and Data Driven Learning. She compared her corpus, containing the writing of native English speakers, with that of language learners. This enabled her to identify the most common mistakes made by those learning English as a second language.

In their paper Approaching the Assessment of Performance Unit Archive of Schoolchildren's Writing from the Point of View of Corpus Linguistics, M. Shimazumi and A. Berber Sardinha (Liverpool) discussed a corpus of children's writing which they had collected. They then compared the frequencies of common words in this corpus to those in the BNC and in the Guardian.

In Teaching L1 and L2 composition in a multicultural environment, Robert Faingold (Tulsa) compared the writing of native and non-native speakers of Spanish.

Guy Aston (Bologna) and Lou Burnard (OUCS) gave a joint presentation on The British National Corpus, in which they discussed its usefulness as a language learner resource and gave An Introduction to Retrieval from the BNC Using SARA. They gave a demonstration of the use of SARA to access the BNC and discussed the main features of its functionality. Guy then described the way in which he had tested SARA to teach English to a group of advanced language learners. The presentation was very well received and generated a great deal of discussion.

In her paper Teaching Terminology Using Corpora, Jennifer Pearson (Dublin City) described the use of language corpora to teach translation studies. The area she particularly needs information about is that of specialist terminology. To this end she has constructed a concordance with a parallel as well as an equivalent component of French and English, which she is then able to use in teaching student translators. She is hoping to use the BNC for this in future, but has not yet been able to have it installed on her university network.

In her paper A Textual Clues Approach for Generating Metaphors as Explanations by an Intelligent Tutoring System, V. Prince (LIMSI-CNRS) described her work in Natural Language Processing. She is conducting an enquiry into the way in which metaphor and analogy may be recognised by a computer, using the patterns within which they occur in language. In order to do this, she has used a small corpus of French articles to collect her data.

Designing a CALL System Using Corpora for Speakers of Cantonese was a demonstration given by John Milton (City University Hong Kong) of the software which he has designed to help Hong Kong students to learn English. He has used a comparable corpus of examination scripts produced by native speakers and Hong Kong students to determine which mistakes are most often made by non-native speakers.

In his paper Marrying VERBALIST to concordance data, John Higgins (Stirling) described how he has used part of the Cobuild corpus to create a search tool for verb forms. He has created his own lexicon of verbs using the Mitton dictionary from the OTA as a resource, and has attached a front end of his own design called the Stirling Search Tool.

Marina Dossena (Bergamo) discussed the problem of Evaluating Corpora - are we Asking the Right Questions? She stressed that if students and teachers are to use various different corpora, they must be aware of the criteria by which these may be compared in terms of their function and the type of information to be extracted from them. She introduced a form used by her students to conduct this evaluation. She also discussed the relative usefulness of different engines for concordancing and searching corpora, and of the World Wide Web.

Ourania Hatzidaki (Birmingham) gave a paper on Corpus Linguistics as an Academic Subject, in which she asked why there are so few academic courses on corpus linguistics, despite its popularity amongst researchers. She suggested that this may be because of the large technical element in the investigation of corpora, which humanists may find too complex and discouraging. This could be remedied, she felt, if a course in corpus linguistics were designed which would teach not only linguistics, but also the technical aspects of the subject. This would train the next generation of corpus linguists and ensure the future of the discipline.

In his paper A Corpus Based Description of Headline Grammar, John Morley (Siena) posed the important question 'Is France Hexagonal?', which he answered with 'Well, it depends'. He then described the creation of what he felt was probably the smallest corpus being discussed at TALC. He has built a 400,000 word corpus of newspaper headlines, and used it to teach the ways in which their grammar differs from that of standard English.

Conclusion

My conclusion will necessarily be a highly subjective one, since I am basing it on first impressions of the discipline of Corpus Linguistics at work, rather than on personal knowledge.

It appeared that there is a feeling within the discipline that Corpus Linguistics is something of a closed community with a closed discourse, and this assumption informed the proceedings of the conference. Despite this, most speakers seemed to avoid simply preaching to the converted, or becoming embattled and defensive. Indeed, many papers communicated a real sense of enthusiasm and excitement engendered by the possibilities of finding new uses for corpora.

The EFL teachers, who were probably in the majority at the conference, were keen to use corpora to detect common errors made by their students. They were enthusiastic about the potential use of corpora to improve both their performance as teachers, and that of their students.

There also seemed to be a desire amongst corpus linguists at once to promote Corpus Linguistics, and to ensure that the subject was correctly taught. There was a feeling that the integrity of the subject should be preserved and not compromised by non-expert users of corpora. This paradoxical view was particularly clearly expressed in the paper given by Ourania Hatzidaki.

The conference was very useful to me personally, as an opportunity not only to learn about a field which is new to me, but also to meet one of the communities of potential users of the BNC. This enabled me to discover what demands might be made of it, and to what diverse uses academics in this particular field might put it. It also, of course, allowed us to promote the BNC while at the same time introducing it to an even wider community of users.