Applied Linguistics Series

The Ideal Dictionary - a Utopian Reality?


Ciarán P. McCarthy
ciaran@mindless.com
Director of Studies at The Salesian English Language Centre, Celbridge, Co. Kildare, Ireland.

Introduction

"Words are the common currency of our communication. We use them, exchange them, discard them. We draw on a rich treasury of words, gathered up over the centuries and millennia of human experience; accumulated, adapted, applied and transformed in use. Too often we speak and write them without much regard for what they are, or how they should be used." (O'Donnell 1990: ix).

In the first section of this paper, we shall look at four of the major pedagogical English dictionaries available, note some of the quasi-ideal features of each and, it is hoped, identify the core features of this utopian ideal dictionary. In the following sections, we shall consider some theoretical and practical problems. I initially thought that a paper of this nature would be easily put together - after all, all that is really needed is to cut and paste the best bits of all the dictionaries together and, voilà, an ideal dictionary. This is true to some extent; however, as Crystal (1986: 72) notes: "the problem facing lexicography as a branch of applied linguistics... [is] how to predict the performance limitations which constrain both parties to the enterprise - lexicographer and user - and to resolve them, so that we obtain an ideal product, which satisfies everyone's criteria at a minimal cost in effort, time and money".

Pedagogical Dictionaries

The Cambridge International Dictionary of English (CIDE) boasts "Guide Words which help to distinguish immediately between the different senses of the same word", "a strictly controlled Defining Vocabulary of 2000 words [that] ensures that the 50,000 definitions are within the ability of the learner" and a "unique Phrase Index [that] gives instant access to the 30,000 phrases, idioms and compounds in the dictionary", which, in my opinion, is a most useful asset. Also of interest to the learner are the "clear, simple grammatical codes", "the 45,000 collocations highlighted" and "Unique False Friend information for 16 languages" (Cambridge ELT Catalogue 1996: 21). With such a large number of collocations cited, it seems unthinkable that this work was compiled without the use of a substantial corpus, though nothing in the literature even remotely suggests that one was used.

The Oxford Advanced Learner's Dictionary (OALD) claims to "ensure students gain the most complete and accurate picture of today's English", giving "a dictionary that is even more student friendly - more authoritative, more relevant and even easier to use". Corpora have recently been used to compile the OALD's 5th edition: 100 million words from the British National Corpus, as well as over 40 million words from the Oxford American English Corpus. This certainly gives the student a "more representative picture of the written and spoken English they need in order to communicate in today's world" (Oxford ELT Catalogue 1996: 57), as the 90,000 examples used for the 65,000 definitions are taken directly from these corpora. The defining vocabulary of the OALD is substantially larger than that of the CIDE, at 3,500 words, which, I would argue, allows for more precise definitions while, at the same time, perhaps making life a little more difficult for the weaker student!

The Longman Dictionary of Contemporary English (LDCE) opts for a Defining Vocabulary of 2000 words and also boasts the use of two corpora: the British National Corpus and the Longman Corpus Network - the use of which makes the LDCE "most reliable in terms of accuracy [...] and coverage, of the whole of the English language without the traditional bias towards the written language" (Longman ELT Catalogue 1996: 106-7). However, the corpora, in this case, only amount to a meagre 15 million words in total, which pales into insignificance when compared to those of the OALD. It should come as no surprise, then, that the LDCE can only provide 25,000 fixed phrases and collocations. Information, however, is provided on the 3000 most frequently used words in spoken and written English, which neither the CIDE nor the OALD provides.

The Collins COBUILD dictionaries and the associated works are based on the most impressive corpus of all - the Bank of English, which, at the time of publication of the Heinemann ELT Catalogue (1996), contained 200 million words; of these, 15 million were of natural spoken discourse - a number of words equal to the entire corpus for the LDCE. Its sheer size, of course, allows for a much broader analysis of the English language, on many levels, as can be seen, for instance, in the newly added frequency information in the COBUILD English Dictionary, given for all words on an easily understood five-point scale of frequency. The definitions used by COBUILD are almost conversational in style, compared with those in the OALD. This, in addition to the easily located syntactic and pragmatic information in the smaller "Extra Columns" to the side, which do not clutter up the text, makes it the best individual pedagogical dictionary for my money. The COBUILD English Dictionary has over 75,000 references, and all 100,000 examples are new in this latest edition. Importantly, in practical terms, it does not weigh substantially more than any of the other three dictionaries mentioned above.

Where to now?

Each of the above dictionaries has its own strengths and its own weaknesses, as do their respective users. "There is a sense in which biggest has to be best with dictionaries, as with any reference books. On the other hand, it is obvious that to meet the needs of individual people and circumstances, information has to be selected and presented in usable form." (Crystal 1986: 75). Modern scientific lexicography is generally considered to have begun in 1747, when Johnson published his Plan of a Dictionary of the English Language; things, of course, have changed beyond any recognition since then. Johnson's work, and the thousands of others that have appeared since, leave one with a sense that, perhaps, lexicography is a science in which there are only a limited number of interesting problems. Crystal (1986: 78) notes that "because we know what 'should' be in a dictionary, as good linguists and lexicographers, we ask questions relating only to these notions - questions to do with lexical relationships, form, class, etymology and so on. But, an ideal lexicographer should always be striving to go beyond this, to discover whether there are other parameters of relevance to the user." (Italics mine). As I mentioned above, weight, as every dictionary-carrying student knows, is a legitimate practical consideration; there has to be a happy medium between the power of a substantial learner's dictionary and the portability of a pocket dictionary.

Computers and Information Technology

We, as users of dictionaries, have notions about what a dictionary looks like, what it is for and, perhaps, even why the particular information contained within one is there. Each of the dictionaries discussed above is intended for pedagogical use - by students, or by teachers. The vocabulary, clearly, is very carefully chosen to suit the specific needs of language learners; this is possible now that computers are widely used in many spheres of life. The powerful concordancing programmes used by the lexicographer allow him or her to see the frequency of use of particular lemmas across a whole corpus, or in particular parts of a corpus, such as natural informal spoken discourse in Ireland, North American newspapers, golfing magazines or the complete works of William Shakespeare. This is particularly useful in L.S.P. (Language for Specific Purposes) situations, where the particular field of discourse may be quite narrow in technical terms, but vast in supporting non-technical terms. Most importantly, information becomes available on the differences between formal and informal language use, and on those areas between the two extremes.
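To make the idea of concordancing concrete, the following is a minimal sketch in Python, not a description of any publisher's actual software. The three tiny sub-corpora, their names and the function names are all invented for illustration, and the sketch counts plain word forms rather than true lemmas, since it performs no lemmatisation.

```python
import re
from collections import Counter

# Hypothetical miniature corpus: sub-corpus name -> text.
# A real corpus would run to many millions of words.
CORPUS = {
    "spoken_informal": "well I run to the shop and then I run home again",
    "newspapers": "the minister will run for office and the paper will run the story",
    "golf_magazines": "the ball will run on after landing on a firm green",
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_subcorpus(word, corpus=CORPUS):
    """Count occurrences of one word form in each sub-corpus."""
    return {name: Counter(tokenize(text))[word] for name, text in corpus.items()}


def concordance(word, text, width=3):
    """Return key-word-in-context lines: up to `width` tokens either side of each hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

Calling frequency_by_subcorpus("run") shows how often the form occurs in each sub-corpus, while concordance("run", CORPUS["golf_magazines"]) lists each occurrence with its immediate context, which is the view a lexicographer would scan when separating senses.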

But it is not only frequency information that is available. Collocational data are available in large amounts, and are one of the most important developments of the last ten years as far as pedagogical use is concerned. In my ideal dictionary, I would demand vast amounts of collocational data; the COBUILD English Dictionary provides enough example sentences for the main usages of a particular word, categorised under headwords and superheadwords, but does not show every possible usage - nor need it present any excuses for this. For example, sesquipedalophobia may be common among learners of English, but they are unlikely ever to have need of this word (See Schur 1987: 364-5 for discussion of this word*).
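As a rough illustration of how collocational data might be drawn from a corpus, the sketch below counts the words that occur within a small window either side of a chosen node word. It is a simplification under stated assumptions: the sample sentences, the function name and the window size are invented here, and raw co-occurrence counts stand in for the statistical measures of association that serious corpus work would use.

```python
import re
from collections import Counter


def collocates(node, sentences, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Count every neighbour in the window except the node itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


# Hypothetical sample sentences, invented for illustration only.
SAMPLE = [
    "She made a strong cup of tea",
    "He likes strong tea in the morning",
    "A powerful engine and a strong tea do not mix",
]
```

Here collocates("tea", SAMPLE).most_common(1) picks out "strong" as the most frequent neighbour of "tea", the sort of pairing a learner cannot easily predict from the meanings of the words alone.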

The reader, by now, will have guessed that my ideal dictionary is to be computer based. With CD-ROM technology, a single disk can, at present, hold up to 600 MB of information; by the end of the year CD-ROMs will be able to hold at least three times that amount. Newer technologies now under development promise virtually unlimited storage capacity - so the amount of information that could be included in the Ideal Dictionary is also virtually limitless. Another possibility that is already commercially available is the hand-held electronic dictionary, such as the Canon Wordtank, which is powered by ROM chips; the problem with these is the limited screen capacity, which prevents the user from scanning an entry for the information required. On the other hand, such a device allows the user access to larger amounts of data than it is practical to carry in a book, without the expense involved in buying a computer. Hand-held dictionaries, as the name suggests, are also portable.

Crystal above (1986: 75) noted that the "information has to be selected and presented in a usable form". The ever-improving graphical user interfaces, as used, for example, in accessing the World Wide Web, allow the user to be presented with exactly the information required, in the font or fonts desired, laid out exactly as the user wishes, without any extraneous material to confuse the situation. So, if definitions are provided using defining vocabularies of various sizes, the user may select the one most suitable to his or her level and, if this proves unsuitable, try one with a larger or smaller defining vocabulary.
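The selection of a definition by defining-vocabulary size could be sketched as follows. This is only an illustration under assumptions of my own: the entry structure, both definition texts and the function names are hypothetical, and only the sizes of 2000 and 3500 words echo the defining vocabularies of the dictionaries discussed earlier.

```python
import re

# Hypothetical entry: one headword with definitions written against
# defining vocabularies of different sizes (texts invented for illustration).
ENTRY = {
    "headword": "dictionary",
    "definitions": {
        2000: "a book that gives a list of words and explains what they mean",
        3500: "a reference book listing words alphabetically with their meanings",
    },
}


def choose_definition(entry, preferred_size):
    """Return (size, text) for the largest defining vocabulary that does not
    exceed preferred_size, falling back to the smallest one available."""
    sizes = sorted(entry["definitions"])
    suitable = [s for s in sizes if s <= preferred_size]
    size = suitable[-1] if suitable else sizes[0]
    return size, entry["definitions"][size]


def words_outside(definition, defining_vocabulary):
    """List the words of a definition that fall outside a defining vocabulary."""
    tokens = re.findall(r"[a-z']+", definition.lower())
    return sorted({t for t in tokens if t not in defining_vocabulary})
```

A learner who asks for choose_definition(ENTRY, 2500) receives the simpler 2000-word definition; words_outside suggests how a lexicographer might check that a definition really does stay within the vocabulary it claims to use.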

Needless to say, a large and balanced corpus of general English is a must, for inclusion as standard. This will permit the user to peruse at his or her leisure the various collocations, and frequency information, for the language as a whole, or for a particular register or style, as appropriate. Additional corpora will be available on specific areas of the sociolinguistics of language and society, such as phrasal verbs, fixed expressions, English for medicine, English for the often cited "airline pilots" and so on. This grants the student, and teacher, the ability to select the most appropriate usages for any given purpose and, more importantly, to use the dictionary not only as a reference tool, but also as a study aid. Among the newer titles in the COBUILD series are specific corpora on CD-ROM for specific purposes and tasks.

Many dictionaries choose not to include dialect words in their listings, for practical reasons, though some make allowance for national varieties, such as American or Australian English. My ideal dictionary would allow these dialect words to be included, or excluded, by the user, as appropriate. In this way, the user may tailor the dictionary to his or her specific needs. For instance, a learner studying Hiberno-English literature may well come across some unusual words, word usages, collocations or constructions that are peculiar to Ireland and that, as dialectalisms, would be unavailable in his or her pedagogical dictionary.
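One simple way to picture this user-controlled inclusion of dialect words is a variety label on each entry, with the user switching varieties on or off. The sketch below assumes exactly that; the Entry structure, the label names and the three sample entries are invented for illustration and do not reflect how any existing dictionary stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    gloss: str
    variety: str = "general"  # hypothetical label; "general" means unmarked


# Illustrative entries only.
ENTRIES = [
    Entry("press", "a cupboard", "Hiberno-English"),
    Entry("sidewalk", "a paved path beside a road", "American"),
    Entry("cupboard", "a piece of furniture with a door, used for storage"),
]


def visible_entries(entries, enabled_varieties):
    """Keep unmarked entries plus those in varieties the user has switched on."""
    allowed = {"general"} | set(enabled_varieties)
    return [e for e in entries if e.variety in allowed]
```

With no varieties enabled, only "cupboard" is shown; a student of Hiberno-English literature who enables that variety would also see "press" in its Irish sense.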

Another useful feature that is easily included in a computer-based dictionary is a thesaurus facility; from this a student may see synonyms, antonyms, hyponyms and so on, and how they collocate. Importantly, the student may see how two seemingly synonymous words, in fact, do not function in exactly the same way. Carter (1987: 137-8) notes that words are rarely monosemous, apart from some strictly technical vocabulary, and cites Moon (1984) in asserting that often the sense of a word is heavily context dependent. A problem for the lexicographer is "not to force polysemy onto words by over specifying the context, and undergeneralising the word". The computer allows this problem to be overcome to some extent, though, as definitions and their senses are written down in a dictionary, they are necessarily influenced by the lexicographer. It is only by constructing meaning, and senses, for himself or herself that the learner may get the full picture - hence the practical use of a collocational thesaurus.

All four of the pedagogical dictionaries discussed above make an attempt to use an I.P.A. meta-transcription of the sounds used in speech, with varying degrees of success and clarity. The capacity of multi-media computers to allow the learner actually to hear a word, or phrase, spoken, not just by one speaker but by many, and with varying regional accents, is a most useful one. This eliminates the absolute necessity for the user to have a good knowledge of the phonetic alphabet in order to make any sense of the pronunciation of a new, or difficult, word. However, I do not deny the usefulness of the phonetic alphabet, limited as it is. Again, I would intend that the learner be able to tailor the audio output of the dictionary to his or her specific needs on any particular occasion.

Many dictionaries provide additional information that is neither lexical nor grammatical. The OALD's 5th edition provides 16 pages of cultural information on various aspects of life in English-speaking countries; for example, pages D6 and D7 provide information on the legal systems of England and Wales, and of the United States, respectively, and these pages are cross-referenced with the pertinent vocabulary contained within that dictionary. This is one area in which the CD-ROM based dictionary can excel, as all the multi-media options of a computer can be made available to the user. Three of the four dictionaries discussed above also provide at least some pictures and diagrams for clarification of concepts and definitions. After all, if, as the adage suggests, a picture is worth a thousand words, then an interactive picture, or a QuickTime movie, must be worth at least a hundred thousand words.

Finally, as all manner of other information is provided in the Ideal Dictionary, there is a convincing argument for including an encyclopaedia in the package. This, too, may be tailored appropriately by the user, to give simple biographical information or, for example, more extended information that may be invaluable to the student of literature or translation working within an area with which he or she is not familiar.

Conclusion

There is nothing novel in any of the suggestions made in this paper. The intractable theoretical problems facing the lexicographer will always be present, failing a major breakthrough in linguistic theories. The practical problems involved are becoming less thorny with the advent of modern computer technology. Everything I have suggested for inclusion on the Ideal Dictionary CD-ROM is quite possible. We have seen how the use of concordancing software has allowed the lexicographer to see the "real" language in use, and to change his or her work as necessary. The availability of reasonably advanced home computers at a reasonable price allows the user to access dictionaries of the "real" language in use in many different ways, and for many different purposes.

It is not just access to information that is important; it is also the ability to tailor that information to the needs of the situation, and to use it appropriately, that makes the Ideal Dictionary ideal. Perhaps, then, the Ideal Dictionary can only ever be used to its best effect by the Ideal User. Meanwhile, the ordinary user may use the Ideal Dictionary to the best of his or her ability, and reap the benefits of its use.

Bibliography

Cambridge ELT Catalogue (1996) Cambridge: C.U.P.

Carter, Ronald (1987) Vocabulary: Applied Linguistic Perspectives. London: Allen & Unwin.

Crystal, David (1986) "The ideal dictionary, lexicographer and user." In Ilson (Ed.) (1986).

Hartmann, Reinhard R. K. (Ed.) (1983) Lexicography: Principles and Practice. London: Academic Press.

Heinemann ELT Catalogue (1996). London: Heinemann.

Ilson, Robert (Ed.) (1986) Lexicography: an emerging international profession. (Proceedings of colloquia sponsored by the U.S.-U.K. Educational Commission: the Fulbright Commission, London). Manchester: Manchester University Press.

Johnson, Samuel (1747/1970) The Plan of a Dictionary of the English Language. Menston: Scolar Press.

Longman ELT Catalogue (1996). London: Addison Wesley Longman.

Moon, R. (1984) Monosemous Words and the Dictionary. m/s English Language Research, University of Birmingham.

O'Donnell, Jim (1990) Word Gloss. Dublin: Institute of Public Administration.

Oxford ELT Catalogue (1996). Oxford: O.U.P.

*Schur, Norman W. (1989) A Dictionary of Challenging Words. London: Penguin.

*Sesquipedalophobia is the fear of long words.


If you have comments or suggestions regarding this paper, e-mail me at ciaran@mindless.com
