When literary researchers wish to study the style of an author, or grammarians want to investigate current or past usage, they have traditionally turned to books called concordances. A concordance is an alphabetical listing of all the words in a text. For each occurrence, the neighboring context is given. A concordance, being a book, is a static presentation of data, and its original design and production choices impose various limitations.
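To make the definition concrete, here is a minimal Python sketch of a concordance: an alphabetical listing of words, each occurrence shown with a few neighboring words. The function name `build_concordance`, the `width` parameter, the bracket notation around the cited word, and the simple word-matching pattern are all assumptions chosen for illustration, not a prescribed design. The sample sentence is an abbreviated form of the opening of Alice's Adventures in Wonderland.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each lowercased word to a list of context strings, one per occurrence.

    `width` (an illustrative parameter) is the number of neighboring words
    shown on each side of the cited word.
    """
    # Simplistic tokenizer: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    entries = defaultdict(list)
    for i, word in enumerate(words):
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        entries[word.lower()].append(f"{left} [{word}] {right}".strip())
    # Sort the headwords alphabetically, as a printed concordance would.
    return dict(sorted(entries.items()))

sample = "Alice was beginning to get very tired of sitting by her sister on the bank."
concordance = build_concordance(sample, width=2)

for headword, contexts in concordance.items():
    for context in contexts:
        print(f"{headword:>10}  {context}")
```

Unlike a printed book, the context width here can be changed per query, which is one place where a program differs from a static presentation.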
Some concordances are sentence-based. This means that even if the word being cited is the first word in the sentence, no words from the previous sentence will be provided. Can you think of anything that is missed using this method?
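The sentence-based approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sentence_contexts` is hypothetical, the sample text is invented, and sentences are split naively at `.`, `!` or `?` followed by whitespace, which will mishandle abbreviations and quoted speech.

```python
import re

def sentence_contexts(text, keyword):
    """Return the enclosing sentence for each occurrence of `keyword`."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        # One context entry per occurrence of the keyword in this sentence.
        hits.extend([sentence] * words.count(keyword.lower()))
    return hits

# Invented sample text, used only to show the behavior.
sample = "The cat sat down. It purred loudly. Then it slept."

for context in sentence_contexts(sample, "it"):
    print(context)
```

For the first hit, the cited word opens its sentence, so the context shown begins with that word and carries nothing over from the sentence before it.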
You are to provide access to three English-language books in a similar fashion. You can download the data for free from Project Gutenberg. One of the books must be Alice's Adventures in Wonderland by Lewis Carroll (1832-1898). Use the zipped version: alice30h.zip.
Have a look at the beseda text corpus (a corpus is a body of word data) at the Institute of Slovenian Language. Search for the word visokost to see what is returned. You should use this web site as a model.
Your web page programming will have the following characteristics:
How is a web-based concordance program better than a concordance in a book?
Is the book concordance still useful? Does it meet any need of a researcher better than a computer program does?