Landmarks in information retrieval: the message out of the bottle
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Tampere, Finland
Pages: 1 - 1  
Year of Publication: 2002
ISBN:1-58113-561-0
Author
Keith Van Rijsbergen  University of Glasgow, UK
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564376.564377

ABSTRACT

For many years I have wanted to give a talk like this: look back on our subject, identify the high (and perhaps low) points, consider what worked, what did not work, and speculate a little about the future. Now that I at last have the opportunity to give such a talk, the realisation has dawned just how difficult it is to do justice to the topic. The only way out of this difficulty for me is to emphasise that this is a personal account, based on my involvement with the field since 1968, and that errors of omission and commission are not deliberate but simply due to lack of knowledge and time on my part.

To talk of landmarks is easy but to say what they are in IR is not. They come in various shapes and sizes: events, publications, experiments, ideas, etc. In the course of this presentation I shall be judiciously mixing all of these. However, the emphasis will be on ideas and their subsequent modelling and testing through experimentation. The interaction between theory and experiment will be a recurring theme. I will try to associate these developments with key individuals, thereby running the risk of ignoring some; I apologise for this in advance.

The pre-history of our subject can be traced back to work in the 19th century, perhaps even further, but I will pick it up at the middle of the last century (20th) starting with the work of Robert Fairthorne and Vannevar Bush. This early work emphasised the possibility of using mechanical devices to store and retrieve information. Of course, the foundations of modern information retrieval were properly laid after 1945 with the pioneering work of Cleverdon, Salton, Sparck Jones, and others. This work gave rise to a strong experimental methodology for the evaluation of theoretical ideas, which has been sustained to this day. It has been a hallmark of IR research that theory is developed in the context of experimentation. There is no doubt that many disciplines are jealous of the success of TREC.

IR research has thrown up a number of successful models. These models have been based on some, often unstated, assumptions (or hypotheses). I will attempt to identify some of the underlying ideas, giving credit where it is due, that led to the fruitful exploration of retrieval models. This will include system-oriented as well as user-oriented ideas, especially those concerned with the measurement of retrieval performance.

IR has been fortunate in that the subject has grown through the active collaboration between computer scientists and information scientists. This has meant that traditional approaches to the storage and retrieval of information emanating from the library world, for example, have always strongly influenced new developments. This tension between manual (human) processes and automatic computer-based processes in IR has always been fruitful. Even now with the evolution of ideas about meta-data and ontologies needed to enhance web retrieval, the debate about controlled vocabularies versus automatic indexing is relevant. Issues of scalability are particularly important here.

One of the strengths that have emerged in our subject is that many of our models can be deployed independently of medium or modality. For example, retrieving images or audio sequences can be handled in ways similar to those used to retrieve text data. This has proved to be a great boon to IR. The development of web retrieval through the deployment of various kinds of search engines has been based on the considerable early work in IR, although detailing the specific influences is not easy. It is clear that the underlying mathematical and statistical models in IR have been ubiquitous in application. The extreme difficulty encountered in making NLP work for IR forced researchers to develop powerful statistical, probabilistic, geometrical, and logical techniques to complement linguistic ones. This is now paying off because of the similar difficulties encountered in other media.

Having given some account of how we got here, I will spend a little time talking about where we go from here: how do we extract the message from the bottle?