First large-scale information retrieval experiments on Turkish texts
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
Poster Session: Posters
Pages: 627-628
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
Fazli Can  Bilkent University, Bilkent, Turkey
Seyit Kocberber  Bilkent University, Bilkent, Turkey
Erman Balcik  Bilkent University, Bilkent, Turkey
Cihan Kaynak  Bilkent University, Bilkent, Turkey
H. Cagdas Ocalan  Bilkent University, Bilkent, Turkey
Onur M. Vursavas  Bilkent University, Bilkent, Turkey
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148288

ABSTRACT

We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which was created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and is about 800 MB in size. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
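
The prefix truncation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `truncate_stem` and `index_terms`, the whitespace tokenization, and the sample words are assumptions made only for illustration. Case folding is deliberately left out, because Turkish dotted and dotless i need locale-aware handling that Python's `str.lower()` does not provide.

```python
from typing import List


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Keep only the first `prefix_len` characters of a word.

    Words shorter than `prefix_len` are returned unchanged.
    The default of 5 follows the prefix length named in the abstract.
    """
    return word[:prefix_len]


def index_terms(text: str, prefix_len: int = 5) -> List[str]:
    """Split on whitespace (an assumed tokenizer) and truncate each token."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


if __name__ == "__main__":
    # Hypothetical sample: inflected forms built on "kitap" (book), plus "ev" (house).
    print(index_terms("kitaplar kitaplarım kitapçı ev"))
```

In this sample the three words built on "kitap" collapse to the same index term, while the short word "ev" passes through unchanged. A lemmatizer-based stemmer, which the abstract reports as more effective, would need a morphological analyzer and is not sketched here.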


