A personalized search engine based on web-snippet hierarchical clustering
Source: International World Wide Web Conference
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
SESSION: Industrial and practical experience track paper session 1
Pages: 801 - 810  
Year of Publication: 2005
ISBN:1-59593-051-5
Authors
Paolo Ferragina  Dipartimento di Informatica, Pisa
Antonio Gulli  Dipartimento di Informatica, Pisa
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 56,   Downloads (12 Months): 331,   Citation Count: 19
DOI: http://doi.acm.org/10.1145/1062745.1062760

ABSTRACT

In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.

SnakeT is the first complete and open-source system in the literature that offers both hierarchical clustering and folder labeling with variable-length sentences. We extensively test SnakeT against all available web-snippet clustering engines, and show that it achieves efficiency and efficacy performance close to the best known engine, Vivisimo.com.

Recently, personalized search engines have been introduced with the aim of improving search results by focusing on the users, rather than on their submitted queries. We show how to plug SnakeT on top of any (un-personalized) search engine in order to obtain a form of personalization that is fully adaptive, privacy preserving, scalable, and non-intrusive for underlying search engines.
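
The general idea of organizing search-result snippets into a hierarchy of labeled folders can be illustrated with a toy sketch. This is not SnakeT's algorithm, which the abstract does not specify; it is a minimal illustration under stated assumptions. The function names (`candidate_labels`, `build_hierarchy`), the stopword list, the sample snippets, and the rule "a two-word label nests under a one-word label it contains" are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_labels(snippet, max_words=2):
    """Return every run of 1..max_words adjacent tokens, after stopword removal."""
    words = [w for w in re.findall(r"[a-z0-9]+", snippet.lower())
             if w not in STOPWORDS]
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    }


def build_hierarchy(snippets, query, min_size=2):
    """Group snippets into two-level labeled folders.

    Top-level folders are single words shared by at least min_size snippets;
    a two-word label becomes a child of each top-level folder whose word it
    contains. Labels made only of query words are skipped, since they would
    match every result.
    """
    query_words = set(query.lower().split())
    members = defaultdict(set)
    for sid, text in enumerate(snippets):
        for label in candidate_labels(text):
            if not set(label.split()) <= query_words:
                members[label].add(sid)

    folders = {l: ids for l, ids in members.items() if len(ids) >= min_size}
    hierarchy = {}
    for label, ids in folders.items():
        if " " in label:
            continue  # multi-word labels only appear as children
        children = {
            child: sorted(cids)
            for child, cids in folders.items()
            if " " in child and label in child.split() and cids <= ids
        }
        hierarchy[label] = {"snippets": sorted(ids), "children": children}
    return hierarchy


# Made-up snippets for the polysemous query "jaguar".
snippets = [
    "Jaguar cars for sale: new luxury cars",
    "Jaguar luxury cars dealer in London",
    "The jaguar is a big cat native to the Americas",
    "Big cat conservation: jaguar habitat",
]
hierarchy = build_hierarchy(snippets, "jaguar")
for label in sorted(hierarchy):
    print(label, hierarchy[label])
```

On these four snippets the car-related results and the animal-related results end up in separate folders, which is the kind of complementary view to a flat ranked list that the abstract describes for polysemous queries.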





REVIEW

Reviewer: Anthony Joseph Duben

Searching the Web for information can be very frustrating. Search engines are based on different models, ranging from the very structured, in which Web sites are cataloged according to predefined hierarchical categories, to very amorphous reports ...
