Accelerated focused crawling through online relevance feedback
Source: International World Wide Web Conference
Proceedings of the 11th International Conference on World Wide Web
Honolulu, Hawaii, USA
Session: Crawling
Pages: 148-159
Year of Publication: 2002
ISBN: 1-58113-449-5
Authors
Soumen Chakrabarti, IIT Bombay
Kunal Punera, IIT Bombay
Mallela Subramanyam, University of Texas, Austin
Sponsors
ACM: Association for Computing Machinery
WWW'02
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 23; Downloads (12 months): 136; Citation count: 42
DOI: http://doi.acm.org/10.1145/511446.511466

ABSTRACT

The organization of HTML into a tag tree structure, which browsers render as roughly rectangular regions containing text and HREF links, greatly helps surfers locate and click on the links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page with respect to an information need, using only information available on the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it could fine-tune the priority of unvisited URLs in the crawl frontier and reduce the number of irrelevant pages that are fetched and then discarded.
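The central idea above can be illustrated with a small sketch: score each unvisited URL using only features taken from the page that links to it, then pop URLs from the crawl frontier in order of predicted relevance. This is not the authors' method. It assumes that a link's features are simply its anchor-text tokens plus a fixed window of text preceding the `<a>` tag (rather than anything richer derived from the tag tree), uses online logistic regression as a stand-in learner, and treats the relevance label passed to `update` as coming from some external judgment of the fetched target page. All names (`LinkContextParser`, `OnlineLinkScorer`, `Frontier`, `window`, `lr`) are hypothetical.

```python
import heapq
import math
import re
from html.parser import HTMLParser

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately crude tokenizer)."""
    return TOKEN.findall(text.lower())


class LinkContextParser(HTMLParser):
    """Collect (href, context tokens) pairs from one source page.

    Assumption: a link's context is its anchor text plus up to `window`
    tokens of non-link text that appear just before the <a> tag.
    """

    def __init__(self, window=5):
        super().__init__()
        self.window = window
        self.recent = []   # text tokens seen so far outside links
        self.links = []    # finished (href, tokens) pairs
        self._href = None
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            before = self.recent[-self.window:] if self.window else []
            self.links.append((self._href, before + self._anchor))
            self._href = None

    def handle_data(self, data):
        tokens = tokenize(data)
        if self._href is not None:
            self._anchor.extend(tokens)
        else:
            self.recent.extend(tokens)


class OnlineLinkScorer:
    """Hypothetical online logistic-regression model over context tokens."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.bias = 0.0
        self.w = {}

    def prob(self, tokens):
        z = self.bias + sum(self.w.get(t, 0.0) for t in tokens)
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, tokens, relevant):
        """One SGD step on log loss; `relevant` is the feedback label (0 or 1)."""
        g = float(relevant) - self.prob(tokens)
        self.bias += self.lr * g
        for t in tokens:
            self.w[t] = self.w.get(t, 0.0) + self.lr * g


class Frontier:
    """Unvisited URLs, popped highest predicted relevance first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:  # first score wins; no re-prioritizing
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    page = ('<p>Read our survey of <a href="/crawl">focused crawling</a>.</p>'
            '<p>Buy cheap <a href="/shop">shoes</a> today.</p>')
    parser = LinkContextParser(window=3)
    parser.feed(page)
    scorer = OnlineLinkScorer()
    # Hypothetical feedback from earlier fetches: one relevant, one not.
    scorer.update(tokenize("focused crawling"), 1)
    scorer.update(tokenize("cheap shoes"), 0)
    frontier = Frontier()
    for href, tokens in parser.links:
        frontier.push(href, scorer.prob(tokens))
    print(frontier.pop()[0])  # /crawl
```

The `Frontier` keeps the first score it sees for a URL; a crawler that discovers the same URL from several source pages would more likely re-prioritize it, for example by keeping the maximum score.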


