ABSTRACT
Searchers on the blogosphere often need to identify other key bloggers who share their interests. However, a key difference between this blog distillation task and normal ad hoc or Web document retrieval is that each blog can be seen as an aggregate of its constituent posts. On the other hand, we show that the task is similar to the expert search task, where a person's expertise is derived from the aggregate of their publications or emails. In this paper, we investigate several aspects of blog retrieval. Firstly, we examine whether a blog should be represented as a whole unit or by treating each of its posts as an indicator of its relevance, showing that expert search techniques can be adapted for blog search. Secondly, we examine whether indexing only the XML feed provided by each blog (which is often incomplete) is sufficient, or whether the full text of each blog post should be downloaded. Lastly, we use approaches that detect the central or recurring interests of each blog to increase the retrieval effectiveness of the system. Using the TREC 2007 Blog dataset, the results show that our proposed expert search paradigm is indeed useful in identifying key bloggers, achieving high retrieval effectiveness.
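The idea of treating each post as an indicator of its blog's relevance can be illustrated with a minimal sketch. The abstract does not specify how post-level evidence is combined, so the two aggregation rules here (summing post scores, or counting retrieved posts as votes) are assumptions chosen for illustration, and the names `rank_blogs`, `post_scores`, and `post_to_blog` are hypothetical.

```python
from collections import defaultdict


def rank_blogs(post_scores, post_to_blog, aggregate="sum"):
    """Rank blogs by aggregating the retrieval scores of their posts.

    post_scores:  dict mapping post id -> retrieval score for the query
    post_to_blog: dict mapping post id -> id of the blog it belongs to
    aggregate:    "sum" adds post scores; "votes" counts retrieved posts
                  (both rules are illustrative assumptions)
    """
    totals = defaultdict(float)
    for post_id, score in post_scores.items():
        blog_id = post_to_blog[post_id]
        if aggregate == "sum":
            totals[blog_id] += score
        elif aggregate == "votes":
            totals[blog_id] += 1.0
        else:
            raise ValueError(f"unknown aggregate: {aggregate}")
    # Highest aggregate first; ties broken by blog id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    scores = {"p1": 2.0, "p2": 1.5, "p3": 3.0}
    owner = {"p1": "blogA", "p2": "blogA", "p3": "blogB"}
    print(rank_blogs(scores, owner, aggregate="sum"))
    print(rank_blogs(scores, owner, aggregate="votes"))
```

In this toy input, the blog with two moderately scored posts outranks the blog with one highly scored post under both rules, which is the kind of effect an aggregate representation is meant to capture.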