Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia
Source
International Conference on Digital Libraries
Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries
Austin, TX, USA
SESSION: 11
Pages 295-304  
Year of Publication: 2009
ISBN: 978-1-60558-322-8
Authors
Daniel Hasan Dalip  Federal University of Minas Gerais, Belo Horizonte, Brazil
Marcos André Gonçalves  Federal University of Minas Gerais, Belo Horizonte, Brazil
Marco Cristo  FUCAPI - Analysis, Research and Tech. Innovation Center, Manaus, Brazil
Pável Calado  Instituto Superior Técnico/INESC-ID, Porto Salvo, Portugal
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1555400.1555449

ABSTRACT

The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information that anyone can freely access and edit, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
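
To make the idea of easily extracted textual indicators concrete, the following is a minimal Python sketch of a few length, structure and style features. It is an illustration only and not the authors' feature set: the function name `text_features`, the choice of indicators, the `== Heading ==` markup pattern, and the use of the Automated Readability Index as a style measure are all assumptions made for this example.

```python
import re

# Assumed wiki-style heading markup, e.g. "== History ==" on its own line.
HEADING = re.compile(r"^=+[^=\n]+=+[ \t]*$", re.MULTILINE)


def text_features(text: str) -> dict:
    """Return a few illustrative length, structure and style indicators."""
    # Structure: count section headings, then drop them from the body text.
    section_count = len(HEADING.findall(text))
    body = HEADING.sub("", text)

    # Length: words, characters inside words, and sentences.
    words = re.findall(r"[A-Za-z0-9']+", body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    char_count = sum(len(w) for w in words)

    # Guard against division by zero on empty input.
    n_words = max(len(words), 1)
    n_sentences = max(len(sentences), 1)

    return {
        "char_count": char_count,
        "word_count": len(words),
        "sentence_count": len(sentences),
        "section_count": section_count,
        "avg_word_length": char_count / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        # Style: Automated Readability Index, one possible readability score.
        "ari": 4.71 * char_count / n_words
        + 0.5 * len(words) / n_sentences
        - 21.43,
    }


if __name__ == "__main__":
    sample = "== Intro ==\nWikipedia is big. It is free!\n"
    for name, value in text_features(sample).items():
        print(f"{name}: {value}")
```

A feature dictionary like this could then be turned into a numeric vector and passed to any supervised learner to produce a single quality judgment; which learner and which features work best is exactly what the paper's experiments address.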


