|
ABSTRACT
Readability is a crucial presentation attribute that web summarization algorithms consider when generating a query-biased web summary. Readability quality is also an important component of real-time monitoring of commercial search-engine results: recent studies show that the readability of web summaries affects clickthrough behavior, and thus user satisfaction and advertising revenue. The standard approach to computing readability is to collect a corpus of random queries and their corresponding search result summaries and have a human judge each summary for its readability quality; an average readability score is then reported. This process is time-consuming and expensive, and manual evaluation cannot be used in the real-time summary generation process. In this paper we propose a machine learning approach to the problem. We use the corpus described above and extract summary features that we think may characterize readability. We then estimate a model (a gradient boosted decision tree) that predicts human judgments given the features. This model can then be used in real time to estimate the readability of new (unseen) web search summaries, and it can also be used in the summary generation process. We present results on approximately 5000 editorial judgments collected over the course of a year and show examples where the model predicts the quality well and where it disagrees with human judgments. We compare the results of the model to previous models of readability, most notably Collins-Thompson-Callan, Fog and Flesch-Kincaid, and find that our model shows substantially better correlation with editorial judgments as measured by Pearson's correlation coefficient. The learning algorithm also provides the relative importance of the features used.
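To make the baseline comparison concrete, the following is a minimal Python sketch of one of the baselines named above, the Flesch-Kincaid grade level, together with Pearson's correlation coefficient for comparing scores against human judgments. It is an illustration only and not the paper's implementation: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic assumed here for brevity, and the paper's own features and gradient boosted model are not reproduced.

```python
import math
import re


def count_syllables(word):
    """Approximate syllables by counting vowel groups (a heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences.

    Assumes both sequences have nonzero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

Under these assumptions, a baseline would be evaluated by scoring each summary with `flesch_kincaid_grade` and passing those scores and the corresponding editorial judgments to `pearson`; a learned model's predictions can be compared in the same way.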
REFERENCES
| |
1
|
The R project for statistical computing. http://r-project.org.
|
 |
2
|
Eugene Agichtein , Carlos Castillo , Debora Donato , Aristides Gionis , Gilad Mishne, Finding high-quality content in social media, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
[doi> 10.1145/1341531.1341557]
|
| |
3
|
A. Aula. Enhancing the readability of search result summaries. In Proc. of HCI, 2004.
|
 |
4
|
Chris Burges , Tal Shaked , Erin Renshaw , Ari Lazier , Matt Deeds , Nicole Hamilton , Greg Hullender, Learning to rank using gradient descent, Proceedings of the 22nd international conference on Machine learning, p.89-96, August 07-11, 2005, Bonn, Germany
[doi> 10.1145/1102351.1102363]
|
| |
5
|
Jill Burstein , Karen Kukich , Susanne Wolff , Chi Lu , Martin Chodorow , Lisa Braden-Harder , Mary Dee Harris, Automated scoring using a hybrid feature identification technique, Proceedings of the 17th international conference on Computational linguistics, August 10-14, 1998, Montreal, Quebec, Canada
|
 |
6
|
|
| |
7
|
K. Collins-Thompson and J. Callan. A language modeling approach to predicting reading difficulty. In Proceedings of HLT/NAACL, 2004.
|
| |
8
|
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189--1232, 2001. http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf.
|
| |
9
|
|
| |
10
|
R. Gunning. The technique of clear writing. McGraw-Hill, 1952.
|
| |
11
|
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, NY, 2001.
|
| |
12
|
|
 |
13
|
Jiwoon Jeon , W. Bruce Croft , Joon Ho Lee , Soyeon Park, A framework to predict the quality of answers with non-textual features, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
[doi> 10.1145/1148170.1148212]
|
 |
14
|
|
| |
15
|
M. D. Kickmeier and D. Albert. The effects of scanability on information search: An online experiment. In Proc. of HCI, 2003.
|
| |
16
|
J. P. Kincaid, R. P. Fishburne, R. L. Rogers, and B. S. Chissom. Derivation of new readability formulas for Navy enlisted personnel. Technical report, Millington, Tenn, Naval Air Station, 1975. Tech Report Research Branch Report 8-75.
|
| |
17
|
G. Legge. Psychophysics of Reading in Normal and Low Vision. Lawrence Erlbaum Associates, 2006.
|
| |
18
|
P. Li, C. J. Burges, and Q. Wu. McRank: Learning to rank using multiple classification and gradient boosting. In Proc. of the 21st Advances in Neural Information Processing Systems, 2007.
|
| |
19
|
S. F. Liang, S. Devlin, and J. Tait. Evaluating web search result summaries. In European Conference on IR Research, pages 96--106, 2006.
|
| |
20
|
G. H. McLaughlin. SMOG grading: A new readability formula. Journal of Reading, 12:639--646, 1969.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
K. Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124:372--422, 1998.
|
| |
25
|
G. Ridgeway. Generalized boosted models: A guide to the gbm package. http://i-pensieri.com/gregr/papers/gbm-vignette.pdf.
|
| |
26
|
G. Ridgeway. The state of boosting. Computing Science and Statistics, 31:172--181, 1999. http://www.i-pensieri.com/gregr/papers/interface99.pdf.
|
 |
27
|
|
| |
28
|
K. Ryan. Fathom. http://search.cpan.org/dist/Lingua-EN-Fathom.
|
 |
29
|
|
 |
30
|
|
| |
31
|
W. N. Venables and B. D. Ripley. Modern Applied Statistics with S. Springer-Verlag, New York, NY, 2002.
|
 |
32
|
|
| |
33
|
Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, and G. Sun. A general boosting method and its application to learning ranking functions for web search. In Proc. of the 21st Advances in Neural Information Processing Systems, 2007.
|
|