What's in a session: tracking individual behavior on the web
Conference on Hypertext and Hypermedia
Proceedings of the 20th ACM conference on Hypertext and hypermedia
Torino, Italy
SESSION: Tracking and exploiting user behavior
Pages 173-182
Year of Publication: 2009
ISBN: 978-1-60558-486-7
Authors
Mark Meiss, Indiana University, Bloomington, IN, USA
John Duncan, Indiana University, Bloomington, IN, USA
Bruno Gonçalves, Indiana University, Bloomington, IN, USA
José J. Ramasco, ISI Foundation, Torino, Italy
Filippo Menczer, Indiana University, Bloomington, IN, USA, and ISI Foundation, Torino, Italy
ABSTRACT
We examine the properties of all HTTP requests generated by a thousand undergraduates over a span of two months. Preserving user identity in the data set allows us to discover novel properties of Web traffic that directly affect models of hypertext navigation. We find that the popularity of Web sites (the number of users who contribute to their traffic) lacks any intrinsic mean and may be unbounded. Further, many aspects of the browsing behavior of individual users can be approximated by log-normal distributions even though their aggregate behavior is scale-free. Finally, we show that users' click streams cannot be cleanly segmented into sessions using timeouts, affecting any attempt to model hypertext navigation using statistics of individual sessions. We propose a strictly logical definition of sessions based on browsing activity as revealed by referrer URLs; a user may have several active sessions in their click stream at any one time. We demonstrate that applying a timeout to these logical sessions affects their statistics to a lesser extent than a purely timeout-based mechanism.
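The contrast between timeout-based and referrer-based sessions can be illustrated with a minimal Python sketch. The abstract does not spell out the segmentation algorithm, so everything here is an assumption made for illustration: the `Request` record, the function names, the rule that a request joins the session that most recently requested its referrer URL, and the rule that a request with no (or an unseen) referrer opens a new session.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    time: float                     # seconds since an arbitrary start
    url: str
    referrer: Optional[str] = None  # None models an empty referrer


def timeout_sessions(requests, timeout):
    """Split one user's time-ordered click stream wherever the gap
    between consecutive requests exceeds `timeout` seconds."""
    sessions = []
    for req in requests:
        if sessions and req.time - sessions[-1][-1].time <= timeout:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def logical_sessions(requests, timeout=None):
    """Group requests by referrer (assumed rule, see lead-in).

    A request joins the session that most recently requested its
    referrer URL; otherwise it opens a new session. If `timeout` is
    given, a request that arrives too long after the last request of
    its candidate session opens a new session instead.
    """
    sessions = []
    latest = {}  # url -> index of the session that last requested it
    for req in requests:
        idx = latest.get(req.referrer) if req.referrer else None
        if idx is not None and timeout is not None:
            if req.time - sessions[idx][-1].time > timeout:
                idx = None
        if idx is None:
            sessions.append([])
            idx = len(sessions) - 1
        sessions[idx].append(req)
        latest[req.url] = idx
    return sessions


# Hypothetical click stream: two interleaved browsing threads.
clicks = [
    Request(0, "a.example/"),
    Request(10, "a.example/p1", "a.example/"),
    Request(20, "b.example/"),
    Request(2000, "a.example/p2", "a.example/p1"),
    Request(2010, "b.example/q", "b.example/"),
]

by_timeout = timeout_sessions(clicks, timeout=1800)
by_referrer = logical_sessions(clicks)
by_both = logical_sessions(clicks, timeout=1800)
```

On this made-up stream, the timeout splitter merges the two unrelated threads that happen to be close in time, while the referrer-based grouping keeps each thread in its own session even though the two are interleaved. Passing a timeout to `logical_sessions` additionally cuts each thread at its long pause.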