ABSTRACT
Modern forensic analytics applications, such as network traffic analysis, perform high-performance hypothesis testing, knowledge discovery, and data mining on very large datasets. One essential strategy for reducing the time these operations require is to select only the data records most relevant to a given computation. In this paper, we present a set of parallel algorithms that demonstrate how an efficient selection mechanism, bitmap indexing, significantly speeds up a common analysis task: computing conditional histograms on very large datasets. We present a thorough study of the performance characteristics of the parallel conditional histogram algorithms. As a case study, we compute conditional histograms to detect distributed scans hidden in a dataset of approximately 2.5 billion network connection records. We show that these conditional histograms can be computed on an interactive time scale (i.e., in seconds). We also show how to progressively modify the selection criteria to narrow the analysis and find the sources of the distributed scans.
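To make the central idea concrete, the following is a minimal serial sketch of a conditional histogram driven by a bitmap index. It is an illustration under stated assumptions, not the parallel algorithms or the indexing software used in the paper: the index is a simple uncompressed, equality-encoded bitmap held in Python integers, and the function names (`build_bitmap_index`, `select_rows`, `conditional_histogram`) and the toy connection records are hypothetical.

```python
from collections import Counter


def build_bitmap_index(values):
    """Equality-encoded index: one bitmap (a Python int) per distinct value.

    Bit r of index[v] is set when record r has value v.
    """
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index


def select_rows(bitmap):
    """Yield the record numbers whose bits are set in the bitmap."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


def conditional_histogram(index_a, key_a, index_b, key_b, column):
    """Histogram of `column` over records where a == key_a AND b == key_b.

    The condition is evaluated with a bitwise AND of two bitmaps, so only
    the selected records of `column` are ever read.
    """
    hits = index_a.get(key_a, 0) & index_b.get(key_b, 0)
    return Counter(column[r] for r in select_rows(hits))


# Hypothetical toy connection records (illustrative values only).
dst_port = [22, 80, 22, 22, 443, 22]
state = ["REJ", "OK", "REJ", "OK", "REJ", "REJ"]
src_host = ["a", "b", "a", "c", "b", "d"]

port_index = build_bitmap_index(dst_port)
state_index = build_bitmap_index(state)

# Count rejected connections to port 22, grouped by source host.
hist = conditional_histogram(port_index, 22, state_index, "REJ", src_host)
```

Narrowing the analysis, as the abstract describes, corresponds to ANDing in further bitmaps (for example, a time window or a single source host) before the histogram is counted.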