|
ABSTRACT
There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
I. Bhattacharya and L. Getoor. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 1:1--36, 2007.
|
| |
2
|
I. Bhattacharya, S. Godbole, and S. Joshi. Structured entity identification and document categorization: two tasks with one joint model. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 25--33, 2008.
|
| |
3
|
M. Bilgic and L. Getoor. Effective label acquisition for collective classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43--51, 2008.
|
| |
4
|
M. Bilgic, G. M. Namata, and L. Getoor. Combining collective classification and link prediction. In ICDMW '07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 381--386, Washington, DC, USA, 2007. IEEE Computer Society.
|
| |
5
|
Y. Choi, E. Breck, and C. Cardie. Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431--439. Association for Computational Linguistics, 2006.
|
| |
6
|
W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration, pages 73--78, August 2003.
|
| |
7
|
L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations Newsletter, 7:3--12, 2005.
|
| |
8
|
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Machine Learning, 3:679--707, 2003.
|
| |
9
|
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery Data, 1(1):2, 2007.
|
| |
10
|
D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Intl. Conf. on Information and Knowledge Management, 2003.
|
| |
11
|
Q. Lu and L. Getoor. Link-based classification. In Proceedings of the International Conference on Machine Learning, 2003.
|
| |
12
|
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007.
|
| |
13
|
G. M. Namata, P. Sen, M. Bilgic, and L. Getoor. Collective classification for text classification. In M. Sahami and A. Srivastava, editors, Text Mining: Classification, Clustering, and Applications. Taylor and Francis Group, 2009.
|
| |
14
|
M. J. Rattigan, M. Maier, and D. Jensen. Exploiting network structure for active inference in collective classification. Technical report, University of Massachusetts Amherst, 2007.
|
| |
15
|
M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62:107--136, 2006.
|
| |
16
|
D. Roth, K. Small, and I. Titov. Sequential learning of classifiers for structured prediction problems. In Artificial Intelligence STAT, April 2009.
|
| |
17
|
D. Roth and W. Yih. A linear programming formulation for global inference in natural language tasks. In Proc. of CoNLL-2004, pages 1--8. Boston, MA, USA, 2004.
|
| |
18
|
B. Taskar, A. Pieter, and D. Koller. Discriminative probabilistic models for relational data. In Conf. on Uncertainty in Artificial Intelligence, 2002.
|
| |
19
|
B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Advances in Neural Information Processing Systems, 2003.
|
| |
20
|
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, October 1999.
|
|