A genetic algorithm for segmentation and information retrieval of SEC regulatory filings
Source
dg.o; Vol. 289
Proceedings of the 2008 international conference on Digital government research
Montreal, Canada
SESSION: Research papers and management, case study & policy papers: regulations and laws
Pages 44-52
Year of Publication: 2008
ISBN: 978-1-60558-099-9
Authors
Joshua Carroll, University of Pennsylvania, Philadelphia, PA
Thomas Y. Lee, University of Pennsylvania, Philadelphia, PA
Sponsors
Routledge
Elsevier
Springer
Cefrio
NCDG: National Center for Digital Government
ABSTRACT

A principal mechanism by which the SEC fulfills its missions of investor protection and market efficiency is the widespread dissemination of the information that publicly traded firms submit for disclosure. The continuing evolution of reporting standards like the International Financial Reporting Standards (IFRS) and the global convergence on XBRL as a syntax for sharing data address the quantitative dimension of reporting. This work complements the ongoing research on financial disclosure by helping investors learn from the textual, narrative portions of the filing. Our objective is to automatically segment SEC 10-K financial regulatory filings to facilitate structured retrieval and querying. In structured retrieval, terms are differentially weighted based upon the document segments in which a term appears. We leverage the regulatory instructions provided by the SEC to identify a set of semantic labels such as "Legal Proceedings" or "Management's Discussion and Analysis" that segment a 10-K annual report. We frame the problem of document segmentation as a search for semantic labels and use a genetic algorithm to segment each filing. We evaluate the genetic algorithm on a test set of 112 randomly selected regulatory filings and compare those results to a simple, greedy approach for information extraction and segmentation.
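
To make the idea of segmentation as a search for semantic labels concrete, the following is a minimal sketch of a genetic algorithm that picks one candidate heading position per label. It is not the authors' implementation: the candidate positions and match scores in CANDIDATES, the fitness function, the ordering penalty, and all parameter values are hypothetical and chosen only for illustration. The sketch assumes that a good segmentation selects well-matching headings that appear in the order given by the regulatory instructions.

```python
import random

# Hypothetical data: for each semantic label, the line numbers in a filing
# where text resembling that label was found, with an assumed match score.
CANDIDATES = {
    "Business": [(3, 0.9), (120, 0.4)],
    "Legal Proceedings": [(5, 0.5), (210, 0.8)],
    "Management's Discussion and Analysis": [(7, 0.6), (340, 0.9), (100, 0.3)],
}
LABELS = list(CANDIDATES)  # assumed to be in the order the form prescribes


def fitness(chromosome):
    """Sum of match scores, minus a penalty for each out-of-order label pair."""
    picks = [CANDIDATES[label][gene] for label, gene in zip(LABELS, chromosome)]
    score = sum(match for _, match in picks)
    positions = [pos for pos, _ in picks]
    inversions = sum(1 for a, b in zip(positions, positions[1:]) if a >= b)
    return score - 2.0 * inversions


def random_chromosome(rng):
    """One gene per label: an index into that label's candidate list."""
    return [rng.randrange(len(CANDIDATES[label])) for label in LABELS]


def crossover(a, b, rng):
    """Single-point crossover of two parent chromosomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chromosome, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return [
        rng.randrange(len(CANDIDATES[label])) if rng.random() < rate else gene
        for label, gene in zip(LABELS, chromosome)
    ]


def segment(generations=30, pop_size=20, seed=0):
    """Evolve a population and return the best label -> line number mapping."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return {label: CANDIDATES[label][gene][0] for label, gene in zip(LABELS, best)}


if __name__ == "__main__":
    for label, line in segment().items():
        print(f"{label}: starts at line {line}")
```

In this toy data, the early candidates at lines 3, 5, and 7 stand in for table-of-contents entries, which a greedy first-match approach would select. The fitness function instead favors the later, better-matching headings, provided they keep the prescribed order.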

