Heuristic Methods for Inference of XML Schemas: Lessons Learned and Open Issues
Volume 24, Issue 4 (2013), pp. 577–602
Pub. online: 1 January 2013
Type: Research Article
Received
1 June 2011
Accepted
1 October 2012
Published
1 January 2013
Abstract
In this paper we focus on a specific class of XML schema inference approaches, the so-called heuristic approaches. In contrast to grammar-inferring approaches, their result does not belong to any specific class of grammars and, hence, we cannot say anything about their features from the point of view of formal language theory. Nevertheless, heuristic approaches form the wider and more popular set of approaches, owing to their natural and user-friendly strategies. We describe a general framework of the inference algorithms and show how its particular phases can be further enhanced and optimized to produce more reasonable and realistic output. The aim of the paper is (1) to provide a general overview of the heuristic inference process and existing approaches, (2) to sum up the improvements and optimizations we have proposed so far in our research group, and (3) to discuss possible extensions and open problems that remain to be solved. The paper thus enables the reader to become acquainted with the field quickly.