<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
			<journal-title-group>
				<journal-title>Informatica</journal-title>
			</journal-title-group>
			<issn pub-type="epub">0868-4952</issn>
			<issn pub-type="ppub">0868-4952</issn>
			<publisher>
				<publisher-name>VU</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="publisher-id">inf24405</article-id>
			<article-id pub-id-type="doi">10.15388/Informatica.2013.05</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research article</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Heuristic Methods for Inference of XML Schemas: Lessons Learned and Open Issues</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="Author">
					<name>
						<surname>Mlýnková</surname>
						<given-names>Irena</given-names>
					</name>
					<email xlink:href="mailto:mlynkova@ksi.mff.cuni.cz">mlynkova@ksi.mff.cuni.cz</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
					<xref ref-type="corresp" rid="fn1">∗</xref>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Nečaský</surname>
						<given-names>Martin</given-names>
					</name>
					<email xlink:href="mailto:necasky@ksi.mff.cuni.cz">necasky@ksi.mff.cuni.cz</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<aff id="j_INFORMATICA_aff_000">Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic</aff>
			</contrib-group>
			<author-notes>
				<corresp id="fn1">
					<label>∗</label>Corresponding author.</corresp>
			</author-notes>
			<pub-date pub-type="epub">
				<day>01</day>
				<month>01</month>
				<year>2013</year>
			</pub-date>
			<volume>24</volume>
			<issue>4</issue>
			<fpage>577</fpage>
			<lpage>602</lpage>
			<history>
				<date date-type="received">
					<day>01</day>
					<month>06</month>
					<year>2011</year>
				</date>
				<date date-type="accepted">
					<day>01</day>
					<month>10</month>
					<year>2012</year>
				</date>
			</history>
			<abstract>
				<p>In this paper we focus on a specific class of XML schema inference approaches: so-called heuristic approaches. In contrast to grammar-inferring approaches, their result does not belong to any specific class of grammars and, hence, we cannot say anything about their features from the point of view of formal language theory. Nevertheless, heuristic approaches form a wider and more popular set of approaches, owing to their natural and user-friendly strategies. We describe a general framework of the inference algorithms and show how its particular phases can be further enhanced and optimized to produce more reasonable and realistic output. The aim of the paper is (1) to provide a general overview of the heuristic inference process and existing approaches, (2) to sum up the improvements and optimizations we have proposed so far in our research group, and (3) to discuss possible extensions and open problems that remain to be solved. The paper thus enables the reader to become acquainted with the field quickly.</p>
			</abstract>
			<kwd-group>
				<label>Keywords</label>
				<kwd>XML Schema inference</kwd>
				<kwd>regular-tree grammars</kwd>
				<kwd>heuristics</kwd>
				<kwd>integrity constraints</kwd>
			</kwd-group>
		</article-meta>
	</front>
</article>