TIKA SPATIAL - COUPLING APACHE TIKA WITH COMMON GEOSPATIAL DATA FORMATS
Apache Tika (http://lucene.apache.org/tika/) is a common parser framework that supports the extraction of document metadata. As a subproject of the Lucene search engine, its primary purpose is the parsing and preprocessing of any content for free web search. Currently supported file formats include common text formats (e.g. OpenDocument Format), XML, HTML, audio and image formats, and more (http://lucene.apache.org/tika/0.7/formats.html).
In the sapience project (http://purl.org/net/sapience/docs), we have tested the coupling of the Tika Parser API with open source parser tools for geospatial data formats (e.g. GeoTools), with a focus on KML and ESRI Shapefiles. Besides the thematic coverage listed in the descriptive metadata, parsing a geospatial resource yields a representation of the document's spatiotemporal coverage. Depending on the application, this may be simply a point location, a bounding box, or a more sophisticated geometry. This information can then be used to build a spatial index of the parsed documents.
We will present our implementation of Tika Spatial, which introduces the concept of a Geospatial Content Handler to the Tika API. We also discuss some of the challenges we encountered and outline potential applications that can benefit from Tika Spatial. As an example, we discuss Web Processing Services, which need to support different data formats as input and output of the deployed processes. A first prototypical integration of Tika Spatial based on the 52° North WPS framework (http://www.52north.org/wps) will be presented.
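To illustrate what a geospatial content handler might look like, the following is a minimal sketch and not the Tika Spatial implementation. Tika parsers report content to a SAX ContentHandler, so the sketch uses only the SAX classes shipped with the JDK. The class name KmlBoundingBoxHandler and its methods are hypothetical. It assumes KML input whose coordinates elements hold whitespace-separated "lon,lat[,alt]" tuples, and it derives only a bounding box, the simplest of the spatial coverages mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of Tika or Tika Spatial.
class KmlBoundingBoxHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inCoordinates = false;
    private double minLon = Double.POSITIVE_INFINITY;
    private double minLat = Double.POSITIVE_INFINITY;
    private double maxLon = Double.NEGATIVE_INFINITY;
    private double maxLat = Double.NEGATIVE_INFINITY;

    private static boolean isCoordinates(String localName, String qName) {
        return "coordinates".equals(localName) || "coordinates".equals(qName);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isCoordinates(localName, qName)) {
            inCoordinates = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver element text in several chunks, so buffer it.
        if (inCoordinates) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!isCoordinates(localName, qName)) {
            return;
        }
        inCoordinates = false;
        // Assumed layout: "lon,lat[,alt]" tuples separated by whitespace.
        for (String tuple : text.toString().trim().split("\\s+")) {
            String[] parts = tuple.split(",");
            if (parts.length < 2) {
                continue; // skip empty or malformed tuples
            }
            double lon = Double.parseDouble(parts[0]);
            double lat = Double.parseDouble(parts[1]);
            minLon = Math.min(minLon, lon);
            maxLon = Math.max(maxLon, lon);
            minLat = Math.min(minLat, lat);
            maxLat = Math.max(maxLat, lat);
        }
    }

    // Returns {minLon, minLat, maxLon, maxLat}, or null if no coordinates were seen.
    double[] boundingBox() {
        if (minLon > maxLon) {
            return null;
        }
        return new double[] {minLon, minLat, maxLon, maxLat};
    }

    // Convenience wrapper: parses a KML string with the JDK SAX parser.
    static double[] boundingBoxOf(String kml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            KmlBoundingBoxHandler handler = new KmlBoundingBoxHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(kml)), handler);
            return handler.boundingBox();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse KML", e);
        }
    }
}
```

Calling KmlBoundingBoxHandler.boundingBoxOf(kmlString) returns the extent of all coordinates found in the document. Such a value could then serve as the entry for that document in a spatial index. Shapefiles and richer geometries would need a dedicated library such as GeoTools and are outside this sketch.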
Patrick Maue - Institute for Geoinformatics, University of Münster
Theodor Förster - University of Münster