This page describes the program and the files for the experimental evaluation of query completeness checking using completeness statements. The main reference for the experiment, which contains the formalization, the techniques, and a discussion of the experimental results, is under review for a journal.
We use the direct-statement fragment of Wikidata (i.e., statements without qualifiers or references), consisting of around 110 million triples: Wikidata dump in Feb 2016. In the experiment, we use Jena-TDB as the triple store. For convenience, the zipped TDB files that store the data graph for the experiment are available here.
The queries in the experiment are generated from the human-made queries openly available on Wikidata: Wikidata queries in April 2016.
We extract the basic graph patterns (BGPs) of those queries and transform their vocabulary to the direct-statement vocabulary. These BGPs (download here) act as a ‘base’ for generating our experiment queries. The experiment program generates on the fly the corresponding queries to be checked for completeness.
The completeness statements (and completeness templates) are generated on the fly by the experiment program based on the queries.
The reasoning program and experiment framework are implemented in Java using the Apache Jena library. The source code is available as an Eclipse workspace.
The ready-to-use JAR file of the program is available here.
To run the program, the folder of the TDB files (‘tdb-PREPROCESSED-wikidata-simple-statements-20160201’) and the query file ‘queries.txt’ must be placed in the same folder as the JAR file. The command scheme is as follows:
java -jar data-aware-completeness-experiment.jar
Note that the experiment output is redirected to a file via ‘>’. The output summary is found in the last three output sections, which cover completeness checking with all optimizations, with partial matching only, and with no optimizations.
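The setup and run steps above can also be scripted. The following is a minimal sketch, not part of the distributed program: it checks that the TDB folder, the query file, and the JAR sit in the same folder, then starts the JAR with its standard output redirected to a file, which mirrors the ‘>’ redirection. The class name ExperimentLauncher, its helper methods, and the output file name experiment-output.txt are assumptions made for illustration only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the experiment source code.
class ExperimentLauncher {
    // File and folder names as given in the instructions above.
    static final String TDB_DIR = "tdb-PREPROCESSED-wikidata-simple-statements-20160201";
    static final String QUERY_FILE = "queries.txt";
    static final String JAR_FILE = "data-aware-completeness-experiment.jar";

    /** Returns the names of the required inputs that are missing from workDir. */
    static List<String> missingInputs(File workDir) {
        List<String> missing = new ArrayList<>();
        if (!new File(workDir, TDB_DIR).isDirectory()) missing.add(TDB_DIR);
        if (!new File(workDir, QUERY_FILE).isFile()) missing.add(QUERY_FILE);
        if (!new File(workDir, JAR_FILE).isFile()) missing.add(JAR_FILE);
        return missing;
    }

    /** Prepares (but does not start) the run, sending standard output to outputFileName. */
    static ProcessBuilder buildCommand(File workDir, String outputFileName) {
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", JAR_FILE);
        pb.directory(workDir);
        // Same effect as '>' on the command line: standard output goes to a file.
        pb.redirectOutput(new File(workDir, outputFileName));
        // Error messages stay visible in the terminal.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws Exception {
        File workDir = new File(args.length > 0 ? args[0] : ".");
        List<String> missing = missingInputs(workDir);
        if (!missing.isEmpty()) {
            System.err.println("Missing in " + workDir.getAbsolutePath() + ": " + missing);
            return;
        }
        // "experiment-output.txt" is an assumed name; any writable file name works.
        int exitCode = buildCommand(workDir, "experiment-output.txt").start().waitFor();
        System.out.println("Experiment finished with exit code " + exitCode);
    }
}
```

Run it with the experiment folder as the only argument (or with no argument to use the current folder). Checking the inputs first gives an explicit message when a file is misplaced, before the experiment is started.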
~this manual file is created by Fariz Darari, email: fariz.darari@stud-inf.unibz.it