Template Generation

Input is a question string and a language tag:

{ "string": " ", "language": " " }

Output is a list of templates, each consisting of a query, information about the slots in that query, and a score:

[ { "query": " ", "slots": [ { "s": " ", "p": " ", "o": " " } ], "score": 0.0 } ]
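The input and output formats above could be handled as in the following minimal sketch. The field names (`string`, `language`, `query`, `slots`, `s`, `p`, `o`, `score`) come from the formats shown; the class and function names (`Slot`, `Template`, `make_request`, `parse_templates`) are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    s: str  # variable the slot refers to, e.g. "v1" or "p1"
    p: str  # kind of information, e.g. "verbalization" or "is"
    o: str  # value, e.g. a natural language string or an OWL type


@dataclass
class Template:
    query: str
    slots: List[Slot]
    score: float


def make_request(question: str, language: str) -> str:
    """Serialise a question into the input format (hypothetical helper)."""
    return json.dumps({"string": question, "language": language})


def parse_templates(payload: str) -> List[Template]:
    """Parse the output format into Template objects (hypothetical helper)."""
    return [
        Template(
            query=item["query"],
            slots=[Slot(**slot) for slot in item["slots"]],
            score=float(item["score"]),
        )
        for item in json.loads(payload)
    ]
```

Typed records are used here so that a malformed slot (for example one with a missing `o` field) fails at parse time rather than later in the pipeline.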

Example

For the example question "How many students does the Free University in Amsterdam have?", templates like the following could be generated:

[
  {
    "query": "SELECT COUNT(?v3) WHERE { ?v1 ?p1 ?v2 . ?v1 ?p2 ?v3 . ?v3 ?p3 ?v4 . }",
    "slots": [
      {"s": "v2", "p": "verbalization", "o": "Amsterdam"},
      {"s": "v2", "p": "is", "o": "owl:NamedIndividual"},
      {"s": "p1", "p": "verbalization", "o": "in"},
      {"s": "p1", "p": "is", "o": "owl:Property"},
      {"s": "p2", "p": "verbalization", "o": "have"},
      {"s": "p2", "p": "is", "o": "owl:ObjectProperty"},
      {"s": "v4", "p": "verbalization", "o": "students"},
      {"s": "v4", "p": "is", "o": "owl:Class"},
      {"s": "p3", "p": "is", "o": "<http://lodqa.org/vocabulary/sort_of>"}
    ],
    "score": 1.0
  },
  {
    "query": "SELECT ?v2 WHERE { ?v1 ?p1 ?v2 . }",
    "slots": [
      {"s": "v1", "p": "verbalization", "o": "Free University in Amsterdam"},
      {"s": "v1", "p": "is", "o": "owl:NamedIndividual"},
      {"s": "p1", "p": "verbalization", "o": "students"},
      {"s": "p1", "p": "is", "o": "owl:DatatypeProperty"}
    ],
    "score": 0.8
  },
  ...
]
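To read a template, it can help to group its slot entries by the variable they describe. The sketch below does this for the second example template; the function name `slots_by_variable` is hypothetical, and the sketch assumes each variable has at most one entry per `p` value (a later entry would overwrite an earlier one).

```python
from collections import defaultdict


def slots_by_variable(slots):
    """Group slot entries by their variable ("s"), mapping "p" to "o".

    Assumption: at most one entry per (variable, "p") pair; duplicates
    would be overwritten by the last one seen.
    """
    grouped = defaultdict(dict)
    for slot in slots:
        grouped[slot["s"]][slot["p"]] = slot["o"]
    return dict(grouped)


# Slots of the second example template.
example_slots = [
    {"s": "v1", "p": "verbalization", "o": "Free University in Amsterdam"},
    {"s": "v1", "p": "is", "o": "owl:NamedIndividual"},
    {"s": "p1", "p": "verbalization", "o": "students"},
    {"s": "p1", "p": "is", "o": "owl:DatatypeProperty"},
]

grouped = slots_by_variable(example_slots)
# grouped["v1"] now holds both the verbalization and the type of ?v1.
```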

Approach

First, the question is linguistically analysed, annotating it with part-of-speech tags, dependency relations, and semantic role labels. Second, the resulting parse tree is transformed into a template that covers one possibility of how natural language expressions correspond to constructs in the target SPARQL query. This is the template that is most faithful to the linguistic structure of the question.

In order to also account for structural differences between the question and the target query, the template is modified by a sequence of steps that collapse or expand triples, yielding additional templates.
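The text does not spell out the individual rewriting steps, so the following is only an illustrative sketch of what collapsing and expanding triples could look like. It assumes that "collapse" merges two triples chained through an intermediate node into one, and that "expand" splits one triple into such a chain; the function names and the tuple representation of triples are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def collapse(triples: List[Triple], node: str) -> List[Triple]:
    """Merge the chain (s, p, node), (node, q, o) into (s, p, o).

    Returns an unchanged copy if `node` is not the middle of a simple
    chain, i.e. does not have exactly one incoming and one outgoing triple.
    """
    incoming = [t for t in triples if t[2] == node]
    outgoing = [t for t in triples if t[0] == node]
    if len(incoming) != 1 or len(outgoing) != 1:
        return list(triples)
    s, p, _ = incoming[0]
    o = outgoing[0][2]
    result = []
    for t in triples:
        if t == incoming[0]:
            result.append((s, p, o))  # merged triple takes the place of the first
        elif t == outgoing[0]:
            continue  # second triple of the chain is dropped
        else:
            result.append(t)
    return result


def expand(triples: List[Triple], index: int,
           fresh_node: str, fresh_pred: str) -> List[Triple]:
    """Split the triple at `index` into a two-step chain via `fresh_node`."""
    s, p, o = triples[index]
    chain = [(s, p, fresh_node), (fresh_node, fresh_pred, o)]
    return triples[:index] + chain + triples[index + 1:]


body = [("?v1", "?p1", "?v2"), ("?v1", "?p2", "?v3"), ("?v3", "?p3", "?v4")]
collapsed = collapse(body, "?v3")
restored = expand(collapsed, 1, "?v3", "?p3")
```

In this sketch the two operations are inverses of each other on simple chains. A real implementation would also have to update the slot list to match the rewritten query body, which is omitted here.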

The scoring of the templates follows a simple heuristic based on the number of nodes in the query body that are neither projection variables nor slots. In addition, each rewriting operation reduces the score by a predetermined factor.
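The two ingredients of the scoring can be sketched as follows. The exact formula is not given here, so the sketch keeps them separate: `free_nodes` finds the nodes that are neither projected nor slots, and `apply_rewrite_penalty` assumes that "reduces by a factor" means multiplying once per rewriting operation. Both function names and the factor value 0.8 are hypothetical, and how the node count maps to a base score is deliberately left open.

```python
import re

VARIABLE = re.compile(r"\?(\w+)")


def free_nodes(template):
    """Variables in the query body that are neither projected nor slots.

    Assumes the query contains the keyword "WHERE" in upper case and that
    slot variables are given without the leading "?".
    """
    head, _, body = template["query"].partition("WHERE")
    projected = set(VARIABLE.findall(head))
    body_vars = set(VARIABLE.findall(body))
    slot_vars = {slot["s"] for slot in template["slots"]}
    return body_vars - projected - slot_vars


def apply_rewrite_penalty(base_score, rewrites, factor=0.8):
    """Multiply the score by `factor` once per rewriting operation.

    The multiplicative reading and the default factor are assumptions.
    """
    return base_score * factor ** rewrites


first_template = {
    "query": "SELECT COUNT(?v3) WHERE { ?v1 ?p1 ?v2 . ?v1 ?p2 ?v3 . ?v3 ?p3 ?v4 . }",
    "slots": [
        {"s": "v2", "p": "is", "o": "owl:NamedIndividual"},
        {"s": "p1", "p": "is", "o": "owl:Property"},
        {"s": "p2", "p": "is", "o": "owl:ObjectProperty"},
        {"s": "v4", "p": "is", "o": "owl:Class"},
        {"s": "p3", "p": "is", "o": "<http://lodqa.org/vocabulary/sort_of>"},
    ],
}

unbound = free_nodes(first_template)  # only ?v1 is neither projected nor a slot
penalised = apply_rewrite_penalty(1.0, rewrites=1)
```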

Code

Implementations are available on GitHub:

Related work

The idea of template generation is also exploited in the following question answering systems: