Mulgara Project: Issueshttps://code.mulgara.org/https://code.mulgara.org/favicon.ico?15861924492009-08-13T22:22:01ZMulgara Project
Redmine Mulgara - Bug #191 (New): Search is slow when Lucene writes to diskhttps://code.mulgara.org/issues/1912009-08-13T22:22:01ZPaula Gearon
<p>Often, if a search is exceptionally slow, much of that time is wasted when Lucene decides that its results need to be sorted. If the quantity of data to be sorted does not fit into memory, then Lucene swaps it out to disk, radically slowing what would otherwise be a fast set of comparisons.</p>
<p>Most of this sorting is useless and can occur anywhere in the search process (e.g., when combining two datasets in the midst of a larger query, or sorting the entire result set before returning it).</p>
<p>Tuning Lucene (telling it to sort only when necessary) and/or changing the OQL used in searches may vastly improve response times in some cases.</p>
<p>Example: on a full corpus, enable date searching (currently disabled in <a class="wiki-page new" href="https://code.mulgara.org/projects/mulgara/wiki/SearchAction">SearchAction</a>.java lines 155 to 164 due to excessive query slowness) and search PLoS One for:</p>
<p>gene AND date:[2008-05-16 TO 2009-01-26]</p> Mulgara - Bug #189 (New): Cascading FILTERs ignores all FILTERs except the last onehttps://code.mulgara.org/issues/1892009-06-26T01:44:50ZPaula Gearon
<p>If multiple FILTERs are applied in a row, only the last one is recognized.</p> Mulgara - Feature #185 (New): Create HTTP testshttps://code.mulgara.org/issues/1852009-02-25T18:45:19ZPaula Gearon
<p>We need a framework for HTTP tests. This can be modeled on the JXUnit tests.</p> Mulgara - Feature #184 (New): Refactor Protocol Servlethttps://code.mulgara.org/issues/1842009-02-25T18:44:25ZPaula Gearon
<p>The protocol servlet is getting too many methods, and still needs to expand.</p>
<p>The proposed refactoring is to create classes based on the <em>resource type</em> of the request. Each class then handles the operations for its own resource type.</p>
<p>Resource types are identified by parameters in the request. Care must be taken with POST requests, as parameters may arrive in the content, and not the URL.</p>
<ul>
<li>Graphs. These are identified by the inclusion of a <strong>default-graph-uri</strong> parameter, and none of the other parameters that identify other types. A synonym for this parameter is <strong>graph</strong>.</li>
<li>Statements. These are identified by the inclusion of <strong>subject</strong>, <strong>predicate</strong>, <strong>object</strong>, and <strong>graph</strong> (or <strong>default-graph-uri</strong>) parameters. Synonyms include: <strong>subj</strong>/<strong>s</strong>, <strong>pred</strong>/<strong>p</strong> and <strong>obj</strong>/<strong>o</strong>. If only some of these are present, or duplicates occur, then this is an error.</li>
<li>Queries. These are not "resources", but are defined by the SPARQL spec.</li>
</ul>
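<p>The identification rules above could be sketched roughly as follows. This is only an illustration, not the servlet's actual code: the class name <strong>ResourceTypeClassifier</strong>, the use of a <strong>query</strong> parameter to mark query requests, and the choice of checking for a query first are all assumptions. The parameter map is assumed to already merge URL and POST-content parameters.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: classify a request by the parameters it carries. */
final class ResourceTypeClassifier {

  enum ResourceType { GRAPH, STATEMENT, QUERY }

  // Parameter names and synonyms as listed in the issue text.
  private static final Set<String> GRAPH_NAMES = Set.of("default-graph-uri", "graph");
  private static final Set<String> SUBJECT_NAMES = Set.of("subject", "subj", "s");
  private static final Set<String> PREDICATE_NAMES = Set.of("predicate", "pred", "p");
  private static final Set<String> OBJECT_NAMES = Set.of("object", "obj", "o");
  // Assumption: a "query" parameter marks a query request.
  private static final Set<String> QUERY_NAMES = Set.of("query");

  /** Counts every value supplied under any of the given names. */
  private static int count(Map<String, List<String>> params, Set<String> names) {
    int n = 0;
    for (String name : names) {
      n += params.getOrDefault(name, List.of()).size();
    }
    return n;
  }

  /** Returns the resource type, or throws if the parameters are inconsistent. */
  static ResourceType classify(Map<String, List<String>> params) {
    if (count(params, QUERY_NAMES) > 0) {
      return ResourceType.QUERY;
    }
    int g = count(params, GRAPH_NAMES);
    int s = count(params, SUBJECT_NAMES);
    int p = count(params, PREDICATE_NAMES);
    int o = count(params, OBJECT_NAMES);
    if (s + p + o == 0) {
      // Only a graph parameter: the request addresses a whole graph.
      if (g == 1) {
        return ResourceType.GRAPH;
      }
      throw new IllegalArgumentException("expected exactly one graph parameter");
    }
    // A statement needs exactly one each of subject, predicate, object and graph.
    if (g == 1 && s == 1 && p == 1 && o == 1) {
      return ResourceType.STATEMENT;
    }
    throw new IllegalArgumentException("partial or duplicated statement parameters");
  }
}
```

<p>Counting values across all synonyms means that supplying both <strong>subject</strong> and <strong>s</strong> is treated as a duplicate, which matches the error condition described for statements.</p>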
The operations for these objects are:
<ul>
<li>Graph:
<ul>
<li>GET: Issue a CONSTRUCT for the entire graph. <em>TODO</em>.</li>
<li>PUT: Create a graph. <em>TODO: return 201</em></li>
<li>DELETE: Delete a graph.</li>
<li>POST: Load a file into a graph.</li>
<li>HEAD: Return the graph type, if known.</li>
</ul>
</li>
<li>Statement:
<ul>
<li>GET: An OK response if the statement exists. Otherwise return 404. <em>TODO</em>.</li>
<li>PUT: Create the statement. <em>TODO: return 201</em></li>
<li>DELETE: Remove the statement.</li>
<li>POST: Take an ID parameter and use this as a URI to reify the statement. <em>TODO.</em></li>
<li>HEAD: Get the Reification ID for the statement. <em>TODO.</em></li>
</ul>
</li>
<li>Query:
<ul>
<li>GET: Perform a read-only query.</li>
<li>PUT: N/A (should this do an insert/select for <strong>construct</strong> queries?)</li>
<li>DELETE: N/A (should this do a delete/select for <strong>construct</strong> queries?)</li>
<li>POST: Allows writable commands (TQL only. Possibly SPARQL Update.)</li>
<li>HEAD: Get the size of the result of a read-only query. <em>TODO</em>.</li>
</ul></li>
</ul> Mulgara - Bug #181 (New): Need Authorization for HTTPhttps://code.mulgara.org/issues/1812009-02-04T05:32:53ZPaula Gearon
<p>Now that Mulgara can be put on the web, the write operations need to be locked down, else it cannot be deployed. For HTTP we will need to employ authorization through standard means.</p>
<p>Several questions come out of this. First, how should authorization be handled? In the database? In an external file? Second, should it be done on an operation basis (GET is safe, but PUT/POST is not) or on a graph-by-graph basis like Mulgara security used to employ?</p> Mulgara - Bug #177 (New): Help does not work from jLine prompthttps://code.mulgara.org/issues/1772009-01-22T16:38:24ZPaula Gearon
<p>When "help;" is typed at the CLI provided by jLine, nothing happens.</p> Mulgara - Bug #174 (New): FILTERs external to an OPTIONAL applied inside the OPTIONALhttps://code.mulgara.org/issues/1742008-12-02T18:05:40ZPaula Gearon
<p>Based on Jim Irwin's report:<br />I wrote a SPARQL query that attempts to find all the root classes of an ontology, i.e. those that are not subclasses of classes other than owl:Thing and rdfs:Class.</p>
<p>Setting aside for the moment the question of whether this is the best approach to finding the root classes, here is the SPARQL that I ran:</p>
<pre>
SELECT DISTINCT ?root
WHERE
{
  {
    { ?root rdf:type owl:Class . }
    UNION
    { ?root rdf:type rdfs:Class . }
    FILTER ( (?root != owl:Thing) && (?root != rdfs:Class) )
  }
  OPTIONAL {
    ?root rdfs:subClassOf ?sup .
    FILTER ( (?sup != owl:Thing) && (?sup != rdfs:Class) )
  }
  FILTER ( !bound(?sup) )
}
ORDER BY ?root
</pre>
<p>I ran the query against the FOAF ontology, and got eight results:<br /> <a class="external" href="http://www.w3.org/2000/10/swap/pim/contact#Person">http://www.w3.org/2000/10/swap/pim/contact#Person</a><br /> <a class="external" href="http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing">http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Agent">http://xmlns.com/wordnet/1.6/Agent</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Agent-3">http://xmlns.com/wordnet/1.6/Agent-3</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Document">http://xmlns.com/wordnet/1.6/Document</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Organization">http://xmlns.com/wordnet/1.6/Organization</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Person">http://xmlns.com/wordnet/1.6/Person</a><br /> <a class="external" href="http://xmlns.com/wordnet/1.6/Project">http://xmlns.com/wordnet/1.6/Project</a></p>
<p>However, there is actually another root class in FOAF, the foaf:OnlineAccount class. That class is defined as</p>
<pre><code>&lt;owl:Class rdf:about="OnlineAccount"&gt;
  &lt;rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/&gt;
  &lt;vs:term_status&gt;unstable&lt;/vs:term_status&gt;
  &lt;rdfs:label&gt;Online Account&lt;/rdfs:label&gt;
  &lt;rdfs:comment&gt;An online account.&lt;/rdfs:comment&gt;
  &lt;rdfs:isDefinedBy rdf:resource=""/&gt;
  &lt;rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/&gt;
&lt;/owl:Class&gt;</code></pre>
<p>Note the foaf:OnlineAccount rdfs:subClassOf owl:Thing triple.</p>
<p>Here's my question: In my SPARQL query's OPTIONAL group, I filter out all the rdfs:subClassOf triples that contain owl:Thing or rdfs:Class as the object. So I would expect that the ?sup variable would not be bound outside that group, and that foaf:OnlineAccount would pass the !bound(?sup) filter. However, it appears that it does not pass the filter, as if the !bound filter were being applied inside the OPTIONAL group rather than outside it.</p>
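<p>The behaviour expected here can be sketched as a left join followed by an outer filter. The code below is only an illustration of that evaluation order over in-memory data, not Mulgara's or Jena's implementation. The class and method names are hypothetical, the candidate list stands in for the UNION part of the query, and DISTINCT and ORDER BY are left out.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of OPTIONAL with an inner FILTER, then an outer !bound FILTER. */
final class OptionalFilterSketch {
  static final String OWL_THING = "owl:Thing";
  static final String RDFS_CLASS = "rdfs:Class";

  /** One solution row: root is always bound; sup is null when unbound. */
  static final class Row {
    final String root;
    final String sup;

    Row(String root, String sup) {
      this.root = root;
      this.sup = sup;
    }
  }

  /**
   * candidates: classes matched by the first group of the query.
   * subClassOf: maps a class to the objects of its rdfs:subClassOf triples.
   */
  static List<String> rootClasses(List<String> candidates,
                                  Map<String, List<String>> subClassOf) {
    List<Row> joined = new ArrayList<>();
    for (String root : candidates) {
      boolean matched = false;
      for (String sup : subClassOf.getOrDefault(root, List.of())) {
        // Inner FILTER: evaluated as part of the OPTIONAL group.
        if (!sup.equals(OWL_THING) && !sup.equals(RDFS_CLASS)) {
          joined.add(new Row(root, sup));
          matched = true;
        }
      }
      // Left join: with no surviving match, keep the row and leave ?sup unbound.
      if (!matched) {
        joined.add(new Row(root, null));
      }
    }
    List<String> result = new ArrayList<>();
    for (Row row : joined) {
      // Outer FILTER ( !bound(?sup) ), applied after the left join.
      if (row.sup == null) {
        result.add(row.root);
      }
    }
    return result;
  }
}
```

<p>With a class whose only superclass is owl:Thing, the inner filter removes the single match, the left join keeps the row with ?sup unbound, and the outer filter lets it through. If the !bound test were instead applied inside the OPTIONAL group, that class would be lost, which is the symptom reported.</p>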
<p>Just for comparison, I ran the same query using Jena, and foaf:OnlineAccount does appear in the results list when using Jena, as I would expect it to.</p> Mulgara - Feature #172 (New): Add Talis changeset files as an uploadable file type on the SPARQL/...https://code.mulgara.org/issues/1722008-11-21T00:51:00ZPaula Gearon
<p>HTTP updates currently only allow a single TQL command.</p>
<p>SPARQL update will address some of this, but a more flexible approach will be to support Talis changeset files. Note that these files only permit modification on a single subject at a time, though they can include both inserts and deletes in the same file.</p>
<p><a class="external" href="http://vocab.org/changeset/schema">http://vocab.org/changeset/schema</a></p> Mulgara - Feature #170 (New): Default graph for TQLhttps://code.mulgara.org/issues/1702008-11-18T20:37:28ZPaula Gearon
<p>The SPARQL parser has a method to set the default graph on all queries. We would like the same to be enabled for TQL.</p> Mulgara - Feature #168 (New): Add collection support to RLog AND ruleshttps://code.mulgara.org/issues/1682008-11-10T17:37:38ZPaula Gearon
<p>RLog needs a collection syntax, which needs to be converted to the appropriate "walk" constraints in the rules engine.</p>
<p>For instance, see rule S35 in <a href="http://mulgara.org/trac/attachment/wiki/SKOS/skos.rlog" class="external">skos.rlog</a>:</p>
<pre></pre> Mulgara - Feature #166 (New): XA1.1 String pool needs cachehttps://code.mulgara.org/issues/1662008-11-04T16:33:00ZPaula Gearon
<p>We need caching for the XA1.1 string pool. This may not be needed when memory mapping is implemented.</p> Mulgara - Feature #165 (New): Memory Map large fileshttps://code.mulgara.org/issues/1652008-11-04T16:18:45ZPaula Gearon
<p>Large files used by the XA1.1 string pool are currently accessed through IO only, with some performance hit.</p>
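<p>As a rough illustration of an alternative to plain IO, the sketch below maps a file in fixed-size regions and exposes single-byte reads by <code>long</code> offset. The class name <code>LongMappedFile</code>, the region size parameter, and the method names are assumptions made for this example and are not taken from the Mulgara source. Reads that span a region boundary, writes, and the switch between IO and mapping are not covered.</p>

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only view of a file, addressed by long offsets. */
final class LongMappedFile {
  private final MappedByteBuffer[] regions;
  private final long regionSize;
  private final long size;

  /** Maps the whole file as consecutive regions of at most regionSize bytes. */
  LongMappedFile(Path file, long regionSize) throws IOException {
    if (regionSize <= 0 || regionSize > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("regionSize must fit in a positive int");
    }
    this.regionSize = regionSize;
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      this.size = channel.size();
      int count = (int) ((size + regionSize - 1) / regionSize);
      this.regions = new MappedByteBuffer[count];
      for (int i = 0; i < count; i++) {
        long start = i * regionSize;
        long length = Math.min(regionSize, size - start);
        // A mapping stays valid after the channel that created it is closed.
        regions[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
      }
    }
  }

  /** Reads one byte at an absolute long offset. */
  byte get(long offset) {
    if (offset < 0 || offset >= size) {
      throw new IndexOutOfBoundsException("offset " + offset);
    }
    int region = (int) (offset / regionSize);
    int index = (int) (offset % regionSize);
    return regions[region].get(index);
  }

  long size() {
    return size;
  }
}
```

<p>Splitting the file into regions is needed because a single <code>MappedByteBuffer</code> is indexed by <code>int</code>, so one mapping cannot cover more than <code>Integer.MAX_VALUE</code> bytes.</p>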
<p>We want to include the option of using memory mapping. This will require a "common" interface for data, which we want to have look like a <a class="wiki-page new" href="https://code.mulgara.org/projects/mulgara/wiki/MappedByteBuffer">MappedByteBuffer</a>, only using long offsets instead of int. Like <a class="wiki-page new" href="https://code.mulgara.org/projects/mulgara/wiki/BlockFile">BlockFile</a>, the use of IO or mapping needs to be configurable.</p> Mulgara - Bug #159 (New): Correct license on recent sourcehttps://code.mulgara.org/issues/1592008-10-23T02:53:27ZPaula Gearon
<p>All recent files were supposed to be under the Apache license, but unfortunately my macros have still been creating OSL files. These need to be corrected.</p> Mulgara - Bug #154 (New): HTTP uploads of RDF/XML can give errorhttps://code.mulgara.org/issues/1542008-10-17T01:59:11ZPaula Gearon
<p>The Jena parser for RDF/XML can fail to register the final </rdf:RDF> tag if it is immediately followed by a new line. Loading from a file is OK, but an <a class="wiki-page new" href="https://code.mulgara.org/projects/mulgara/wiki/InputStream">InputStream</a> can lead to an error if this condition occurs.</p> Mulgara - Feature #153 (New): Nodetype resolver should allow access to datatypeshttps://code.mulgara.org/issues/1532008-10-17T01:35:31ZPaula Gearon
<p>Need to provide access to datatypes in constraints. This will need to be done with a resolver - probably the nodetype resolver.</p>
<p>Once done, the following TQL query should be possible:<br /><pre>
select $x
from <some:graph>
where $x <some:property> $value
and $value <rdf:type> <xsd:integer> in <sys:type>
</pre></p>
<p>In SPARQL, this will be:<br /><pre>
prefix some: <http://some.domain.com/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select $x
from <urn:some:graph>
where {
$x some:property $value .
GRAPH <sys:type> { $value rdf:type xsd:integer }
}
</pre></p>
<p>This will be equivalent to (though more efficient than) the query:<br /><pre>
prefix some: <http://some.domain.com/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select $x
from <urn:some:graph>
where {
$x some:property $value
FILTER (datatype($value) = xsd:integer)
}
</pre></p>