Posts Tagged ‘sparql’

Using Jena as a SPARQL endpoint

Monday, January 11th, 2010

I’ve been involved in a few projects at work over the last couple of years that have made use of Semantic Web technologies (triple stores, RDF, OWL, SPARQL etc). For most of these I’ve made of ARC, a really great PHP library by Ben Nowack for interacting with RDF and triple stores. As great as ARC is, it does have a few drawbacks such as being limited to MySQL triple stores, some issues with OPTIONAL queries and it doesn’t entirely support the SPARQL specification.

For these reasons and for general flexibility, my current project wanted to be able to easily swap the underlying triple store from ARC to Jena as needed so I needed to investigate how to expose a Jena triple store as a SPARQL endpoint. After working this out, I now really really appreciate how easy ARC makes this.

Jena doesn’t appear to ship with the ability to expose the ARQ SPARQL processor as a SPARQL endpoint and hence you need to make use of a separate piece of software called Joseki. The following is the list of things I needed to do to get this working in my environment. Note that your setup may have different requirements and also I may have completely misunderstood the best way of doing this!

  1. Setup a database to use as your triple store and get a JDBC driver so Joseki can interact with it from Java
  2. Download and extract Joseki
  3. Add the JDBC driver to the Joseki classpath (e.g. for Windows by adding the following line to bin\joseki_path.bat: set CP=%CP%;C:\my_jdbc_driver\my_jdbc_driver.jar)
  4. Add the following to joseki-config.ttl:
     <#myProjectUpdate>
       rdf:type            joseki:Service ;
       rdfs:label          "My Project SPARQL/Update" ;
       joseki:serviceRef   "sparql/myproject/update" ;
       joseki:dataset      <#myProject> ;
       joseki:processor    joseki:ProcessorSPARQLUpdate .
    
     <#myProjectRead>
       rdf:type            joseki:Service ;
       rdfs:label          "SPARQL" ;
       joseki:serviceRef   "sparql/myproject/read" ;
       joseki:dataset      <#myProject> ;
       joseki:processor    joseki:ProcessorSPARQL_FixedDS .
    
     <#myProject>
       rdf:type            ja:RDFDataset ;
       rdfs:label          "My Project" ;
       ja:defaultGraph     <#myProjectDB> .
    
     <#myProjectDB>
       rdf:type            ja:RDBModel ;
       ja:connection       [
                             ja:dbType "MySQL" ;
                             ja:dbURL           ;
                             ja:dbUser         "myproject-database-username" ;
                             ja:dbPassword     "myproject-database-password" ;
                             ja:dbClass        "com.mysql.jdbc.Driver"
                            ] ;
       ja:reificationMode    ja:minimal ;
       ja:modelName        "DEFAULT" .
        
  5. Set the JOSEKIROOT environment variable to the location you extracted Joskei
  6. Run Joseki (from it’s directory) by executing bin/rdfserver.bat

Note that I wanted to be able to make use of SPARUL to update data using the SPARQL endpoint. In ARC I can use SPARQL+ (which is effectively the same for my purposes) on the same endpoint as normal SPARQL queries. For Joseki however, I needed to expose two different endpoints, one for standard SPARQL queries and one for updating.

The one thing I haven’t yet worked out how to do it to be able to use named graphs in my Jena triple store when inserting data. I discovered that the SPARUL update specification requires you to create the graph first (unlike ARC’s SPARQL+) but executing e.g. CREATE GRAPH <http://mygraph/> seems to fail silently as any following INSERT INTO <http://mygraph/> statement fails saying that the graph doesn’t exist. Something to keep investigating. It may be something to do with support for the different types of Jena store (RDB, SDB, TDB, etc) which I don’t fully understand yet (I think my instructions above are using RDB which appears to be old but I couldn’t get TDB or SDB working at all).

So all in all I’m pleased to have worked out how to set this up but I will most certainly continue to use ARC where possible as Jena environments seem unnecessarily complex (although this might simply be because it tends to support the W3 specifications fully!).

Increasing SPARQL performance

Monday, April 7th, 2008

Recently I’ve been playing (for work, honest!) with the Semantic Web related technology SPARQL, a query language for RDF. I ended up creating some very complex queries for my OWL ontologies that were being executed through Jena, a Java framework for building Semantic Web apps. These queries were running so incredibly slowly that I thought we might have taken the completely wrong direction for the project and would have to start from scratch. Actually it turns out I was doing something wrong which is always good to hear…

After looking at some code by someone else in my department that had done some similar work before, we realised that they were setting a specification that defines the language and reasoner to use when creating the ontology model. Now they had no idea why they were setting that but doing the same in my code resulted in a 700% performance enhancement! Quite a productive day :D .

So if you’re ever using SPARQL and Jena and having some performance issues, remember to set the appropriate specification (in my case OWL_DL_MEM).