Posts Tagged ‘web’

Using Jena as a SPARQL endpoint

Monday, January 11th, 2010

I’ve been involved in a few projects at work over the last couple of years that have made use of Semantic Web technologies (triple stores, RDF, OWL, SPARQL etc). For most of these I’ve made use of ARC, a really great PHP library by Ben Nowack for interacting with RDF and triple stores. As great as ARC is, it does have a few drawbacks: it is limited to MySQL triple stores, it has some issues with OPTIONAL queries, and it doesn’t entirely support the SPARQL specification.

For these reasons, and for general flexibility, my current project needed to be able to swap the underlying triple store from ARC to Jena easily, so I had to investigate how to expose a Jena triple store as a SPARQL endpoint. Having worked this out, I now really, really appreciate how easy ARC makes this.

Jena doesn’t appear to ship with the ability to expose the ARQ SPARQL processor as a SPARQL endpoint and hence you need to make use of a separate piece of software called Joseki. The following is the list of things I needed to do to get this working in my environment. Note that your setup may have different requirements and also I may have completely misunderstood the best way of doing this!

  1. Set up a database to use as your triple store and get a JDBC driver so Joseki can interact with it from Java
  2. Download and extract Joseki
  3. Add the JDBC driver to the Joseki classpath (e.g. for Windows by adding the following line to bin\joseki_path.bat: set CP=%CP%;C:\my_jdbc_driver\my_jdbc_driver.jar)
  4. Add the following to joseki-config.ttl:
     <#myProjectUpdate>
       rdf:type            joseki:Service ;
       rdfs:label          "My Project SPARQL/Update" ;
       joseki:serviceRef   "sparql/myproject/update" ;
       joseki:dataset      <#myProject> ;
       joseki:processor    joseki:ProcessorSPARQLUpdate .
    
     <#myProjectRead>
       rdf:type            joseki:Service ;
       rdfs:label          "SPARQL" ;
       joseki:serviceRef   "sparql/myproject/read" ;
       joseki:dataset      <#myProject> ;
       joseki:processor    joseki:ProcessorSPARQL_FixedDS .
    
     <#myProject>
       rdf:type            ja:RDFDataset ;
       rdfs:label          "My Project" ;
       ja:defaultGraph     <#myProjectDB> .
    
     <#myProjectDB>
       rdf:type            ja:RDBModel ;
       ja:connection       [
                             ja:dbType         "MySQL" ;
                             ja:dbURL           ;
                             ja:dbUser         "myproject-database-username" ;
                             ja:dbPassword     "myproject-database-password" ;
                             ja:dbClass        "com.mysql.jdbc.Driver"
                            ] ;
       ja:reificationMode    ja:minimal ;
       ja:modelName        "DEFAULT" .
        
  5. Set the JOSEKIROOT environment variable to the location you extracted Joseki to
  6. Run Joseki (from its directory) by executing bin/rdfserver.bat

Note that I wanted to be able to make use of SPARUL to update data using the SPARQL endpoint. In ARC I can use SPARQL+ (which is effectively the same for my purposes) on the same endpoint as normal SPARQL queries. For Joseki however, I needed to expose two different endpoints, one for standard SPARQL queries and one for updating.
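As a rough sketch of what talking to those two endpoints looks like from a client, the Python below builds the URLs from the joseki:serviceRef values in the configuration above. The base URL (http://localhost:2020/) and the helper names are assumptions for illustration, so adjust them to your own setup; the only part taken from a specification is the use of the `query` parameter for read queries, as in the SPARQL Protocol.

```python
from urllib.parse import urlencode, urljoin

# Assumed base URL: change host and port to wherever your Joseki server listens.
JOSEKI_BASE = "http://localhost:2020/"

# These match the joseki:serviceRef values in joseki-config.ttl above.
READ_SERVICE = "sparql/myproject/read"
UPDATE_SERVICE = "sparql/myproject/update"


def service_url(service_ref, base=JOSEKI_BASE):
    """Return the full URL of a Joseki service given its serviceRef."""
    return urljoin(base, service_ref)


def query_url(sparql, base=JOSEKI_BASE):
    """Build a GET URL for the read endpoint, URL-encoding the query
    into the 'query' parameter."""
    return service_url(READ_SERVICE, base) + "?" + urlencode({"query": sparql})


if __name__ == "__main__":
    print(service_url(UPDATE_SERVICE))
    print(query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 10"))
```

Keeping the two service references as separate constants mirrors the split in the configuration: reads go to one URL and updates to the other.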

The one thing I haven’t yet worked out how to do is use named graphs in my Jena triple store when inserting data. I discovered that the SPARUL update specification requires you to create the graph first (unlike ARC’s SPARQL+), but executing e.g. CREATE GRAPH <http://mygraph/> seems to fail silently, as any following INSERT INTO <http://mygraph/> statement fails saying that the graph doesn’t exist. Something to keep investigating. It may be something to do with support for the different types of Jena store (RDB, SDB, TDB, etc), which I don’t fully understand yet (I think my instructions above are using RDB, which appears to be old, but I couldn’t get TDB or SDB working at all).
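To make the order of operations concrete, here is a minimal sketch that just builds the two SPARUL statements described above as strings, in the order they would be sent to the update endpoint. The function name and the example triple are hypothetical, and this only illustrates the sequence; it is not a fix for the failure described.

```python
def named_graph_updates(graph_uri, triples):
    """Return the two SPARUL statements, in order, for inserting into a
    named graph: first create the graph, then insert the triples into it."""
    create = "CREATE GRAPH <%s>" % graph_uri
    insert = "INSERT INTO <%s> { %s }" % (graph_uri, " ".join(triples))
    return [create, insert]


if __name__ == "__main__":
    # Hypothetical example triple, written in Turtle-style syntax.
    example = ['<http://example.org/s> <http://example.org/p> "o" .']
    for statement in named_graph_updates("http://mygraph/", example):
        print(statement)
```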

So all in all I’m pleased to have worked out how to set this up, but I will most certainly continue to use ARC where possible as Jena environments seem unnecessarily complex (although this might simply be because it tends to support the W3C specifications fully!).

Dojo checkboxes

Thursday, July 3rd, 2008

I recently hit this snag when working with Dojo. Basically I wanted to set the checked status of a checkbox on a webpage programmatically. Simple you might think? Apparently not as easy as it should be.

Interaction with checkboxes has changed slightly in Dojo 1.1 (apparently) but myCheckbox.setValue(true) should be valid. When calling dojo.byId('my-checkbox-id').setValue(true), I was getting an error saying the method didn’t exist. The object was definitely the checkbox as I could determine the correct checked state from that object (myCheckbox.checked) so I was very confused. I then remembered another way to access objects with Dojo is using the HTML attribute ‘jsId’. This creates a global javascript variable referring to that object in the DOM. So I set something like jsId='myCheckbox' and then called myCheckbox.setValue(true) and it worked!

Very odd behaviour. My best guess is that dojo.byId() returns the plain DOM node for the checkbox, whereas jsId gives you the Dijit widget object that wraps it, and setValue() is a method of the widget rather than of the DOM node. If so, dijit.byId('my-checkbox-id') should return the same widget object. Either way, at least there’s the workaround above…

Increasing SPARQL performance

Monday, April 7th, 2008

Recently I’ve been playing (for work, honest!) with the Semantic Web related technology SPARQL, a query language for RDF. I ended up creating some very complex queries for my OWL ontologies that were being executed through Jena, a Java framework for building Semantic Web apps. These queries were running so incredibly slowly that I thought we might have taken the completely wrong direction for the project and would have to start from scratch. Actually it turns out I was doing something wrong which is always good to hear…

After looking at some code by someone else in my department who had done some similar work before, we realised that they were setting a specification that defines the language and reasoner to use when creating the ontology model. They had no idea why they were setting it, but doing the same in my code resulted in a 700% performance enhancement! Quite a productive day :D .

So if you’re ever using SPARQL and Jena and having some performance issues, remember to set the appropriate specification (in my case OWL_DL_MEM).

Web 2.gareth

Monday, December 31st, 2007

OK, so given that I enjoy working with Web technologies and I’m interested in many of the Web 2.0 (or whatever you want to call it) ideas, I’ve been trying to get more involved. After trying to use Flickr a bit more socially, playing around with Twitter and even starting a blog, I realised I’ve never even bought a domain name. So now I have. This blog is now to be found at blog.garethj.com so please update your links (including the new feed by FeedBurner) as I’ll be removing the wordpress.com one at some point soon. I might have splashed out on a new domain but I’m still cheap so it’s being hosted for free. That may change if it all falls apart! Or I could host it myself at home…

I also thought while I’m at it I’d play around a bit more, so I’ve set up a couple of subdomains: one for my little slug machine at home, and playpen.garethj.com as a place for playing around with a few things (nothing useful to see here yet though).

My mail is now handled by Google Apps so I can have email addresses such as somecompany@garethj.com for any website/company I sign up to, which lets me see where spam might originate from. I might end up with problems as Roo did, so I’ll probably end up manually adding these company names to my filters and letting the rest fall into the spam folder. Or see how Google manages it first; they might do alright!