querying RDF in Ruby with RDF.rb

I recently had to make a tool for work that let me see the linked data graphs that BBC journalists are starting to create as they annotate news content. Ruby is my hacking language of choice, so this blog post describes how I used @gkellogg’s RDF.rb library to:

  • fetch RDF graphs from the HTTPS API of the BBC’s linked data platform (via restclient)
  • parse the data with RDF::Turtle::Reader
  • query it with RDF::Query and process the resulting Solutions

Disclaimer – I’m an amateur programmer so some of this may look horribly hacky to a Ruby or RDF expert; in my defence all I can say is that it works 🙂

getting data from the API

The BBC’s linked data platform sits behind a REST API that uses HTTPS and requires RSA client certificate authentication (the team working on it plan a public API sometime soon, but for now its use is internal only). Using the restclient gem makes getting data from this kind of API pretty straightforward:

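require 'restclient'

# client certificate and private key used to authenticate against the API
SSL = {
  :ssl_client_cert => OpenSSL::X509::Certificate.new(File.read("/path/to/my/client.crt")),
  :ssl_client_key => OpenSSL::PKey::RSA.new(File.read("/path/to/my/client.key"))
}

# fetch the thing graph for one guid, asking the API for Turtle
def getThingGraph(guid)
  url = "https://api.live.bbc.co.uk/ldp-writer/thing-graphs?guid=" + guid
  # the last expression is the method's return value: the API response
  data = RestClient::Resource.new(url, SSL).get({:accept => "application/rdf+turtle"})
end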
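One aside on the URL: the method above appends the guid to the query string as-is. Here is a minimal sketch of building the same URL with the value escaped, using only Ruby’s standard library. The names THING_GRAPHS_ENDPOINT and thing_graph_url are made up for illustration and are not part of restclient or the BBC API:

```ruby
require 'uri'

# Endpoint copied from the post; the constant name is hypothetical.
THING_GRAPHS_ENDPOINT = "https://api.live.bbc.co.uk/ldp-writer/thing-graphs"

# Hypothetical helper: returns the request URL with the guid form-encoded.
def thing_graph_url(guid)
  uri = URI.parse(THING_GRAPHS_ENDPOINT)
  uri.query = URI.encode_www_form(guid: guid)
  uri.to_s
end

puts thing_graph_url("ffc9b446-97b0-4cec-9f4f-dbd5d8238dad")
```

For a well-formed guid the result is identical to the concatenated string; the escaping only matters if something unexpected is passed in.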

So now I have a String object that contains some RDF graphs serialised as Turtle. For the sake of completeness here’s an example of what the API response looks like:

<http://www.bbc.co.uk/things/ffc9b446-97b0-4cec-9f4f-dbd5d8238dad#id>
      a       <http://www.bbc.co.uk/ontologies/cms/ManagedThing> , <http://www.bbc.co.uk/ontologies/news/Person> ;
      <http://www.w3.org/2000/01/rdf-schema#seeAlso>
              <http://www.chucknorris.com/> ;
      <http://www.bbc.co.uk/ontologies/coreconcepts/disambiguationHint>
              "Carlos Ray 'Chuck' Norris (born March 10, 1940) is an American martial artist and actor. After serving in the United States Air Force, he began his rise to fame as a martial artist, and has since founded his own school, Chun Kuk Do." ;
      <http://www.bbc.co.uk/ontologies/coreconcepts/preferredLabel>
              "Chuck Norris" ;
      <http://www.bbc.co.uk/ontologies/coreconcepts/sameAs>
              <http://dbpedia.org/resource/Chuck_Norris> .

<http://www.bbc.co.uk/contexts/85390773-6985-49c9-aef1-ec3763f258ab#id>
      a       <http://www.bbc.co.uk/ontologies/provenance/ThingGraph> ;
      <http://www.bbc.co.uk/ontologies/provenance/provided>
              "2013-11-07T17:20:39+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
      <http://www.bbc.co.uk/ontologies/provenance/provider>
              <mailto:jeremy.tarling@bbc.co.uk> .

The next step was to read the response into an RDF graph that can be queried – “get me all the objects (values) of triples with the predicate <core:sameAs>” sort of thing.

reading the API response into an RDF graph

This is the part I got a bit stuck on. There are some great examples linked to from the RDF.rb page, but none of them seemed to do exactly what I wanted, namely to work with the in-memory String object that restclient had given me.

I ended up with a two-step process: first, read the string using RDF::Turtle::Reader:

rdf_doc = RDF::Turtle::Reader.new(data)

and then append the resulting RDF data to a new RDF::Graph object so it can be queried with RDF::Query:

graph = RDF::Graph.new << rdf_doc

The getThingGraph method now looks like this:

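# RDF.rb core plus the Turtle reader (rdf and rdf-turtle gems)
require 'rdf'
require 'rdf/turtle'

def getThingGraph(guid)
  url = "https://api.live.bbc.co.uk/ldp-writer/thing-graphs?guid=" + guid
  data = RestClient::Resource.new(url, SSL).get({:accept => "application/rdf+turtle"})
  # parse the Turtle response, then load its statements into a queryable graph
  rdf_doc = RDF::Turtle::Reader.new(data)
  graph = RDF::Graph.new << rdf_doc
end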

which results in an object that can now be queried.

querying the graph and processing the results

The RDF::Query class allows you to define a query pattern. In my example I’m going to define a simple query that looks for any triples that have the predicate rdf:type – a useful thing to get an idea of the sort of data you are dealing with:

@thingType = RDF::Query.execute(graph, {
  :thing => {RDF::URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type") => :type}
})

Executing the query gets you an RDF::Query::Solutions object which has some nice methods for examining graph datasets. Note that that’s ‘Solutions’, not ‘Solution’ – in other words it’s a collection so you can iterate over each solution that matched your query. In my case I’m presenting the results in a Sinatra app so they surface via an erb template:

<% @thingType.each do |thing| %>
  <%= thing[:type] %>
<% end %>

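And there you have it – for my example response above the query returns three types: cms:ManagedThing and news:Person for Chuck, plus provenance:ThingGraph for the second resource in the response, the graph’s provenance context. The query pattern matches every subject that has an rdf:type, not just Chuck.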
