I’ve recently been collaborating with some like-minded colleagues in the BBC and other media organisations on a model for story-telling in News. Building on the work that the BBC has been doing to utilise Linked Data driven content aggregations, we wanted to look at how we might model the relationship between events as told by journalists. As Michael Smethurst has pointed out, the who/what/where/when aspect of reporting events gets you so far but leaves out the more interesting elements of ‘why’ and ‘because’:
“The more interesting part (for me) is the dependencies and correlations that exist between events because why is always the most interesting question and because the most interesting answer. Getting the Daily Mail and The Guardian to agree that austerity is happening is relatively easy, getting them to agree on why, and on that basis what should happen next, much more difficult.”
I’m often reminded by colleagues at work that the BBC has a reputation for quality factual reporting and impartiality, as if to suggest that the editorialisation of news is something that only goes on in newspapers. I don’t think the BBC is just a glorified wire service aggregator and publisher; while it’s true that a good deal of BBC content does come direct from the wires, there’s an inevitable process of editorial selection: what to leave in or out, the order to present reported events in, and the links to make between events. Also, a lot of content produced by BBC journalists doesn’t fit into a neat event model: features, analysis, even republishing the odd bit of celebrity gossip:
(As Jonathan Stray has pointed out, this is not a new trend in journalism but has been a developing theme over the past century.)
From a data architecture point of view I’m particularly interested in modelling news stories as data. For the past decade the BBC News website has been a flat, page-based website where a page is equal to an article. Not a story mind you, but an article about a story. You might find the odd article that’s a one-off, but the great majority of articles on the BBC News website will be multiple accounts of the same story, retold as new developments occur. There are three problems with this approach:
- duplication of content – because each article stands alone it has to re-tell the events that make up the story so far
- duplication in search engines – search engines will index each article separately, so when someone searches for details about a story they may get the BBC’s latest account or they may not be so lucky – most likely they’ll see multiple articles about the same storyline
- link curation scaling – links between articles that are about the same story have to be manually created and curated and immediately decay from the moment an article is published
The BBC is in the process of migrating its News website from static page publishing to a dynamic publishing platform based on a typical three-tier architecture: presentation – service – data. This was done for the BBC Sport website last year, and it’s particularly exciting as the data tier consists of both a content store (for articles) and a triple store that holds semantic annotations about the articles in the content store. The opportunity for a BBC News website running on this platform is to move from a page-based model of multiple articles about the same story to a story-driven model where journalists publish updates about storylines to the same (persistent) URL for that story. This was one of the motivations for us to collaborate on the Storyline Ontology.
So what is an update in this story-driven approach? From a web perspective I see an update as a fragment of a story, a development if you like. Physically it’s an asset: some text, an image, an audio clip, a video clip, a social media status update, etc. Updates might be represented in a URL structure as bbc.co.uk/news/storylineID#updateID, which could be a useful pattern for a few reasons:
- users of the website could share individual updates via social media
- updates could be presented in context of the wider narrative – an item in a timeline for example
- search engines should ignore the fragment identifier (the hash and everything after it) thereby only indexing the story page and removing the duplication that I mentioned above.
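As a rough illustration of the URL pattern above, here is a minimal Python sketch that splits an update link into the persistent story URL and the update fragment. The function name `split_update_url` is made up for this example, and `storylineID` and `updateID` are just the placeholders from the pattern, not real identifiers.

```python
from urllib.parse import urldefrag


def split_update_url(url):
    """Split an update link into (story URL, update ID or None)."""
    # urldefrag separates everything after the '#' from the rest of the URL.
    story_url, update_id = urldefrag(url)
    return story_url, update_id or None


# Placeholder identifiers taken from the URL pattern in the text.
shared_links = [
    "http://bbc.co.uk/news/storylineID#updateID",
    "http://bbc.co.uk/news/storylineID",
]

for link in shared_links:
    story_url, update_id = split_update_url(link)
    print(story_url, update_id)
```

Both links resolve to the same story URL, which is the property that would let many shared updates point back at a single indexed page.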
But coming back to Michael’s point at the start of this post, it’s not just the updates about reported events that are interesting in a storyline, it’s the selection that drives the narrative thread and points to things like causality – the ‘why’ rather than the ‘what’.
There’s been a fair bit of buzz lately about how some news outlets are paring back news to its bare bones – a ‘just the facts’ approach, and that these ‘facts’ can be treated like objects and instanced into news accounts. Object-oriented news is not a new idea, and I can see the attraction in a short-form-social-media-status-update driven world. But I think there’s a risk in this approach that if we overemphasise these fact-objects out of the context of a narrative thread then they take on a life of their own.
Building facts into a storyline involves an editorial process that (should) ensure provenance, attribution and maybe one day even openness about the editorial process that the journalist went through. I was workshopping up in Birmingham last week with the England editorial crew and Eileen Murphy used this phrase that has stuck in my mind: ‘a window on the newsroom’. Anything that increases the transparency of our journalism can only be a good thing.
I blogged previously about my early thinking on how we needed a core news model to describe basic real-world concepts (people, places, organisations) in the context of a news event, and how those events could be organised into stories. A couple of months on, we have got that core news model installed in the BBC’s linked data platform, and journalists are now able to annotate news content with these concepts.
The URI for the model is http://bbc.co.uk/ontologies/news/ and you can get HTML or RDF/turtle from that address. The updated ontology diagram looks like this:
You might notice that the stories class has been pulled out of here; there’s a lot of interest at the BBC and other news organisations in how linked data and story-telling can work together. We kicked off a project to collaborate with The Guardian and PA on an open model for this, which I will blog more about soon.
BBC News desktop pages are now carrying a minimal set of Open Graph Protocol metadata, for example:
<meta property="og:title" content="Facebook’s Graph targets Google"/>
<meta property="og:type" content="article"/>
<meta property="og:url" content="http://www.bbc.co.uk/news/technology-21040363"/>
<meta property="og:site_name" content="BBC News"/>
<meta property="og:image" content="http://news.bbcimg.co.uk/media/images/65316000/jpg/_65316583_65315074.jpg"/>
Facebook’s data model is defined here as RDF/TTL: http://ogp.me/ns/ogp.me.ttl, which is really just a bunch of properties; I’ve drawn out the diagram below to show what I think the idea is:
Facebook don’t publish the domain of their properties so I’ve made up a class called og:Thing but you get the idea. It looks like Google will render Rich Snippets from OGP mark-up, treating it as a single RDFa node. But there’s not very much open about this open graph.
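To see how little structure there is in this mark-up, here is a small Python sketch that reads Open Graph properties out of a page using only the standard library. The class and function names (`OpenGraphParser`, `extract_open_graph`) are invented for this example, and the sample HTML is a cut-down copy of the tags shown above; this is not how Facebook or Google actually consume the data.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags are also routed through this method.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


def extract_open_graph(html):
    """Return a flat dict of Open Graph properties found in the HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


sample = """
<meta property="og:type" content="article"/>
<meta property="og:url" content="http://www.bbc.co.uk/news/technology-21040363"/>
<meta property="og:site_name" content="BBC News"/>
"""

for name, value in extract_open_graph(sample).items():
    print(name, "=", value)
```

The result is a flat set of property/value pairs hanging off a single page, which is about all the model gives you.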
Over the past couple of weeks I’ve been putting together a basic data model for BBC News. The purpose of the model is to allow us to make typed associations between real-world concepts and creative works published by journalists. We are interested in four classes of real-world concepts:
- people
- organisations
- places
- intangibles (topics or themes)
Additionally we have events, which are really the intersection of people/organisations doing things at a particular place and time, as described in the much-used event ontology. We also have a sixth class called ‘story’ – a collection of events, derived from the stories ontology.
The typed associations that are allowed between the above concepts and the published works are currently:
and soon I hope we can add ‘took place in’ or something similar for location-based associations (most news events happen in a place but are not usually about that place).
Here’s a v0.2 representation of this model (I left 0.1 on a bit of paper in the pub):
The idea is that journalists will apply instances of these classes, together with their typed relationship, as part of the publishing process for BBC News online. We can then expose these instances as navigation routes (like the tags on the Guardian’s website) to allow users to browse more news about that person, organisation, place or event. At the same time publishing indexes (aggregation pages) for these instances will help improve BBC News’s Search Engine Optimisation and help drive traffic to the site.
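A rough Python sketch of that idea: annotations link a creative work to a concept through a typed relationship, and grouping them by concept gives the raw material for an aggregation page. Everything named here is an assumption for illustration: the class names, the example.org URLs, the example concepts, and especially the relation names `about` and `mentions`, which are placeholders rather than the model’s actual predicates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str
    kind: str  # e.g. "person", "organisation", "place", "event"


@dataclass(frozen=True)
class Annotation:
    work_url: str     # the creative work published by a journalist
    relation: str     # typed association (placeholder names, not the real ones)
    concept: Concept  # the real-world concept being linked to


def build_index(annotations):
    """Group (relation, work URL) pairs by concept for aggregation pages."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation.concept].append(
            (annotation.relation, annotation.work_url)
        )
    return dict(index)


# Hypothetical concepts and article URLs, for illustration only.
place = Concept("Example Town", "place")
person = Concept("Example Person", "person")
annotations = [
    Annotation("http://example.org/news/article-1", "about", person),
    Annotation("http://example.org/news/article-2", "mentions", person),
    Annotation("http://example.org/news/article-2", "about", place),
]

for concept, works in build_index(annotations).items():
    print(concept.kind, concept.label, works)
```

Keeping the relation on each annotation, rather than just tagging articles, is what would let an index page separate works about a person from works that merely mention them.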
RDF of this model is here.
I migrated the web server that runs this blog to my new 512MB Pi, so was looking for a use for my original 256MB one. Although I have a perfectly good radio in my bedroom I like the idea of streaming radio to the command line, and maybe using the Pi as an alarm clock with some cron scripts.
A bit of Googling suggested that mpg123 and its like will struggle with the sort of playlist files and stream encoding used by the BBC, so mplayer looked like the best candidate for getting a stream working with minimal fuss:
sudo apt-get install mplayer
mplayer -playlist "http://bbc.co.uk/radio/listen/live/r4.asx"
The above saw mplayer throwing some errors about unavailable pulse audio drivers. I added an argument to force use of the alsa drivers:
mplayer -ao alsa -playlist "http://bbc.co.uk/radio/listen/live/r4.asx"
Now mplayer seemed to fetch the stream OK and looked like it was playing, but I heard no sound. (I’m using a portable speaker connected to the Pi’s headphone socket.) I checked the volume level in alsamixer, and that was fine. Back to Google again, and the Raspberry Pi troubleshooting guide suggested that I could force audio to route via a particular output – in this case the one I wanted for the headphone socket was:
sudo amixer cset numid=3 1
Fired up mplayer again and yay, Radio 4! Next step is to add some cron scripts to start and stop the radio in the mornings.
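One possible shape for that alarm clock is a small Python wrapper that cron can call: it starts mplayer with the same arguments as above and stops it after a set number of minutes. This is an untested sketch rather than what I actually run; the function names and the idea of passing the duration as a command-line argument are my own assumptions.

```python
import subprocess
import sys

R4_PLAYLIST = "http://bbc.co.uk/radio/listen/live/r4.asx"


def mplayer_command(playlist_url, audio_driver="alsa"):
    """Build the mplayer invocation used earlier in this post."""
    return ["mplayer", "-ao", audio_driver, "-playlist", playlist_url]


def play_for(seconds, playlist_url=R4_PLAYLIST):
    """Play the stream, then stop mplayer once the time is up."""
    # cron provides no terminal, so give mplayer an empty stdin.
    player = subprocess.Popen(
        mplayer_command(playlist_url), stdin=subprocess.DEVNULL
    )
    try:
        player.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        player.terminate()
        player.wait()


# Only play when a duration in minutes is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    play_for(int(sys.argv[1]) * 60)
```

Saved somewhere like /home/pi/radio_alarm.py (a hypothetical path), a crontab entry such as `0 7 * * 1-5 python3 /home/pi/radio_alarm.py 30` would then play Radio 4 for half an hour at 7am on weekdays.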
For the past two days I’ve been on a BBC training course about how to do better presentations. Not the PowerPoint kind mind you, but the standing-up-in-front-of-a-bunch-of-strangers kind. We had a great coach called Sandie Miller who drew on her acting experience to train us in things like breath control and eye contact. Here’s a short clip of me rambling on about linked data after some pointers from her (yes, I know I should have tucked my shirt in):