I’ve recently been collaborating with some like-minded colleagues in the BBC and other media organisations on a model for story-telling in News. Building on the work that the BBC has been doing to utilise Linked Data driven content aggregations, we wanted to look at how we might model the relationship between events as told by journalists. As Michael Smethurst has pointed out, the who/what/where/when aspect of reporting events gets you so far but leaves out the more interesting elements of ‘why’ and ‘because’:
“The more interesting part (for me) is the dependencies and correlations that exist between events because why is always the most interesting question and because the most interesting answer. Getting the Daily Mail and The Guardian to agree that austerity is happening is relatively easy, getting them to agree on why, and on that basis what should happen next, much more difficult.”
I’m often reminded by colleagues at work that the BBC has a reputation for quality factual reporting and impartiality, as it were to suggest that the editorialisation of news is something that only goes on in newspapers. I don’t think the BBC is just a glorified wire service aggregator and publisher; while it’s true that a good deal of BBC content does come direct from the wires there’s an inevitable process of editorial selection: what to leave in or out, the order to present reported events in, and the links to make between events. Also a lot of content produced by BBC journalists doesn’t fit in to a neat event model: features, analysis, even republishing the the odd bit of celebrity gossip:
(As Jonathan Stray has pointed out this is not a new trend in journalsim but has been a developing theme over the past century.)
From a data architecture point of view I’m particularly interested in modelling news stories as data. For the past decade the BBC News website has been a flat, page-based website where a page is equal to an article. Not a story mind you, but an article about a story. You might find the odd article that’s a one-off but the great majority of articles on the BBC news website wil be multiple accounts of the same story, retold as new developments occurr. There are three problems with this approach:
- duplication of content – because each article stands alone it has to re-tell the events that make up the story so far
- duplication in search engines – search engines will index each article separately, so when someone searches for details about a story they may get the BBC’s latest account or they may not be so lucky – most likely they’ll see multiple articles about the same storyline
- link curation scaling – links between articles that are about the same story have to be manually created and curated and immediately decay from the moment an article is published
The BBC is in the process of migrating its News website from static page publishing to a dynamic publishing platform based on a typical three-tier architecture: presentation – service – data. This was done for the BBC Sport website last year, and it’s particularly exciting as the data tier consists of both a content store (for articles) and a triple store that holds semantic annotations about the articles in the content store. The opportunity for a BBC News website running on this platform is to move from a page-based model of multiple articles about the same story to a story-driven model where journalists publish updates about storylines to the same (persistent) URL for that story. This was one of the motivations for us to collaborate on the Storyline Ontology.
So what is an update in this story-driven approach? From a web perspective I see an update as a fragment of a story, a development if you like. Physically it’s an asset: some text, an image, an audio clip, a video clip, a social media status update, etc. Updates might be represented in a URL structure as bbc.co.uk/news/storylineID#updateID, which could be a useful pattern for a few reasons:
- users of the website could share individual updates via social media
- updates could be presented in context of the wider narrative – an item in a timeline for example
- search engines should ignore the fragment identifier (the hash and everything after it) thereby only indexing the story page and removing the duplication that I mentioned above.
But coming back to Michael’s point at the start of this post, it’s not just the updates about reported events that are interesting in a storyline, it’s the selection that drives the narrative thread and points to things like causality – the ‘why’ rather than the ‘what’.
There’s been a fair bit of buzz lately about how some news outlets are paring back news to it’s bare bones – a ‘just the facts’ approach, and that these ‘facts’ can be treated like objects and instanced into news accounts. Object-oriented news is not a new idea, and I can see the attraction in a short-form-social-media-status-update driven world. But I think there’s a risk in this approach that if we overemphasise these fact-objects out of the context of a narrative thread then they take on a life of their own.
Building facts into a storyline involves an editorial process that (should) ensure provenance, attribution and maybe one day even openness about the editorial process that the journalist went through. I was workshopping up in Birmingham last week with the England editorial crew and Eileen Murphy used this phrase that has stuck in my mind: ‘a window on the newsroom’. Anything that increases the transparency of our journalism can only be a good thing.