When is Linked Data not Linked Data? – A summary of the debate

One of the activities identified during last December’s Semantic Technology Working Group meeting to be taken forward by CETIS was the production of a briefing paper that disambiguated some of the terminology for those that are less familiar with this domain. The following terms in particular were highlighted:

  • Semantic Web
  • semantic technologies
  • Linked Data
  • linked data
  • linkable data
  • Open Data

I’ve finally started drafting this briefing paper and unsurprisingly defining the above terms is proving to be a non-trivial task! Pinning down agreed definitions for Linked Data, linked data and linkable data is particularly problematic. And I’m not the only one having trouble. If you look up Semantic Web and Linked Data / linked data on wikipedia you will find entries flagged as having multiple issues. It does rather feel like we’re edging close to holy war territory here. But having said that I do enjoy a good holy war as long as I’m watching safely from the sidelines.

So what’s it all about? As far as I can make out much of the debate boils down to whether Linked Data must adhere to the four principles outlined in Tim Berners Lee’s Linked Data Design Issues, and in particular whether use of RDF and SPARQL is mandatory. Some argue that RDF is integral to Linked Data, other suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case “linked data”, or “linkable data”, for data that uses other technologies.

The fact that the Linked Data Design Issues paper is a personal note by Tim Berners Lee, and is not formally endorsed by W3C also contributes to the ambiguity. The note states:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs. so that they can discover more things.

I’ll refer to the steps above as rules, but they are expectations of behaviour. Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web. (Berners Lee, http://www.w3.org/DesignIssues/LinkedData.html)

In the course of trying to untangle some of the arguments both for and against the necessity of using RDF and SPARQL I’ve read a lot of very thoughtful blog posts which it may be useful to link to here for future reference. Clearly these are not the only, or indeed the most recent, posts that discuss this most topical of topics, these happen to be the ones I have read and which I believe present a balanced over view of the debate in such a way as to be of relevance to the JISC CETIS community.

Linked data vs. Web of data vs. …
– Andy Powell, Eduserv, July 2009

The first useful post I read on this particular aspect of the debate is Andy Powell’s from July 2009. This post resulted from the following question Andy raised on twitter;

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, “using the standards (RDF, SPARQL)” ??

Andy was of the opinion that Linked Data “implies use of the RDF model – full stop” adding:

“it’s too late to re-appropriate the “Linked Data” label to mean anything other than “use http URIs and the RDF model”.”

However he is unable to provide a satisfactory answer to his own question, i.e. what do you call linked data that does not use the RDF model, and despite exploring alternative models he concludes by professing himself to be worried about this.

Andy returned to this theme in a more recent post in January 2010, Readability and linkability which ponders the relative emphasis given to readability and linkability by initiatives such as the JISC Information Environment. Andy’s general principles have not changed but he presents term machine readable data (MRD) as a potential answer to the question he originally asked in his earlier post.

Does Linked Data need RDF?
– Paul Miller, The Cloud of Data, July 2009

Paul Miller’s post is partially a response to Andy’s query. Paul begins by noting that while RDF is key to the Semantic Web and

“an obvious means of publishing — and consuming — Linked Data powerfully, flexibly, and interoperably.”

he is uneasy about conflating RDF with Linked Data and with assertions that

“‘Linked Data’ can only be Linked Data if expressed in RDF.”

Paul discusses the wording an status of Tim Berners Lee’s Linked Data Design Issues and suggest that it can be read either way. He then goes on to argue that by elevating RDF from the best mechanism for achieving Linked Data to the only permissible approach we risk barring a large group

“with data to share, a willingness to learn, and an enthusiasm to engage.”

Paul concludes by asking the question:

“What are we after? More Linked Data, or more RDF? I sincerely hope it’s the former.”

No data here – just Linked Concepts and Linked, open, semantic?
– Paul Walk, UKOLN, July & November 2009

Paul Walk has published two useful posts on this topic; the first summarising and commenting on the debate sparked by the two posts above, and the second following the Giant Global Graph session at the CETIS 2009 Conference. This latter post presents a very useful attempt at disambiguating the terms Open data , Linked Data and Semantic Web. Paul also tries to untangle the relationship between these three memes and helpfully notes:

  • data can be open, while not being linked
  • data can be linked, while not being open
  • data which is both open and linked is increasingly viable
  • the Semantic Web can only function with data which is both open and linked

So What Is It About Linked Data that Makes it Linked Data™?
- Tony Hirst, Open University, March 2010

Much more recently Tony Hirst published this post which begins with a version of the four Linked Data principles cut from wikipedia. This particular version makes no mention of either RDF or SPARQL. Tony goes on to present a very neat example of data linked using HTTP URI and Yahoo Pipes and asks

“So, the starter for ten: do we have an example of Linked Data™ here?”

Tony broadly believes the answer is yes and is of a similar opinion to Paul Miller that too rigid adherence to RDF and SPARQL

“will put a lot of folk who are really excited about the idea of trying to build services across distributed (linkable) datasets off…”

Perhaps more controversially Tony questions the necessity of universal unique URIs that resolve to content suggesting that:

“local identifiers can fulfil the same role if you can guarantee the context as in a Yahoo Pipe or a spreadsheet”

Tony signs off with:

“My name’s Tony Hirst, I like linking things together, but RDF and SPARQL just don’t cut it for me…”

Meshing up a JISC e-learning project timeline, or: It’s Linked Data on the Web, stupid
– Wilbert Kraan, JISC CETIS, March 2009

Back here at CETIS Wilbert Kraan has been experimenting with linked data meshups of JISC project data held in our PROD system. In contrast to the approach taken by Tony, Wilbert goes down the RDF and SPARQL route. Wilbert confesses that he originally believed that:

“SPARQL endpoints were these magic oracles that we could ask anything about anything.”

However his attempts to mesh up real data sets on the web highlighted the fact that SPARQL has no federated search facility.

“And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.”

Wilbert concludes that:

“The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf).”

And in response to a direct query regarding the necessity of RDF and SPARQL to Linked Data Wilbert answered

“SPARQL and RDF are a sine qua non of Linked Data, IMHO. You can keep the label, widen the definition out, and include other things, but then I’d have to find another label for what I’m interested in here.”

Which kind of brings us right back to the question that Andy Powell asked in July 2009!

So there you have it. A fascinating but currently inconclusive debate I believe. Apologies for the length of this post. Hopefully one day this will go on to accompany our “Semantic Web and Linked Data” briefing paper.