Archive for the ‘semantic technologies’ Category

Slides from Flexible Service Delivery Workshop Jan 2010

Thursday, January 28th, 2010

Here are the slides from my presentation at the Flexible Service Delivery Strategic Technologies Group meeting of Jan 25th 2010 entitled: Beating Information Mess (without SOA).

This is a high-level description of some examples of the use of resource-oriented and semantic web approaches, drawn from existing published material.

Linked Data: Where is the Low-hanging Fruit?

Monday, December 14th, 2009

Here are my thoughts on some generic considerations, some mentioned in the recent SemTech meeting and some I jotted down following the CETIS Conference, on where low-hanging fruit may be found. NB these are “generic” and not specific; the idea is that they might be useful in judging the likelihood of success of some specific good/cool/potential ideas. I am referring here to exposure of Linked Data on the public web. In no particular order:

  • Ariadne’s Thread. Does the current state of (poor) information management present a problem and is there resolve to find your way out of the maze? If it has become necessary to sit down and get your domain model straight and re-organise/re-engineer (some of) your information systems then you have done most of the hard work necessary for exposing Linked Data (i.e. Open to some degree) and you could usefully adopt Linked Data principles for private use.
  • Ox Pecker. Is there a mutual benefit between you and another data provider? Can this be amplified by explicit technical, financial or effort (etc) support one or both ways? This builds on the essential attribute of linking.
  • Sloping Shoulders. Can you avoid creating an ontology? No-one else will care about it if you do.
  • Aspirin. Does anyone have a headache that can be made better? Is there an institutional/business problem that can be solved? (this is not the same as Ariadne’s Thread)
  • Blue Peter. Is the creation or acquisition and processing and dissemination of information already something you do? Is the quality and availability of the information something you invest effort in? This is a ready-made candidate for Linked Data.
  • Cow Path. Is information you already make available (as web pages or PDF etc) used by others in ways you know about and understand?
  • UFO. Do people want to refer to something you have or do but don’t have an unambiguous way of identifying what they are referring to? Could you provide a URI for the thing and information about it?
  • 2+2=5. Is there clear value to be gained from linking the information that is to be exposed? Can people do something new, do they want to and will they continue to want to?
  • Chatham House. Avoid exposing data that identifies, or could identify, a person.

Universities and Colleges in the Giant Global Graph

Friday, November 13th, 2009

Earlier this week I facilitated a session at the 2009 CETIS Conference to explore some of the opportunities that Linked Data might offer to universities and colleges. The aim was also to help us (CETIS in particular but also JISC and our peer “innovation support centre”, UKOLN) work out what should be done to move us closer to realising benefits in the management and delivery of Higher and Further Education from the adoption of Linked Data principles, either by universities and colleges or by other public or private bodies that form part of the overall H/FE system. An approximately factual account of the session, with relevant links etc, is available from the Universities and Colleges in the Giant Global Graph session page. This post contains more personal rambling opinion.

Paul Walk seems to be the first of the session participants to blog. He captures one of the axes of discussion well and provides an attractive set of distinctions. It seems inevitable that the semantic vultures will circle about terminological discussions and it is probably best for us to spell out what we mean rather than use sweeping terms like “Linked Data” as I do in the opening paragraph. My inclination is not to be quite as hard-line as Paul about the necessity for RDF to be used to call it “Linked Data”. Vultures circle. In future I’ll draw clearer distinctions between the affordances of open data vs linked data vs semantic web, although I tried to put whitespace between linked data and semantic web in my session intro (PPT). Maybe it would be clearer for some audiences to consider the affordances of particular acts such as selecting a recognised data licence of various types, assigning persistent URIs to things, … but this is not useful for all discourse. Talking of “recognised data licence[s]” may also allow us to sidestep the “open” meme conflation: open access, open source, open process, open-for-reuse…

Actually, I’m rather inclined to insert a further distinction and use the (fantasy) term “Minty Data” for linked-data-without-requiring-RDF (see another of Paul Walk’s posts on this). Why? Well: it seems that just posting CSV, while that might be better than nothing from an open data point of view, doesn’t promise the kind of network effects that 4-rules linked data (i.e. Berners-Lee rules) offers. On the other hand, it does seem to me that we might be able to get quite a long way without being hard core, and that we are a lot less likely to frighten people away. I’m also aware that there is likely to be a paradigm shift for many in thinking and working with web architecture, in spite of the ubiquity of the web.

Minty Data rules, a kind of mongrel of ROA (resource-oriented architecture) and 4-Rules Linked Data:

  1. Assign URIs to things people are likely to want to refer to
    • having first thought through what the domain model behind them is (draw a graph)
    • make the URIs hackable, predictable, structured
    • consider Logical Types (ref Bertrand Russell)
    • don’t change them until Hell freezes over
  2. Use HTTP URIs for the usual reasons
  3. Return something machine-readable e.g. JSON, Atom
    • and something human-readable (but this ISN’T Minty Data)

For Extra-strong mints:

  1. Link to other things using their URIs, especially if they were minted by you
  2. When returning information about a thing, indicate what class(es) of things it belongs to
  3. If the “thing” is also described in one or more encyclopedias or other compendia of knowledge, express that link in a well-known way.
    • and if it isn’t described but should be, get it added if you can
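
As a rough illustration of the rules above, here is a minimal Python sketch of what a Minty Data response might look like. Everything specific in it is an assumption made for the example: the base URI data.example.ac.uk, the course and department identifiers, the JSON key names (id, type, department, seeAlso) and the helper functions mint_uri and describe_course are all invented, and the DBpedia URI simply stands in for “an encyclopedia”.

```python
import json

# Hypothetical base URI for an institution's data; not a real service.
BASE = "http://data.example.ac.uk"


def mint_uri(kind, identifier):
    """Build a predictable, hackable HTTP URI of the form /<kind>/<identifier>."""
    return f"{BASE}/{kind}/{identifier}"


def describe_course(code, title, department_code, see_also=None):
    """Return a machine-readable description of a course as a plain dict."""
    doc = {
        # Rules 1 and 2: the thing has its own stable HTTP URI.
        "id": mint_uri("courses", code),
        # Extra-strong 2: say what class(es) of thing this is.
        "type": ["Course"],
        "title": title,
        # Extra-strong 1: link to other things by URI rather than by bare code.
        "department": {"id": mint_uri("departments", department_code)},
    }
    if see_also:
        # Extra-strong 3: point at an external description of the same topic.
        doc["seeAlso"] = see_also
    return doc


doc = describe_course(
    "cs101",
    "Introduction to Computing",
    "computing",
    see_also="http://dbpedia.org/resource/Computer_science",
)

# Rule 3: return something machine-readable, here JSON.
print(json.dumps(doc, indent=2))
```

The point of the sketch is that the department is referred to by a URI the consumer can follow, not by an opaque local code; none of this requires RDF.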

There was a bit of discussion in the conference session about the perceived investment necessary to make Linked Data available. I rather felt that this shouldn’t necessarily be the case given software such as D2R and Triplify. At least, the additional effort required to make Minty Data available, having first thought through the domain model (information architecture), shouldn’t be much. This is, of course, not a universally-trivial pre-requisite but it is an objective with quite a lot of literature to justify the benefits to be accrued from getting to grips with it. It would be a mistake to suggest boiling the ocean; the conclusion I draw is that a readiness criterion for anyone considering exposing Linked/Minty Data is that the domain model related to that data has already been thought through, or that doing so is judged to be feasible or desirable for other reasons.

The BBC approach, described in many places but quoted from Tom Scott and Michael Smethurst in Talis Nodalities here, seems to reflect the above:

“I’d like to claim that when we set out to develop [bbc.co.uk]/programmes we had the warm embrace of the semantic web in mind. But that would be a lie. We were however building on very similar philosophical foundations.

In the work leading up to bbc.co.uk/programmes we were all too aware of the importance of persistent web identifiers, permanent URIs and the importance of links as a way to build meaning. To achieve all this we broke with BBC tradition by designing from the domain model up rather than the interface down. The domain model provided us with a set of objects (brands, series, episodes, versions, ondemands, broadcasts etc) and their sometimes tangled interrelationships.”

On the other hand, I do perceive a threat arising from the ready availability of software to add a sprinkle of RDF or SPARQL endpoint to an existing web application or scrape HTML to RDF, especially if RDF is the focus of the meme. A sprinkle of RDF misses the point if it isn’t also based on a well-principled approach to URIs and their assignment and the value of links; a URI isn’t just the access point for a cranky API returning structured data. The biggest threat to the Linked Data meme may be a deluge of poor quality RDF rather than an absence of it.

Objects in this Mirror are Closer than they Appear: Linked Data and the Web of Concepts

Thursday, June 25th, 2009

There is a whole collection of web technology that has been largely ignored or misunderstood. Sometimes we technical folk just made it over-complicated in great fits of excitement for the potential of a new technology. This has probably been the case with a collection of technologies, both specifications and architectural practices, that can be grouped under the heading “semantic web”. But things are changing.

The change is heralded by the meme of Linked Data, which originated with Tim Berners-Lee in 2006. There are two really significant things about this meme: it is intelligible; it has translated into real change. The really-really significant thing is that, although it is intelligible, it remains a solid foundation for some of the more pointy-headed technology; its adoption represents an important platform for change. It will affect how people think about and realise interoperability of data.

The TED presentation by Tim Berners-Lee, “The Next Web”, is a good motivational introduction to why this is a significant movement and includes a really succinct boiling-down of the technical ideas: assign URIs to concepts; relationships are links. There is nothing technically new here. That is the point! It is intelligible.
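
To make “assign URIs to concepts; relationships are links” concrete, here is a tiny Python sketch that holds a few links as (subject, relationship, object) tuples and follows one of them. The example.org namespace, the programme identifiers, the relationship names and the follow helper are all invented for illustration; this is not any real vocabulary or data set.

```python
# Hypothetical namespace; every URI below is made up for the example.
EX = "http://example.org/"

# Each link is (subject URI, relationship URI, object URI or literal value).
triples = [
    (EX + "programmes/e0001", EX + "terms/episodeOf", EX + "programmes/s0001"),
    (EX + "programmes/s0001", EX + "terms/seriesOf", EX + "programmes/b0001"),
    (EX + "programmes/b0001", EX + "terms/title", "An Example Brand"),
]


def follow(subject, relationship):
    """Return every object linked from `subject` by `relationship`."""
    return [o for s, p, o in triples if s == subject and p == relationship]


# Walk from an episode to its series, then from the series to its brand.
series = follow(EX + "programmes/e0001", EX + "terms/episodeOf")
brand = follow(series[0], EX + "terms/seriesOf")
print(series, brand)
```

Because the relationships are themselves named by URIs, anyone else can reuse them or link into the same graph without prior agreement on a file format.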

If Linked Data remained only an intelligible idea, it would not be so interesting. An idea that is acted upon is both more potent and, depending on the enacting agent, an indicator of changing practice. Tom Scott of BBC Earth provided an interview to PWC Technology Forecast recently, “Traversing the Giant Global Graph”, in which “Scott describes how the BBC is using Semantic Web technology and philosophy to improve the relevance of and access to content on the BBC Programmes and Music Web sites in a scalable way.” Adoption by such a high profile organisation gives those who, like CETIS, have been advocating a semantic-web-inspired approach to interoperability a real boost.

In a completely different corner of human endeavour, the Royal Society of Chemistry has been doing things on the same flight path. RSC Prospect enriches journal articles through chemical and biological ontology terms and the recently-acquired ChemSpider provides “access to almost 21.5 million unique chemical entities sourced from over 200 different data sources and integration to a multitude of other online services” organised according to chemical structure. These are not there yet, as Linked Data, but the direction of travel seems clear.

When a major media player and the publishing arm of a professional society are making progress on what was esoterica only a few years ago, I think I’m safe in predicting change is afoot; sense and significance will be apparent to a wider set of people and I’m optimistic that members of the education sector will number highly in that set.

Linked Data and the web of concepts are closer than they may appear.