Semantic web applications in higher education

Last week I was in Southampton for the second workshop on Semantic web applications in higher education (SemHE), organised by Thanasis Tiropanis and friends from the Learning Societies Lab at the University of Southampton. These same people had worked with CETIS on the Semantic Technologies in Learning and Teaching (SemTech) project. The themes of the meeting seemed to be an emphasis on using this technology to solve real problems (i.e. the applications of the workshop title) and, to quote Thanasis in his introduction, a consequent “move away from complex idiosyncratic ontologies not much used outside of the original developers” and towards a simpler “linked data field”.

First: the scope of the workshop, at least in terms of what was meant by “higher education” in the title. The interests of those who attended fell under at least two (not mutually exclusive) headings. One was HE as an enterprise, and the application of semantic web technologies to the running of the university, i.e. e-administration, resource and facility management, and the like. The other was the role of semantic technologies in teaching and learning, one aspect of which was summed up nicely by Su White as identifying the native semantic technologies that would give students an authentic learning experience and prepare them for a world where massive amounts of data are openly available, e.g. preparing geography students to work with real data sets.

The emphasis on solving real problems was nicely encapsulated by a presentation from Farhana Sarker, in which she identified ~20 broad challenges facing UK HE, such as management, funding, widening participation, retention, contribution to the economy, assessment, plagiarism, group formation in learning & teaching, construction of personal and group knowledge…. She then presented what you might call a factorisation of the data that could help address these challenges into about 9 thematic repositories (using that word in a broad sense) containing: course information, teaching material, student records, research output, research activities, staff expertise, infrastructure data, accreditation records (course and institutional accreditation), and staff development programme details (I may have missed a couple). Of course each repository addresses more than one of the challenges, and to do so much of the data held in them needs to be shared outside of the institution.

A nice, concrete example of using shared data to address a problem in resource management and discovery was provided by Dave Lambert, showing how external linked data sources such as Dewey.info, the Library of Congress, GeoNames, sindice.com and zemanta.com, and a vocabulary drawn from the FOAF, DC in RDFS, SKOS, WGS84 and Timeline ontologies, have been used by the OU to catalogue videos in the annomation tool and provide a discovery service through the SugarTube project.
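Just to make that mix of vocabularies concrete, here is a rough sketch in Python (using rdflib) of what a single video annotation drawing on DC, FOAF, SKOS-style subject URIs and WGS84 might look like. The URIs, property choices, labels and coordinates are all made up for illustration; this is not the actual annomation or SugarTube data model.

```python
# A minimal sketch of mixing vocabularies in one annotation record.
# All identifiers below are illustrative, not real catalogue data.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, FOAF, SKOS

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")  # WGS84 vocabulary

g = Graph()
g.bind("dc", DC)
g.bind("foaf", FOAF)
g.bind("skos", SKOS)
g.bind("geo", GEO)

video = URIRef("http://example.org/videos/field-trip-42")   # hypothetical video URI
topic = URIRef("http://dewey.info/class/551/")              # illustrative Dewey-style URI
maker = URIRef("http://example.org/people/jsmith")          # hypothetical person URI

g.add((video, DC.title, Literal("Coastal erosion field trip")))
g.add((video, DC.subject, topic))            # subject drawn from an external scheme
g.add((video, FOAF.maker, maker))
g.add((video, GEO.lat, Literal("50.9097")))  # rough coordinates for Southampton
g.add((video, GEO.long, Literal("-1.4044")))
g.add((topic, SKOS.prefLabel, Literal("Geology, hydrology, meteorology")))  # illustrative label

print(g.serialize(format="turtle"))
```

The point of the sketch is simply that one record can lean on several lightweight, widely used vocabularies rather than a single heavyweight ontology.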

One comment Dave made was that many relevant ontologies were too heavyweight for his purpose, and this focus on what is needed to solve a problem linked with another theme that ran through the meeting: pragmatism and as much simplicity as possible. Chris Gutteridge made a very interesting observation, that the uptake of semantic technologies, like the uptake of the web in the late 1990s, would involve a change in the people working on it, from those doing so because they were interested in the semantic web to those doing so because their boss had told them to. This has some interesting consequences. For example, there are clear gains to be made (says Chris) from the application of semantic technologies to e-admin; however, the IT staff supporting administration are often not well versed in semantic ideas. Therefore, to realise these gains, those pioneering the use of the semantic web and linked data should supply patterns that are easy to follow; consuming data from a million different ontologies won’t scale.

Towards the end of the day the discussion on pragmatism rather than idealism settled on the proposal, I forget who made it, that ontologies were a barrier to mass adoption of the semantic web, and that it would be better to create a “big bag of predicates” with domain thing, range thing, the suggestion being that more specific domains and ranges tended to be ignored anyway. (As an aside, I don’t know whether the domain and range would be owl:Thing, or whether it would matter if rdfs:Resource were used instead. If you can explain how the distinction between those two helps interoperability then I would be interested; throw skos:Concept into the mix and I’ll buy you a pint.)
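For what it’s worth, the difference in question is small enough to show in a few triples. A minimal sketch (Python with rdflib; the property name is invented) of what one entry in such a bag of predicates might look like, with the loosest possible domain and range:

```python
# Sketch: declaring a deliberately unconstrained predicate.
# "relatesTo" is a made-up example property, not from any real ontology.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/bag-of-predicates#")

g = Graph()
g.bind("ex", EX)

g.add((EX.relatesTo, RDF.type, RDF.Property))
# The loosest possible constraints: anything can relate to anything.
g.add((EX.relatesTo, RDFS.domain, OWL.Thing))   # or RDFS.Resource
g.add((EX.relatesTo, RDFS.range, OWL.Thing))    # or RDFS.Resource

print(g.serialize(format="turtle"))
```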

Returning to the SemTech project, the course of the meeting did a lot to reiterate the final report of that project, and in particular the roadmap it produced, which was a sequence of: 1) release data openly as linked data, with an emphasis on lightweight knowledge models; 2) creation and deployment of applications built on this data; 3) emergence of ontologies and pedagogy-aware semantic applications. While the linked data cloud shows the progress of step 1, I would suggest that it is worth keeping an eye on whether step 2 is happening (the SemTech project provided a baseline survey for comparison, so what I am suggesting is a follow-up of that at some point).

Finally: thanks to Thanasis for organising the workshop. I know he had a difficult time of it, and I hope that doesn’t put him off organising a third (once you call something the second… you’ve created a series!).

Sharing service information?

Over the past few weeks the question of how to find service end points keeps coming up in conversation (I know, that says a lot about the sort of conversations I have); for example, we have been asked whether we can provide information about where the RSS feeds are for the services/collections created by all the UKOER projects. I would generalise this to service end points, by which I mean things like the base URL for OAI-PMH, RSS/Atom feed locations, or SRU target locations; more generally, the location of the web API or protocol implementations that provide machine-to-machine interoperability. It seems that these are often harder to find than they should be, and I would like to recommend one approach, and suggest another, to help make them easier to find.
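As an illustration of the sort of machine-to-machine use that publishing an end point enables, here is a rough sketch in Python of asking an OAI-PMH base URL to identify itself. The base URL below is hypothetical; a real repository end point would be substituted in.

```python
# Sketch: poke an OAI-PMH base URL with the Identify verb.
# The base URL is hypothetical; substitute a real repository end point.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
base_url = "http://repository.example.ac.uk/oai"   # hypothetical end point

with urlopen(base_url + "?" + urlencode({"verb": "Identify"})) as response:
    tree = ET.parse(response)

identify = tree.find(OAI + "Identify")
if identify is not None:
    print("Repository name:", identify.findtext(OAI + "repositoryName"))
    print("Protocol version:", identify.findtext(OAI + "protocolVersion"))
else:
    print("No OAI-PMH Identify response at", base_url)
```

None of this is difficult; the hard part in practice is finding the base URL in the first place, which is exactly the problem below.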

The approach I would like to recommend to those who provide service end points, i.e. those of you who have a web-based service (e.g. a repository or OER collection) that supports machine-to-machine interoperability (e.g. for metadata dissemination, remote search, or remote upload), is the one taken by web 2.0 hosts. Most of these have reasonably easy-to-find sections of their website devoted to documenting their API and providing “how-to” information for what can be done with it, with examples you can follow, and the best of them with simple step-by-step instructions. Here’s a quick list by way of providing examples.

I’ll mention Xpert Labs as well because, while the “labs” or “backstage” approach in general isn’t quite what I mean by simple “how-to” information, it looks like Xpert are heading that way and “labs” sums up the experimental nature of what they provide.

That helps people wanting to interoperate with those services and sites they know about, but it raises a more fundamental question: how do you find those services in the first place? For example, how do you find all those collections of OERs? Well, some interested third party could build a registry for you, but that is extra effort for someone who is neither providing nor using the data/service/API. Furthermore, once the information is in the registry it’s dead, or at least at risk of death. What I mean is that there is little contact between the service provider and the service registry: the provider doesn’t really rely on the registry for people to use their services, and the registry doesn’t actually use the information that it stores. Thus it’s easy for the provider to forget to tell the registry when the information changes, and if it does change there is little chance of the registry maintainer noticing.

So my suggestion is that those who are building aggregation services based on interoperating with various other sites should provide access to information about the end points they use. An example of this working is the JournalToCs service, which is an RSS aggregator for research journal tables of contents but which has an API that allows you to find information about the journals it knows about (JOPML showed the way here, taking information from a JISC project that spawned JournalToCs and passing on lists of RSS feeds as OPML). Hopefully, with this approach of end-point users providing information about what they use, only information that actually works and is useful (at least for them) would get passed on.
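To show how little machinery a consumer of such information needs, here is a rough sketch in Python of pulling feed locations out of an OPML list of the kind JOPML passes on. The OPML URL is hypothetical; any published list of feeds could be used instead.

```python
# Sketch: extract RSS/Atom feed URLs from an OPML list of feeds.
# The OPML URL is hypothetical; point it at a real list of feeds.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

opml_url = "http://example.org/journal-feeds.opml"   # hypothetical

with urlopen(opml_url) as response:
    tree = ET.parse(response)

# In OPML, each feed is an <outline> element with an xmlUrl attribute.
for outline in tree.iter("outline"):
    feed_url = outline.get("xmlUrl")
    if feed_url:
        print(outline.get("title", "(untitled)"), "->", feed_url)
```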