Archive for the ‘repositories’ Category

Repositories and the Open Web

Wednesday, February 24th, 2010

On 19 April, CETIS are holding a meeting in London on Repositories and the Open Web. The theme of the meeting is how repositories and social sharing / web 2.0 web sites compare as hosts for learning materials: how well does each facilitate the tasks of resource discovery and resource management; what approach to resource description does each take; and are there any lessons that users of one approach can draw from the other?

Both the title of the event (does the ‘and’ imply a distinction? why not repositories on the open web?) and the tag CETISROW may be taken as slightly provocative. Well, the tag is meant lightheartedly, of course, and yes there is a rich vein of work on how repositories can work as part of the web. Just looking back at previous CETIS events, I would like to highlight these contributions:

  • Lara Whitelaw presented on the PROWE Project, about using wikis and blogs as shared repositories to support part-time distance tutors in June 2006.
  • David Davies spoke about RSS, Yahoo! Pipes and mashups in June 2007.
  • Roger Greenhalgh talked about the National Rural Knowledge Exchange in the May 2008 meeting. And many of us remember his “what’s hot in pigs” intervention in an earlier meeting.
  • Richard Davis talked about SNEEP (social network extensions for ePrints) at the same meeting.

Most recently we’ve seen a natural intersection between the aims of Open Educational Resources initiatives and the use of hosting on web 2.0 and social sharing sites. For example, the technical requirements suggested for the UKOER programme said this under delivery platforms:

Projects are free to use any system or application as long as it is capable of delivering content freely on the open web. However all projects must also deposit their content in JorumOpen. In addition projects should use platforms that are capable of generating RSS/Atom feeds, particularly for collections of resources e.g. YouTube channels. Although this programme is not about technical development projects are encouraged to make the most of the functionality provided by their chosen delivery platforms.

We have followed this up with some work looking at the use of distribution platforms for UKOER resources which treats web 2 platforms and repository software as equally useful for that task.

So, there’s a longstanding recognition that repositories live on the open web, and that formal repositories aren’t the only platform suitable for the management and dissemination of learning materials. But I would be missing something I think important if I left it at that. For some time I’ve had misgivings about the direction in which conceptualising your resource management and dissemination as a repository leads. A while back a colleague noticed that a description of some proposed specification work, which originated from repository vendors, developers and project managers, talked about content being “hidden inside repositories”, which we thought revealing. Similarly, I’ve written before that repository-think leads to talk of interoperability between repositories and repository-related services (I’m sure I’ve written that before). Pretty soon one ends up with a focus on repositories and repository-specific standards per se and not on the original problem of resource management and dissemination. A better solution, if you want to disseminate your resources widely, is not to “hide them in repositories” in the first place. Also, in repository-world the focus is on metadata rather than resource description: the encoding of descriptive data into fields can be great for machines, but I don’t think that we’ve done a great job of getting that encoding right for the educational characteristics of resources, and I think that this has been at the expense of providing suitable information for people.

Of course not every educational resource is open, and so the open web isn’t an appropriate place for all collections. Also, once you start using some of the web 2.0 social sharing sites for resource management you begin to hit some problems (no option for Creative Commons licensing, assumptions that the uploader created/owns the resource, limitations on export formats, etc.), though there are some exceptions. It is, however, my belief that all repository software could benefit from the examples shown by the best of the social sharing websites, and my hope that we will see that in action during this meeting.

Detail about the meeting (agenda, location, etc.) will be posted on the CETIS wiki.

Registration is open, through the CETIS events system.

Repository standards

Tuesday, November 24th, 2009

Tore Hoel tweeted:

The most successful repository initiatives do not engage with LT standards EDRENE report concludes #icoper

pointing me to what looks like a very interesting report which also concludes

Important needs expressed by content users include:

  • Minimize number of repositories necessary to access

Of these, the first bullet point clearly relates to interoperability of repositories, and indicates the importance of focusing on repository federations, including metadata harvesting and providing central indexes for searching for educational content.

Coincidentally I had just finished an email replying to someone who asked about repository aggregation in the context of Open Educational Resources because she is “Trying to get colleagues here to engage with the culture of sharing learning content. Some of them are aware that there are open educational learning resources out there but they don’t want to visit and search each repository.” My reply covered Google advanced search (with the option to limit by licence type), Google custom search engines for OERs, OER Commons, OpenCourseWare Consortium search, the Creative Commons Search, the Steeple podcast aggregator and the similar-in-concept Ensemble Feed finder.

I concluded: you’ll probably notice that everything I’ve written above relies on resources being on the open web (as full text and summarized in RSS feeds) but not necessarily in repositories. If there are any OER discovery services built on repository standards like OAI-PMH or SRU or the like then they are pretty modest in their success. Of course using a repository is a fine way of putting resources onto the web, but you might want to think about things like search engine optimization, making sure Google has access to the full text resource, making sure you have a site map, encouraging (lots of) links from other domains to resources (rather than metadata records), making sure you have a rich choice of RSS feeds and so on.
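To make the site map suggestion a little more concrete, here is a minimal sketch of generating one with Python's standard library. The resource URLs and the function name are made up for illustration; the only fixed part is the sitemaps.org namespace. It lists the resources themselves (not metadata records), which is the point being made above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(resource_urls):
    """Return a sitemap XML string with one <url><loc> entry per resource."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in resource_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical resource URLs, for illustration only.
sitemap_xml = build_sitemap([
    "https://example.org/oer/intro-to-statistics.html",
    "https://example.org/oer/intro-to-statistics.pdf",
])
print(sitemap_xml)
```

A real site would probably also want a declaration line and last-modified dates, but even a bare list like this tells a crawler where the full-text resources are.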

I have some scribbled notes on 4 or 5 things that people think are good about repositories but which may also be harmful; a focus on interoperability between repositories and repository-related services (when it is at the expense of being part of the open web) is on there.

Feeding a repository

Wednesday, October 28th, 2009

There has been some discussion recently about mechanisms for remote or bulk deposit in repositories and similar services. David Flanders ran a very thought-provoking and lively show and tell meeting a couple of weeks ago looking at deposit. In part this is familiar territory: looking at and tweaking the work that the creators of the SWORD profile have done based on APP, or looking again at WebDAV. But there is also a newly emerging approach of using RSS or Atom feeds to populate repositories, a sort of feed-deposit. Coincidentally we also received a query at CETIS from a repository which is looking to collect outputs of the UKOER programme, asking for help in firming up the requirements for bulk or remote deposit, and asking how RSS might fit into this.

So what is this feed-deposit idea? The first thing to be aware of is that, as far as I can make out, a lot of the people who talk about this don’t necessarily have the same idea of “repository” and “deposit” as I do. For example the Nottingham Xpert rapid innovation project and the Ensemble feed aggregator are both populated by feeds (you can also disseminate material through iTunesU this way). But (I think) these are all links-only collections, so I would call them catalogues not repositories, and I would say that they work by metadata harvest(*) not deposit. However, they do show that you can do something with feeds which the people who think that RSS or Atom is about stuff like showing the last ten items published should take note of. The other thing to take note of is podcasting, by which I don’t mean sticking audio files on a web server and letting people find them; I mean feeds that either carry or point to audio/video content so that applications and devices like phones and wireless-network enabled media players can automatically load that content. If you combine what Xpert and Ensemble are doing by way of getting information about entire collections with the way that podcasts let you automatically download content then you could populate a repository through feeds.
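As a rough sketch of that combination, the following reads an RSS 2.0 feed and records, for each item, both the catalogue-style link and (where there is a podcast-style enclosure) the URL of the content itself. The sample feed, the URLs and the `harvest` function are invented for illustration; a real harvester would fetch the feed over HTTP and then download each `content_url`.

```python
import xml.etree.ElementTree as ET

# A made-up feed: one item carries an enclosure, the other is links-only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example collection</title>
    <item>
      <title>Lecture 1</title>
      <link>https://example.org/lectures/1</link>
      <enclosure url="https://example.org/media/lecture1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Lecture 2 (links only)</title>
      <link>https://example.org/lectures/2</link>
    </item>
  </channel>
</rss>"""

def harvest(feed_xml):
    """Return one record per feed item, noting any enclosure URL."""
    records = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        records.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # None means the item only supports a catalogue entry, not deposit.
            "content_url": enclosure.get("url") if enclosure is not None else None,
        })
    return records

records = harvest(SAMPLE_FEED)
for record in records:
    print(record)
```

Items with a `content_url` are candidates for deposit; the rest can only ever be catalogue entries.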

The trouble is, though, that once you get down to details there are several problems and several different ways of overcoming them.

For example, how do you go beyond having a feed for just the last 10 resources? Putting everything into one feed doesn’t scale. If your content is broken down into manageably sized collections (e.g. the OU’s OpenLearn courses and I guess many other OER projects) you could put everything from each collection into a feed and then have an OPML file to say where all the different feeds are (which works up to a point, especially if the feeds will be fairly static, until your OPML file gets too large). Or you could have an API that allowed the receiver of the feed to specify how they wanted to chunk up the data: OpenSearch should be useful here, and it might be worth looking at YouTube as an example. Then there are similar choices to be made for how just about every piece of metadata and the content itself is expressed in the feed, starting with the choice of flavour(s) of RSS or Atom feed.
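The OPML option might look something like this from the receiving end: a small sketch that reads an OPML file and returns the feed URLs it lists, so that each per-collection feed can then be fetched in turn. The OPML content and URLs are made up; `xmlUrl` is the attribute conventionally used in OPML to point at a feed.

```python
import xml.etree.ElementTree as ET

# A made-up OPML file listing one feed per collection.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example OER collections</title></head>
  <body>
    <outline text="Course A" type="rss" xmlUrl="https://example.org/course-a/feed.xml"/>
    <outline text="Course B" type="rss" xmlUrl="https://example.org/course-b/feed.xml"/>
  </body>
</opml>"""

def feed_urls(opml_xml):
    """Return the feed URL of every outline that declares one."""
    root = ET.fromstring(opml_xml)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

collection_feeds = feed_urls(SAMPLE_OPML)
print(collection_feeds)
```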

But, feed-deposit is a potential solution, and it’s not good to try to start with a solution and then articulate the problem. The problem that needs addressing (by the repository that made the query I mentioned above) is how best to deposit 100s of items given (1) a local database which contains the necessary metadata (2) enough programming expertise to read that metadata from the database and republish or post to an API. The answer does not involve someone sat for a week copy-and-pasting into a web form that the repository provides as its only means of deposit.

There are several ways of dealing with that. So far a colleague who is in this position has had success depositing into Flickr, SlideShare and Scribd by repeated calls to their respective APIs for remote deposit, which you could call a depositor-push approach. An alternative is that she puts the resources somewhere and provides information to tell repositories where they are, so that any repository that listens can come and harvest them. That would be more like a repository-pull approach, in which case feed-deposit might be the solution.
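For the repository-pull option, the depositor's side could be as small as the sketch below: read the metadata from the local database and republish it as a feed that covers the whole collection. Everything specific here is an assumption made for illustration (the `resources` table, its columns, the URLs and the function name); it is not a description of any particular repository's requirements.

```python
import sqlite3
import xml.etree.ElementTree as ET

def feed_from_database(conn, channel_title, channel_link):
    """Build an RSS 2.0 feed with one item per row of a 'resources' table."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "All resources in the collection"
    rows = conn.execute(
        "SELECT title, page_url, file_url, mime_type FROM resources ORDER BY id"
    )
    for title, page_url, file_url, mime_type in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = page_url
        # The file size is not stored in this hypothetical table, so length is 0.
        ET.SubElement(item, "enclosure", url=file_url, type=mime_type, length="0")
    return ET.tostring(rss, encoding="unicode")

# Stand-in for the local database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, "
    "page_url TEXT, file_url TEXT, mime_type TEXT)"
)
conn.executemany(
    "INSERT INTO resources (title, page_url, file_url, mime_type) VALUES (?, ?, ?, ?)",
    [
        ("Slides: week 1", "https://example.org/oer/1", "https://example.org/oer/1.pdf", "application/pdf"),
        ("Audio: week 1", "https://example.org/oer/2", "https://example.org/oer/2.mp3", "audio/mpeg"),
    ],
)
feed_xml = feed_from_database(conn, "Example OER collection", "https://example.org/oer/")
print(feed_xml)
```

The depositor-push alternative would replace the feed-building loop with one API call per row, which is why the two approaches need much the same programming effort on the depositor's side.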

[* Yes, I know about OAI-PMH, the comparison is interesting, but this is a long post already.]

Distribution platforms for OERs

Monday, October 19th, 2009

One of the workpackages for CETIS’s support of the UKOER programme is:

Technical Guidelines–Services and Applications Inventory and Guidance:
Checklist and notes to support projects in selecting appropriate publication/distribution applications and services with some worked examples (or recommendations).
Output: set of wiki pages based on content type and identifying relevant platforms, formats, standards, ipr issues, etc.

I’ve made a start on this here, in a way which I hope will combine the three elements mentioned in the workpackage:

  1. An inventory of host platforms by resource type. Which platforms are being used for which media or resource types?
  2. A checklist of technical factors that projects should consider in their choice of platform.
  3. Further information and guidance for some of the host platforms. Essentially that’s the checklist filled in.

In keeping with the nature of this phase of the UKOER programme as a pilot, we’re trying not to be prescriptive about the type of platform projects will use. Specifically, we’re not assuming that they will use standard repository software and are encouraging projects to explore and share any information about the suitability of web2.0 social sharing sites. At the moment the inventory is pretty biased to these web2.0 sites, but that’s just a reflection of where I think new information is required.

How you can help

Feedback
Any feedback on the direction of this work would be welcome. Are there any media types I’m not considering that I should? Are the factors being considered in the checklist the right ones? Is the level of detail sufficient? Where are the errors?

Information
I want to focus on the platforms that are actually being used, so it would be helpful to know which these are. Also, I know from talking to some of you that there is invaluable experience about using some of these services: for example, some APIs are better documented than others, some offer better functionality than others, and some have limitations that aren’t apparent until you try to use them seriously. It would be great to have this in-depth information; there is space in the entry for each platform for these “notes and comments”.

Contributions
The more entries are filled out the better, but there’s a limit on what I can do, so all contributions would be welcome. In particular, I know that iTunes/iTunesU is important for audio video / podcasting, but I don’t have access myself — it seems to require some sort of plug-in called “iTunes” ;-) — so if anyone can help with that I would be especially grateful.

Depending on how you feel, you can help by emailing me (philb@icbl.hw.ac.uk), or by registering on the CETIS wiki and either using the article talk page (please sign your comments) or the article itself. Anything you write is likely to be distributed under a Creative Commons cc-by-nc licence.

Web2 vs iTunesU

Tuesday, August 11th, 2009

There was an interesting discussion last week on the JISC-Repositories email list that kicked off after Les Carr asked

Does anyone have any experience with iTunes U? Our University is thinking of starting a presence on Apple’s iTunes U (the section of the iTunes store that distributes podcasts and video podcasts from higher education institutions). It looks very professional (see for example the OU’s presence at http://projects.kmi.open.ac.uk/itunesu/ ) and there are over 300 institutions who are represented there.

HOWEVER, I can’t shake the feeling that this is a very bad idea, even for lovers of Apple products. My main misgiving is that the content isn’t accessible apart from through the iTunes browser, and hence it is not Googleable and hence it is pretty-much invisible. Why would anyone want to do that? Isn’t it a much better idea to put material on YouTube and use the whole web/web2 infrastructure?

I’d like to summarize the discussion here so that the important points raised get a wider airing; however it is a feature of high-quality discussions like this one that people learn and change their minds as a result, so please don’t assume that people quoted below still hold the opinions attributed to them. (For example, invisibility on Google turned out to be far from the case for some resources.) If you would like to see the whole discussion, look in the JISCMAIL archive.

The first answer from a few posters was that it is not an either/or decision.

Patricia Killiard:

Cambridge has an iTunesU site. [...] the material is normally deposited first with the university Streaming Media Service. It can then be made accessible through a variety of platforms, including YouTube, the university web pages and departmental/faculty sites, and the Streaming Media Service’s own site, as well as iTunesU.

Mike Fraser:

Oxford does both using the same datafeed: an iTunesU presence (which is very popular in terms of downloads and as a success story within the institution); and a local, openly available site serving up the same content.

Jenny Delasalle and David Davies of Warwick and Brian Kelly of UKOLN also highlighted how iTunesU complemented rather than competed with other hosting options, and was discoverable on Google.

Andy Powell, however, pointed out that it was so “Googleable” that a video from Warwick University on iTunesU came higher in the search results for University of Warwick No Paradise without Banks than the same video on Warwick’s own site. (The first result I get is from Warwick, about the event, but doesn’t seem to give access to the video, at least not so easily that I can find it; the second result I get is the copy from iTunes U, on deimos.apple.com. Incidentally, I get nothing for the same search term on Google Videos.) He pointed out that this is “(implicitly) encouraging use of the iTunes U version (and therefore use of iTunes) rather than the lighter-weight ‘web’ version.” and he made the point that:

Andy also raised other “softer issues”, such as which version students will be referred to, that might reinforce one version rather than another as the copy of choice even if it wasn’t the best one for them.

Ideally it would be possible to refer people to a canonical version or a list of available versions (Graham Triggs mentioned Google’s canonical URLs, perhaps if Google relax the rules on how they’re applied), but I’m not convinced that’s likely to happen. So there’s a compromise: a variety of platforms for a variety of needs vs possibly diluting the web presence for any given resource.

And a response from David Davies:

iTunesU is simply an RSS aggregator with a fancy presentation layer.
[...]
iTunesU content is discoverable by Google - should you want to, but as we’ve seen there are easier ways of discovering the same content, it doesn’t generate new URLs for the underlying content, is based upon a principle of reusable content, Apple doesn’t claim exclusivity for published content so is not being evil, and it fits within the accepted definition of web architecture. Perhaps we should simply accept that some people just don’t like it. Maybe because they don’t understand what it is or why an institution would want to use it, or they just have a gut feeling there’s something funny about it. And that’s just fine.

mmm, I don’t know about all these web architecture principles, I just know that I can’t access the only copy I find on Google. But then I admit I do have something of a gut feeling against iTunesU; maybe that’s fine, maybe it’s not; and maybe it’s just something about the example Andy chose: searching Google for University of Warwick slow poetry video gives access to copies at YouTube and Warwick, but no copy on iTunes.

I’m left with the feeling that I need to understand more about how using these services affects the discoverability of resources using Google–which is one of the things I would like to address during the session I’m organising for the CETIS conference in November.

Repositories and linking research and teaching

Friday, May 8th, 2009

I was at the JISC Repository and Preservation end of programme meeting over the last couple of days (search for #rpmeet for more info). The subject of linking research and teaching activities came up two or three times in a way that I thought was interesting.

Repositories and the Web

Wednesday, February 11th, 2009

Andy Powell has been looking at how a couple of example repositories work in terms of putting stuff on the web. There are two posts, both looking at the “jump-off” pages for journal articles: one from a DSpace repository at Edinburgh University, the other from an ePrints repository at Southampton (though he points out that he has chosen these repositories purely as illustrative examples, there’s nothing specific about the institutions and it’s not clear what if anything is specific to the software). He looks for points such as whether the HTML page title is relevant, whether the URL is “cool”, whether the page is linked where relevant (e.g. are the author names linked to something useful), whether there is any metadata in the HTML page (in the header, as microformats, or as links to machine readable records), how prominent the link to the actual paper itself is, etc.
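Some of those checks are easy to automate for your own pages. The sketch below is my own rough illustration, not Andy's method: it uses Python's standard `html.parser` to pull out the page title, the names of any `meta` elements in the page, and any `link rel="alternate"` pointers to machine readable records. The sample page and the class name are invented; judging whether a title is relevant or a URL is “cool” still needs a person.

```python
from html.parser import HTMLParser

class JumpOffPageCheck(HTMLParser):
    """Collect the title, meta names and alternate links from a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta_names = []
        self.alternate_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta_names.append(attrs["name"])
        elif tag == "link" and attrs.get("rel") == "alternate":
            self.alternate_links.append(attrs.get("href"))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# A made-up jump-off page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>A study of feed-based deposit</title>
<meta name="DC.title" content="A study of feed-based deposit">
<meta name="DC.creator" content="A. N. Author">
<link rel="alternate" type="application/rdf+xml" href="/records/42.rdf">
</head><body><a href="/files/42.pdf">Full text (PDF)</a></body></html>"""

check = JumpOffPageCheck()
check.feed(SAMPLE_PAGE)
print(check.title, check.meta_names, check.alternate_links)
```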

The discussion is fascinating; if you run a repository of any type of material I’d suggest you take a look and think about your own repository. As Andy concludes:

My point is that I don’t see the issues around “eprint repositories as a part of the Web” featuring high up the agenda of our discussions as a community (and I suggest the same is true of learning object repositories), in part because we have allowed ourselves to get sidetracked by discussion of community-specific ‘interoperability’ solutions that we then tend to treat as some kind of magic bullet, rolling them out whenever someone questions one approach or another.

Hammer time?

Sunday, January 18th, 2009

Every time I see the CRIG logo I find myself thinking perhaps they should stop hammering away and look at the other faces of the cube where they might find a round hole.

The CRIG hammers a round block into a square hole

The learning content management repository virtual environment system 2.0 and its future, summarized

Tuesday, December 2nd, 2008

As I explained earlier, at this year’s JISC CETIS conference I was in charge of running a session comparing content management, virtual learning and repository systems. I’ve just finished updating the session page on the wiki with links to all the presentations and commentaries available from the day. Here are my own summary and reflections on the session.

The Learning Content Management Repository Virtual Environment system 2.0 and its future

Monday, November 17th, 2008

That’s not a VLE it’s a repository. That’s not a repository it’s a content management system. That’s not a CMS it’s a VLE. [Discuss ...]