Analysing OCWSearch logs

We have a meeting coming up on the topic of investigating what data we have (or could acquire) to answer the question of what metadata is really required to support the discovery, selection, use and management of educational resources. At the same time as I was writing a blog post about that, over at OCWSearch they were publishing the list of top searches for their collection (I think Pierre Far is the person to thank for that). So, what does this tell us about metadata requirements?

I’ve been through the terms at the top half of the list (it says that the list is roughly in descending order of popularity, however it would be really good to know more about how popular each search term was) and tried to judge what characteristic or property of the resource the searcher was searching on.

There were just under 170 search terms in total. It doesn’t surprise me that the vast majority (over 95%) of them are subject searches. Both higher-level, broad subject terms (disciplines, e.g. “Mathematics”) and lower-level, finer-grained subject terms (topics, e.g. “Applied Geometric Algebra”) crop up in abundance. I’m not sure you can say much about their relative importance.

What’s left is (to me) more interesting. We have:

  • Resource types, specifically: “online text book”, “audio”, “online classes”.
  • People, who seem to be staff at MIT. While it’s possible someone is searching for material about them or about their theories, I think it is likely that people are searching for them as resource creators.
  • Level, specifically: 101, Advanced (x2), college-level. These are often used in conjunction with subject terms.
  • Course codes, e.g. HSM 260, 15.822, Psy 315. (These also imply a level and a subject.)
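
The hand categorisation described above could be roughed out in code as a first pass before a human checks the results. What follows is only a sketch under stated assumptions: the keyword lists are seeded from the examples in this post, the course-code pattern is a guess based on the three codes quoted, and the sample terms in the usage note are partly made up. It cannot spot people's names (that would need a staff list), and anything it does not recognise is simply counted as a subject search.

```python
import re
from collections import Counter

# Hypothetical keyword lists, seeded only from the examples quoted in the post.
RESOURCE_TYPES = ("online text book", "audio", "online classes")
LEVEL_WORDS = ("101", "advanced", "college-level")

# Assumed course-code shapes: "hsm 260", "psy 315" or "15.822".
COURSE_CODE = re.compile(r"^(?:[a-z]{2,4}\s?\d{3}|\d{1,2}\.\d{2,3})$")


def classify(term):
    """Return the set of properties a search term appears to target."""
    text = term.strip().lower()
    if COURSE_CODE.match(text):
        return {"course code"}
    labels = set()
    for phrase in RESOURCE_TYPES:
        if phrase in text:
            labels.add("resource type")
            text = text.replace(phrase, " ")
    for word in LEVEL_WORDS:
        # Match the keyword only as a whole token, not inside another word.
        pattern = r"(?<![\w-])" + re.escape(word) + r"(?![\w-])"
        if re.search(pattern, text):
            labels.add("level")
            text = re.sub(pattern, " ", text)
    # Whatever is left after removing known keywords is treated as a subject.
    if text.strip():
        labels.add("subject")
    return labels


def tally(terms):
    """Count how many search terms touch each property."""
    counts = Counter()
    for term in terms:
        counts.update(classify(term))
    return counts
```

Calling `tally(["Mathematics", "Psy 315", "Advanced calculus"])` (the last term is invented for illustration) counts two subject searches, one course code and one level. With real popularity figures for each term, the counts could be weighted rather than treating every term equally.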

I think with more data and more time spent on the analysis we could get some interesting results from this sort of approach.

Jorum and Google ranking

Les Carr has posted an interesting analysis of Visibility of OER Material: the Jorum Learning and Teaching Competition. He searches for six resources on Google and compares the ranking in the results page of the resource on Jorum with the resource elsewhere. The results are mixed: sometimes Jorum has the top place, sometimes some other site (institutional or author’s site) is top, though it should be said that with one exception we’re talking about which is first and which is second. In other words, both would be found quite easily.

Les concludes:

Can we draw any general patterns from this small sample? To be honest, I don’t think so! The range of institutions is too diverse. Some of the alternative locations are highly visible, so it is not surprising that Jorum is eclipsed by their ranking (e.g. Cambridge, very newsworthy Gurkhas international organisation). Some 49% of Open Jorum’s records provide links to external sources rather than holding bitstream contents directly. It would be very interesting to see the bigger picture of OER visibility by undertaking a more comprehensive survey.

Yes, it would be very interesting to see the bigger picture, and it would also be interesting to see a more thorough investigation of Jorum’s role in particular (I don’t think Les will mind the implication that he has no more than scratched the surface).

Some random thoughts that this raises in my mind:

  • Title searches are too easy; the quality of resource description will only be tested by searching for the keywords that are really used by people looking for these resources. Some will know the title of the resource, but not many. Just have a play with using the most important one or two words from the title rather than the whole title and see how the results change.
  • To say that Jorum enhances/doesn’t enhance visibility depending on whether it comes above or below the alternative sites is too simplistic. If it links to the other site Jorum will enhance the visibility of that site even if it ranks below it; having the same resource represented twice in the search engine results page enhances its visibility no matter what the ordering; on the other hand, having links from elsewhere pointing to two separate sites probably reduces the visibility of both.
  • Sometimes Jorum hosts a copy of the resource, sometimes it just points to a copy elsewhere; that’s got to have an effect (hasn’t it?).
  • What is the cause of the difference? When I’ve tried similar (very superficial) comparisons, I’ve noticed that Jorum gets some of the basics of SEO right (e.g. using the resource’s title in the HTML title element; curiously it doesn’t seem to use the HTML meta description). How does this compare to other hosts? I’ve noticed some other OER sites that don’t get this right, so we could see Jorum as guaranteeing a certain basic quality of resource discovery rather than as necessarily enhancing visibility. (Question: is this really necessary?)
  • What happens over time? Do people link to the copy in Jorum or elsewhere? This will vary a lot, but there may be a trend. I’ll note in passing that choosing six resources that had been promoted by Jorum’s learning and teaching competition may have skewed the results.
  • Which should be highest ranked anyway? Do we want Jorum to be highly ranked to reflect its role as part of the national infrastructure, a place to showcase what you’ve produced; or do institutions see releasing OERs as part of a marketing strategy, and the best Jorum can do is quietly improve the ranking of the OERs on the institution’s site by linking to them? This surely relates to the choice between having Jorum host the resource or just having it link to the resource on the institution’s site (doesn’t it?).
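
For anyone wanting to repeat the superficial SEO comparison mentioned above across several hosts, a check along these lines might do. It is a sketch only: it assumes the description in question is the `<meta name="description">` tag, it works on HTML you have already fetched, and the names `HeadCheck` and `check_head` are my own inventions for illustration, not part of any Jorum tooling.

```python
from html.parser import HTMLParser


class HeadCheck(HTMLParser):
    """Collect the <title> text and the meta description, if present."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            # Assumption: the description lives in <meta name="description">.
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def check_head(html, resource_title):
    """Report whether the page title contains the resource title
    and whether a non-empty meta description is present."""
    parser = HeadCheck()
    parser.feed(html)
    return {
        "title_matches": resource_title.lower() in parser.title.lower(),
        "has_description": bool(parser.description),
    }
```

Running `check_head(page_html, "Some resource title")` on the same resource at Jorum and at the institution’s own site would give a crude side-by-side of the two basics. It says nothing about ranking itself, only about whether the page gives a search engine something sensible to work with.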

Sorry, all questions and no answers!

An open and closed case for educational resources

I gave a pecha kucha presentation (20 slides, 20 seconds per slide) at the Repository Fringe in Edinburgh last week. I’ve put the slides on slideshare, and there’s also a video of the presentation, but since the slides are just pictures, the notes are a bit disjointed and my delivery was rather rushed, it seems to me that it would be useful to reproduce what I said here, without the 20-seconds-per-slide constraint.

The main thrust of the argument is that Open Educational Resource (OER) or OpenCourseWare (OCW) release can be a good way of overcoming some of the problems institutions have regarding the management of their learning materials. By OER or OCW release we mean an institution, group or individual disseminating their educational resources under Creative Commons licences that allow anyone to take and use those resources for free. As you probably know, over the last year or so HEFCE have put a lot of money into the UKOER programme.

I first started thinking about this approach in relation to building repositories four or five years ago.

I was on the advisory group for a typical institutional learning object repository project. The approach that they and many others like them at the time had chosen was to build a closed, inward-facing repository, providing access and services only within the institution. The project was concerned about interoperability with their library systems and worried a lot about metadata.

The repository was not a success. In the final advisory group meeting I was asked whether I could provide an example of an institution with a successful learning object repository. I gave some rambling, unsatisfactory answer about how there were a few institutions trying the same approach but it was difficult to know what was happening since they (like the one I was working with) didn’t want to broadcast much information about what they were doing.

And two days later it dawned on me that what I should have said was MIT.

MIT OpenCourseWare
At that time MIT’s OpenCourseWare initiative was by far the most mature open educational resource initiative, but now we have many more examples. But in what way does OER-related activity relate to the sort of internal management of educational materials that concerns projects like the one with which I was involved?

The challenges of managing educational resources
The problems that institutional learning object repositories were trying to solve at that time were typically these:

  • they wanted to account for what educational content they had and where it was;
  • they wanted to promote reuse and sharing within the institution;
  • they wanted more effective and efficient use of resources that they had paid to develop.

And why, in general, did they fail? I would say that there was a lack of buy-in or commitment all round, there was a lack of motivation from the staff to deposit and there was a lack of awareness that the repository even existed. Also there was more focus on the repository per se and systems interoperability than on directly addressing the needs of their stakeholders.

Does an open approach address these challenges?

Well, firstly, by putting your resources on the open web everyone will be able to access them, including the institution’s own staff and students. What’s more, once these resources are on the open web they can be found using Google, which is how those staff and students search. Helping your staff find and have access to the resources created by other staff helps a lot with promoting reuse and sharing within the institution.

It is also becoming apparent that there are good institution-level benefits from releasing OERs.

For example the OU have traced a direct link from use of their OpenLearn website to course enrolment.

In general terms, open content raises the profile of the institution and its courses on the web, providing an effective shop window for the institution’s teaching, in a way that an inward-facing repository cannot. Open content also gives prospective students a better understanding of what is offered by an institution’s courses than a prospectus can, and so helps with recruitment and retention.

There’s also a social responsibility angle on OERs. On launching the Open University’s OpenLearn initiative Prof. David Vincent said:

Our mission has always been to be open to people, places, methods and ideas and OpenLearn allows us to extend these values into the 21st century.

While the OU is clearly a special case in UK Higher Education, I don’t think there are many working in Universities who would say that something similar wasn’t at least part of what they were trying to do. Furthermore, there is a growing feeling that material produced with public funds should be available to all members of the public, and that Universities should be of benefit to the wider community not just to those scholars who happen to work within the system.

Another, less positive, harder-edged angle on social responsibility was highlighted in the ruling on a Freedom of Information request where the release of course material was required. The Information Tribunal said:

it must be open to those outside the academic community to question what is being taught and to what level in our universities

We would suggest that we are looking at a future where open educational resources should be seen as the default approach, and that a special case should need to be made for resources that a public institution such as a university wants to keep “private”. But for now the point we’re making is that social responsibility is a strong motivator for some individuals, institutions and funders.

Legalities
Releasing educational content openly on the web requires active management of intellectual property rights associated with the content used for teaching at the institution. This is something that institutions should be doing anyway, but they often fudge it. They should address questions such as:

  • Who is responsible for ensuring there is no copyright violation?
  • Who owns the teaching materials, the lecturer who wrote them or the institution?
  • Who is allowed to use materials created by a member of staff who moves on to another institution?

The process of applying open licences helps institutions address these issues, and other legal requirements such as responding to freedom of information requests relating to teaching materials (and they do happen).

Not all doom and gloom
Some things do become simpler when you release learning materials as OERs.

For example access management for the majority of users (those who just want read-only access) is a whole lot simpler if you decide to make a collection open; no need for the authentication or authorization burden that typically comes with making sure that only the right people have access.

On a larger scale, the Open University have found that setting up partnerships for teaching and learning with other institutions becomes easier if you no longer have to negotiate terms and conditions for mutual access to course materials from each institution.

Some aspects of resource description also become easier.

Some (but not all) OER projects present material in the context in which it was originally delivered, i.e. arranged as courses (the MIT OCW course, a screen capture of which I used above, is one example). This may have some disadvantages, but the advantage is that the resource is self-describing: you don’t have to rely solely on metadata to convey information such as educational level and potential educational use. This is especially important because, whereas most universities can describe their courses in ways that make sense, we struggle to agree controlled vocabularies that can be applied across the sector.

Course or resources?
The other advantage of presenting the material as courses rather than disaggregated as individual objects is that the course will be more likely to be useful to learners.

Of course, the presentation of resources in the context of a course should not stop anyone from taking or pointing to a single component resource and using it in another context. That should be made as simple as possible; but it’s always going to be very hard to go in the other direction: once a course is disaggregated it’s very hard to put it back together (the source of the material could describe how to put it back together, or how it fitted in to other parts of a course, but then we’re back into the creation of additional metadata).

Summary and technical
What I’ve tried to say is that putting clearly licensed stuff onto the open web solves many problems.

What is the best technology genre for this: repository, content management system, VLE or Web 2 service? Within the UKOER programme all four approaches were used successfully. Some of these technologies are primarily designed for local management and presentation of resources rather than open dissemination; and vice versa. There’s no consensus, but there is a discernible trend towards using a diversity of approaches and mixing-and-matching, e.g. some UKOER projects used repositories to hold the material and push it to Web 2 services; others pulled material in the other direction.

ps: While I was writing this, Timothy Vollmer over on the Creative Commons blog was writing “Do Open Educational Resources Increase Efficiency?”, making some similar points.

Image credits
Most of the images are sourced from Flickr and have one or another flavour of creative commons licence. From the top: