Using standards to make assessment in e-textbooks scalable, engaging but robust

During last week’s EDUPUB workshop, I presented a demo of how an IMS QTI 2.1 question item could be embedded in an EPUB3 e-book in a way that is engaging, but also works across many e-book readers. Here’s the why and how.

One of the most immediately obvious differences between a regular book and an e-textbook is the inclusion of little quizzes at the end of a chapter that allow the learner to check their understanding of what they’ve just learned. Formative assessment matters in textbooks.

When moving to electronic textbooks, there is a great opportunity to make that assessment more interactive, provide richer feedback, and connect the learning to a wider view of how a student is doing (i.e. learning analytics). The question is how to do that in a way that works across many e-reading devices and applications, on a scale that works for publishers.

QTI item in Adobe Editions

Scalability is where interoperability standards like EPUB3, IMS Learning Tools Interoperability (LTI) and IMS Question and Test Interoperability (QTI) 2.1 come in. People use a large number of different software systems in the authoring, management, and playback of e-books. Connecting each of those to all the others with one-off custom integrations just gets too complex, too expensive and too brittle; that’s why an increasing number of publishers and software vendors agreed on the EPUB specification. As long as you implement that spec, solutions can scale across many e-book applications. The same goes for question and test material, where IMS QTI does the same job. LTI does that job for connecting VLEs to any online learning tool.

Which leaves the question of how to square the circle of making the assessment experience as engaging and effective as possible, but also work on devices with very different capabilities.

Fortunately, EPUB3 files can include a number of techniques that allow an author to adapt the content to the capability of the device it is being read on. I used those techniques to present the same QTI item in three different ways: as a static quiz (much like a printed book), as a simple interactive widget, and as a feedback-rich test run by an online assessment system inside the book. The last option makes detailed analytics data available and it should also make it possible to send a grade to a VLE automatically.

The how

QTI item in Apple iBooks

For the static representation and the interactive widget, I relied on Steve Lay’s rather brilliant transform from QTI XML to HTML5 (and back again), and made the HTML5 interactive with some JavaScript. By including this QTI HTML5 in the EPUB, you get all the advantages of standard QTI, in a way that still works in a simple, offline reader such as Adobe Editions as well as more capable software such as Apple’s iBooks.

For the most capable, online ebook readers such as Readium, the demo e-textbook connects to QTIWorks, an online QTI compliant assessment engine. It does that via IMS LTI 1.1, but in a somewhat unusual way: in LTI terms, the e-book behaves as a tool consumer, that is, like a VLE. Using a hash of an OAuth secret and key, it establishes a connection to QTIWorks, identifies the user, and retrieves the right quiz to show inside the ebook. A place to send the results of the quiz to is also provided, but I’ve not tested that yet. QTIWorks makes a detailed report available of exactly what the learner did with each item, which can be retrieved in a variety of machine readable formats.

QTI item in Readium

Because the secret and the key have to be included in the book, the LTI connection the book establishes is not as secure as an LTI connection from a proper VLE. For access to some formative assessment, that may be a price worth paying, though.
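
As a rough illustration of what such a launch involves, here is a minimal Python sketch of signing an LTI 1.1 basic launch with OAuth 1.0 HMAC-SHA1. It is not the demo's code (which uses a JavaScript OAuth library); the endpoint URL, key, secret, nonce and identifiers are all placeholder assumptions, and only the parameter names come from the LTI and OAuth specifications.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """Encode as OAuth 1.0 requires: only unreserved characters stay as-is."""
    return quote(value, safe="-._~")


def sign_launch(url: str, params: dict[str, str], consumer_secret: str) -> str:
    """Return the HMAC-SHA1 oauth_signature for a POSTed LTI launch."""
    # Encode, sort and join the parameters into the normalised parameter string.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalised = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join(["POST", percent_encode(url), percent_encode(normalised)])
    # A basic launch has no token secret, so the signing key ends in a bare "&".
    key = percent_encode(consumer_secret) + "&"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values throughout; a real launch needs a fresh nonce and timestamp.
launch_url = "https://example.org/qtiworks/lti/launch"  # hypothetical endpoint
launch_params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "chapter-1-quiz",
    "user_id": "reader-42",
    "oauth_consumer_key": "demo-key",
    "oauth_nonce": "c0ffee",
    "oauth_timestamp": "1400000000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}
# The signature is computed over the unsigned parameters, then added to the form.
launch_params["oauth_signature"] = sign_launch(launch_url, launch_params, "demo-secret")
```

The signed parameters would then be POSTed as form fields to the tool. The sketch also makes the security trade-off visible: whoever can read `"demo-secret"` out of the book can sign launches of their own.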

The demo EPUB3 uses both scripting and some metadata to determine which version of the QTI item to show. The QTI item, the LTI launch and the EPUB textbook are all valid according to their specifications, and rely on stock readers to work.
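
The fallback logic can be summarised in a few lines. The sketch below is in Python purely to show the decision order; in the book itself this kind of check would run as JavaScript in the reader, and the two capability flags and the mode names are assumptions of mine rather than the demo's actual tests.

```python
from dataclasses import dataclass


@dataclass
class ReaderCapabilities:
    scripting: bool  # can the reader run scripts inside the book?
    online: bool     # can the reader reach an online assessment engine?


def choose_presentation(caps: ReaderCapabilities) -> str:
    """Pick the richest presentation of the QTI item the reader can handle."""
    if caps.scripting and caps.online:
        return "lti"     # feedback-rich test run by the online engine
    if caps.scripting:
        return "widget"  # simple interactive widget, works offline
    return "static"      # plain quiz, much like a printed book
```

The order matters: each step falls back to something strictly simpler, so a reader that supports nothing still shows a usable static quiz.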

Acknowledgements and links

David McKain for making QTIWorks
Steve Lay for the QTI HTML transforms
John Kristian of the OAuth project for the OAuth javascript library
Stephen Vickers for the ceLTIc IMS LTI development tools

The (ugly, content-less) demonstration EPUB3 and associated code are available from Github.

QTI 2.1 spec release helps spur over £250m of investment worldwide

With the QTI 2.1 specification finalised and released, we’re seeing significant global investment in tools that implement the spec. Tools developed by JISC projects have been central.

It has taken a while, but since March this year, IMS Question and Test Interoperability 2.1 has been released as a final specification. That means that people can implement it, secure in the knowledge that it won’t change or disappear, even if there are likely to be future versions.

The release, not coincidentally, happens at a time when there is a lot of activity regarding the use of the specification around the world. This level of investment isn’t just due to a set of documents on a website; it is also due to the fact that there is a range of working implementations available that demonstrate how QTI 2.1 works, and that’s where a couple of Jisc projects play a crucial role. But let’s have a look at what people are doing with the spec around the world first.

The Netherlands

The biggest assessment project in the low lands at the moment is the effort to move all online school exams to the QTI 2.1 format. The multi-million Euro effort is led by the Commissie voor Examens, managed by DUO, with the CITO exam body and Trifork as contractors. Because of the specific demands put upon the whole infrastructure, the partners will need an extensive profile.

Accompanying the formal exam profile is the NL-QTI effort led by Kennisnet. This pragmatic but relatively rich profile of the specification is meant to facilitate an ecosystem of material and software for general use in schools. We should see more of that profile in the near future.

Lastly, Surf is currently running the Assessment and Assessment Driven Learning programme in higher education, which will revolve around a sharable infrastructure for online assessment. Part of that programme will be an exploration of the extent to which such sharing can be facilitated by QTI 2.1.

Germany

The main player here is the Onyx suite from BPS. This complete assessment suite of editor, test player, analytics module and converter is built around QTI 2.1, and has been used standalone as well as integrated with the OLAT VLE. One instance of the latter that is shared between all 13 universities in Saxony has about 50,000 users, with about 25,000 log-ins per day. Similar consortia exist in Thuringia and Rhineland-Palatinate, and there are further university specific installations with a combined total of about 108,000 users. The hosted Onyx test player handles about 300 to 1,000 test runs a day.

France

The work in France is on a smaller scale, but is mature and well targeted. The MOCAH team of UPMC, Paris 6 has developed a system where QTI 2.1 source is transformed such that it can be run on generic Java or PHP based web servers, as well as specialised QTI players. The focus is on the teaching of math to secondary school students, and it has been used in 160 classes, where 400 patterns have been created. The latter are question item templates that generate large amounts of items for students to practice on; a key requirement.
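
To make the idea of a pattern concrete, here is a minimal Python sketch of a question item template that generates practice items. It is purely illustrative and says nothing about how the MOCAH system or QTI's own item templating is implemented; the `Item` class, the multiplication pattern and the number ranges are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Item:
    prompt: str
    answer: int


def generate_items(count: int, seed: int = 0) -> list[Item]:
    """Instantiate one multiplication pattern `count` times."""
    rng = random.Random(seed)  # seeded, so a given set of items is reproducible
    items = []
    for _ in range(count):
        # The template variables: each draw yields a different concrete item.
        a, b = rng.randint(2, 12), rng.randint(2, 12)
        items.append(Item(prompt=f"What is {a} x {b}?", answer=a * b))
    return items


practice_set = generate_items(5)
```

One authored pattern thus stands in for a whole family of items, which is what makes large amounts of practice material affordable to produce.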

South Korea

After experiments in the past with, among other tools, QTI 2.1 generated from common word-processing tools, KERIS – the Korea Education and Research Information Service – is now engaging vendors in a project to integrate QTI 2.1 in EPUB 3 ebooks. Various options are being explored at the moment, with results due later this year.

USA

This is where the development-at-scale is taking place at the moment, thanks to the Race To The Top (RTTT) projects that were funded by the Obama administration. There are two state-led consortia – Smarter Balanced and PARCC – with a mission to overhaul the whole assessment infrastructure in schools, base it on open standards and open source software, and provide a tranche of new material to go with it. They had an initial budget of $160-170 million each, with about a third of those budgets intended for tool development. QTI 2.1, along with the Accessible Portable Item Protocol (APIP) extensions, is at the heart of the initiative.

The size of those consortia is having effects elsewhere too. One major educational publisher has already decided to standardise internally on QTI 2.1, and others are looking at the same option. Not that such a thing is new: organisations such as the Northwest Evaluation Association (NWEA) and the world’s largest testing organisation – ETS – have already chosen QTI 2.1 as their internal ‘lingua franca’. Rather than making many point-to-point integrations between their own systems and collections, and then having to do that again with each organisation they partner with, they translate each format to and from QTI.

UK

Meanwhile, back in the UK, JISC has sponsored a small community – most recently via the Assessment & Feedback programme – that has played a vital role in making QTI 2.1 real. ‘Real’ in the sense of checking whether and how the specification would work, as it was being designed, in the case of Jassess. ‘Real’ also in the sense of putting QTI 2.1 material in the hands of a range of teachers and learners, via editing tools such as Uniqurate and playback tools such as QTIWorks. An excellent RSC Scotland post outlines exactly how those outputs of the QTI-DI and Uniqurate projects work.

All of these UK projects’ tools, guidance and assessment materials are known to all the above communities, as well as plenty of others I’ve not even mentioned. In some cases, the JISC sponsored tools have been extended by others; in other cases, the presence and online accessibility of the resources meant that those other communities knew what was possible, what their own tools and materials should look like, and how they could interoperate.

At this point, it’s not clear whether new Jisc will support future work in this area. What is clear, however, is that JISC’s past investment will continue to have a global effect well beyond the initial outlay.

Assessment & Feedback tool development lessons

With most software development projects in the JISC Assessment & Feedback programme drawing to a close, it’s a good time to look at some common themes in their findings.

There’s a small, but perfectly formed little cluster of four projects in ‘strand C’ of the Assessment & Feedback programme. Strand C is the techy corner, because it is these projects that took existing open source tools and adapted them for use in organisations beyond the ones they were developed in.

Within the strand, the tools that were being developed were:

  • Rogō, a complete assessment authoring, playback and management system, developed by the eponymous project at Nottingham University, and deployed in three other institutions
  • OpenMentor, a system that analyses tutor feedback on assignments, developed at the OU, now deployed in two other institutions by the OMTetra project
  • QTIWorks, a full-featured, QTI compliant assessment and test player, developed at Edinburgh University, now deployed by the QTI-DI project
  • Uniqurate, an online, QTI compliant assessment and test authoring tool developed at Kingston University by the eponymous project, and coupled to QTIWorks

Looking through their development experiences, there are a couple of themes that seem to recur:

User interface complexity

What to do when one set of users need something simple, and another set want full access to all functions? The clearest example of that dilemma was presented to the Uniqurate project: there was an existing assessment item editor called Mathqurate that gave access to all aspects of many different question types, but was only really usable by experts, and an earlier version of Uniqurate that was very friendly, but also very limited. Which is why the current project aimed to become the “goldilocks editor” by offering a flexible but easily graspable set of item type modules, but also by offering different modes that are accessible to more intrepid users.

The most advanced of these modes gives the user access to the QTI source code of a question, which is something that is also available in QTIWorks. Another, arguably more important simple versus complex user interface issue that QTIWorks has to deal with is how to show runtime variables. For authors, this is vital, but for candidates it is rather confusing and often assessment defeating. Solution? Like Uniqurate: different modes for different audiences.

In OpenMentor, the audience is broadly the same (tutors), but some wanted to know what’s going on in the ‘black box’ that takes their feedback on assignments and categorises it into a well-known taxonomy, while others were just happy with the results. The likely solution is also to include an advanced mode in a future version of the tool.

Interoperating with other systems

Or: how do I get user information in my tool without asking those users to type it all in?

OpenMentor and Rogō went down the LDAP route, given that it is the most common way to distribute person information inside organisations. It worked for these tools too, though Rogō had to spend quite some time at one of the new sites to adapt the LDAP to Rogō mapping. Some assembly may be required, in other words.

Rogō and QTIWorks also implemented the much newer IMS Learning Tools Interoperability (LTI) specification. This specification is designed to allow more ad hoc connections between a VLE and tools such as the tools from the assessment & feedback programme. LTI is intended primarily to identify users, but it can also be used to move some user information from one system to another, particularly when those systems may be in different organisations. This function is still evolving, though, as Rogō found when they looked for an external examiner role within LTI. They couldn’t find it when they implemented it, but LTI supports it now.

Fostering a community

Because all four projects are open source, and because they were all meant to facilitate wider adoption, community building with users and other developers was paramount. It’s not easy, though.

Uniqurate noticed this particularly with regard to the use of agile software methodologies, as outlined in their last blogpost. Agile is generally advocated because it makes sure development happens in small steps that track what users actually want. Except that the users in this case were very busy academics who were enthusiastic, but rarely available during term time. And a project is too short to easily work around that. Conclusion: sometimes other methodologies may work better.

The OMTetra project used workshops and surveys to engage their user community, which did work. Developer engagement might be a slightly different matter, however: there are three different public code repositories for OpenMentor, of different degrees of currency. The branch developed during this project is the slightly, rather than the very, stale one. Whether all the developments have made it through to the latest branch is not clear. It is still actively developed, however, and that’s the main thing.

For QTIWorks, code and documentation are clearer, and with success: the code has been adopted by developers on one of the very large Race To The Top assessment projects in the United States. It has been used there to prototype some potentially revolutionary new functionality in interoperable assessment material, which is likely to become part of the QTI specification itself. Part of the success may also be due to the fact that, like Uniqurate, a demo version of QTIWorks is available online.

Both QTIWorks and Uniqurate have, however, been used for teaching and learning on a relatively limited scale compared to Rogō. As the Rogō project discovered, that can be a mixed blessing. Once courses start to rely on a system, the demand for support of all kinds increases exponentially, and that’s before Rogō is being used widely for summative assessment. Sound user and installation documentation helps, but doesn’t resolve all issues that other organisations may need help with, whether there’s a support business model in place or not. Also, demands of other organisations inevitably lead to tensions with the priorities of the original developers. That’s manageable, but requires thought and ongoing commitment.

Conclusion

It is a bit difficult to generalise across these four projects, much less all open source software developments at universities. Yet it seems fairly clear that the main issue is community building: once the right number of the right mix of partners are on board, other issues become more tractable. Fostering such communities is difficult, but it is something that an organisation like OSSWatch can help with; as Rogō has already been doing.

IMS Question and Test Interoperability 2.1 tools demonstrate interoperability

While most of Europe was on the beach, a dedicated group of QTI vendors gathered in Koblenz, Germany to demo what a standard should do: enable interoperability between a variety of software tools.

A total of twelve tools were demonstrated for the attendees of the IMS quarterly meeting that was being held at the University of Koblenz-Landau. The vendors and projects came from a variety of communities in Poland, Korea, France, Germany and the UK, and their tools included:

All other things being equal, the combination of such a diversity of purposes with the comprehensive expressiveness of QTI means that there is every chance that a set of twelve tools will implement different, non-overlapping subsets of the specification. This is why the QTI working group is currently working on the definition of two profiles: CC (Common Cartridge) QTI and what is provisionally called the Main profile.

The CC QTI profile is very simple and follows the functionality of the QTI 1.2 profile that is currently used in the IMS Common Cartridge educational content exchange format. Nine out of the twelve tools had implemented that profile, and they all happily played, edited or validated the CC QTI reference test.

With that milestone, the group is well on the way to the final, public release of the QTI 2.1 specification. Most of the remaining work is around the definition of the Main profile.

Initial discussion in Koblenz suggested an approach that encompasses most of the specification, with the possible exclusion of some parts that are of interest to some, but not all subjects or communities. To make sure the profile is adequate and implementable, more input is sought from publishers, qualification authorities and others with large collections of question and test items. Fortunately, a number of these have already come forward.

IMS QTI and the economics of interoperability

In the twelve years of IMS’ existence, its staff and members have learned an awful lot about interoperability. This is nowhere more apparent than in the most quintessentially educational of interoperability standards: question and test items (QTI). A recent public spat about the IMS QTI specification provides an interesting contrast between two emerging views of how to achieve interoperability. Fortunately for QTI, they’re not incompatible with each other.

Under the old regime, the way to achieve interoperability was to establish consensus among the largest number of stakeholders possible, create a spec, publish it and wait for the implementations to follow. With the benefit of hindsight, it’s fair to say that the results have been mixed.

Some IMS specs got almost no implementation at all, some galvanised a lot of development but didn’t reach production use, and some were made to work for particular communities by their particular communities. On the whole, many proved remarkably flexible in use, and of sound technical design.

Trouble was, more often than not, two implementations of the same IMS spec were not able to exchange data. To understand why, the QTI spec is illustrative, but not unique.

For a question and test spec to be useful to most communities, and for several of these communities to be able to share data or tools, a reasonably wide range of types needs to be supported. QuestionMark (probably the market leader in the sector) uses the wide range of question types that its product supports as a key differentiator. Likewise, though IMS QTI 2.1 is very expressive, a lot of practitioners in the CETIS Assessment SIG frequently discuss extensions to ensure that the specification meets their needs.

The upshot is that QTI 2.1 is implementable, as a fair old list of tools on Wikipedia demonstrates, but implementing all of it isn’t trivial. This could be argued to be one reason why it is not in wider use, though the other reason might well be that QTI 2.1 was never released as a final specification, and now is no longer accessible to non-IMS members.

To see how to get out of this status quo, the economics of standard implementation need to be considered. From a vendor’s point of view, whether open or closed source, implementing any interoperability spec represents a cost. The more complex and flexible the specification, the higher that cost is. This is not necessarily a problem, as long as the benefit is commensurate. If either the market is large enough, or else the perceived value of the spec high enough for the intended customers to be willing to pay more, the specification will be economically viable.

Broadly, two models of interoperability can be used to figure out a way to make a spec economically viable, and which you go for largely depends on your assumptions about the technical architecture of the solution.

One model assumes that all implementations of a spec like QTI are symmetrical and relatively numerous. Numerous as in certainly more than two or three, and possibly double digits or more, and implementations as in systems such as VLEs. With that assumption, the QTI situation needs clear adjustment. The VLE market is not that large to begin with, and is fairly commoditised. There is little room for investment, and there has not been a demonstrated willingness to pay for extended interoperable question and test features.

From the symmetrical perspective, then, the only way forward is to simplify the spec down to a level that the market will bear, which is to say, very simple indeed. Since, as we’re already seeing with the QTI 1.2 profile in Common Cartridge, it is not possible to satisfy all communities with the same small set of question and test items, there will almost certainly need to be multiple small profiles.

There are several problems with such an approach. For one, reducing the feature set to the lowest cost has a linear relation to the value of the feature set to the end user. Beyond a minimum it might be almost useless. Balkanising the spec’s space to several incompatible subsets is likely to exacerbate this; not just for end-users, but also tool and content vendors.

What’s worse, though, is that the underlying assumption is wrong. Symmetrical interoperability doesn’t work. To my knowledge, and I’d love to be corrected, there are no significant examples of an interoperability spec that has significant numbers of independent implementations that happily export and import each other’s data. The task of coordinating the crucial details of the interpretation of data is just too onerous once the number of data sources and targets that a piece of software has to deal with gets into the double digits.

Symmetrical, many-to-many interoperability; 8 systems, 56 connections that need to work

Within the e-learning world, SCORM 1.2 (and compatible IMS Content Packages) came closest to the symmetric, many-to-many ideal, but only because the spec was very simple, the volume of the market large, compliance often mandated and calculated into Requests For Proposals (RFPs), and vendors were prepared to coordinate their implementations in numerous plugfests and codebashes as a consequence. Also, ADL invested a lot of money in continuous implementation support. Even then, plenty of issues remained, and, crucially, most implementations were not symmetrical: they imported only. Once the complexity of SCORM increased significantly with the adoption of Simple Sequencing in SCORM 2004, the many-to-many interoperability model broke down.

Instead, the emergence of solutions like Icodeon’s SCORM 2004 plug-in for VLEs brought the spec back to the norm: asymmetrical interoperability. Under this assumption, there will only ever be a handful of importing systems at most, but a limitless number of data sources. It’s how HTML works on the web: uncountable sources that need to target only about four codebases (Internet Explorer, Mozilla, WebKit, Opera), one of which dominates to such an extent that the others need to emulate its behaviour. Same with JPEG picture rendering libraries, BIND implementations and more. In educational technology it is how Simple Sequencing and SCORM 2004 got traction, and it is starting to look as if it will be the way most people will see IMS Common Cartridge too.

Under this assumption, implementing a rich QTI profile in two or three plug-ins or web services becomes economically much more viable. Not only is the amount of required testing much reduced, the effective cost of implementation is spread out over many more systems. VLE vendors can offer the feature for much less, because the total market has effectively paid for just two or three best-of-breed implementations rather than tens of mediocre ones.

Asymmetrical, many-to-many interoperability; 8 source systems, 2 consuming systems, 16 connections that need to work
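
The connection counts in the two diagrams follow from simple arithmetic, sketched here in Python. The function names are mine, and the model assumes that every directed link between two systems (A exporting to B is a different link from B exporting to A) has to be tested separately.

```python
def symmetric_connections(systems: int) -> int:
    """Every system both exports to and imports from every other system."""
    return systems * (systems - 1)


def asymmetric_connections(sources: int, consumers: int) -> int:
    """Every source only has to work with each of a few consuming systems."""
    return sources * consumers


print(symmetric_connections(8))      # 56, as in the symmetrical diagram
print(asymmetric_connections(8, 2))  # 16, as in the asymmetrical diagram
print(symmetric_connections(12))     # 132
print(asymmetric_connections(12, 2)) # 24
```

The symmetric count grows with the square of the number of systems, while the asymmetric count grows linearly in the number of sources as long as the set of consuming systems stays small. That difference is the core of the economic argument.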

This is not a theoretical example. Existing rich QTI 2.1 implementations make the asymmetric interoperability assumption. In Korea, KERIS (Korea Education and Research Information Service) is coordinating the development of three commercial implementations of the rendering and test side of QTI, but many specialised authoring tools are envisaged. Likewise, in the UK, two full implementations of the rendering and test management side of QTI exist, but many subject specific authoring tools are envisaged. All existing renderers can be used as a web application, and QTIEngine is also explicitly designed to work as a local plug-in or web service that can be embedded in various VLEs.

That also points to various business models that asymmetric interoperability enables. VLE vendors can focus on the social networking core, and leave the activity specific tools to the specialists with the right expertise. Alternatively, vendors can band together and jointly develop or adopt an open source code library, like the Japanese companies that implemented Simple Sequencing under ALIC auspices, back in the day.

Even if people still want to persist with symmetrical interoperability, designing the specification to accommodate both assumptions is not a problem. All that’s required is one rich profile for the many-to-few, asymmetric assumption, and a very small one for the many-to-many, symmetric assumption. Let’s hope we get both.

Resources

A brief overview of the current QTI 2.1 discussion

Wikipedia’s QTI page, which contains a list of implementations

More on the KERIS QTI 2.1 tools

The QTIEngine demo site

An interview with Kyoshi Nakabayashi, formerly of ALIC, about joint Simple Sequencing implementation work

WebPA wins Learning Impact award

At the Learning Impact conference, Loughborough University’s WebPA peer assessment system won a bronze award. It was also recognised as best in the assessment support category.

WebPA’s Nicola Wilkinson receives the award from IMS’ Lisa Mattson.

WebPA was the only UK entry in the field of twenty-three nominees from all over the world, gathered at the Omni Hotel in Austin, Texas. All the entries were judged in a grueling twenty-three rounds of five minute demos. Judged at the very end of that ordeal, the WebPA team still managed to impress the five judges. The preferences of all the other Learning Impact attendees counted as a sixth, collective judgment.

The Learning Impact awards are IMS’ means to recognise e-learning innovations that have made a palpable difference in a particular community. To ensure parity, a welter of categories and criteria ensures that there is a degree of comparability between entries of very different kinds, scales and levels of maturity. The awards form the centrepiece of the annual IMS public event, the Learning Impact conference.

WebPA is a mature, well developed system that supports the choreography of peer assessment in group work. Rather than simply award a blanket grade to every student in a project group, it allows students to grade each other’s efforts on any number of different criteria. The system is an open source web application that is getting deployed in an increasing number of institutions. Further development of the system is currently funded by the JISC.

There’s more Learning Impact news on the IMS website.
Learn more about WebPA on the Loughborough website