Archive for the ‘cetis-standards’ Category

How users can get a grip on technological innovation

Monday, July 14th, 2008

Control of a technology may seem a vendor business: Microsoft and Windows, IBM and mainframes. But by understanding how technology moves from prototype to ubiquity, the people who foot the bill can get to play too.

Though each successful technological innovation follows its own route to becoming an established part of the infrastructure, there are couple of regularities. Simplified somewhat, the process can be conceived as looking like this:
The general technology commodification process

The interventions (open specifications, custom integration et al) are not mutually exclusive, nor do they necessarily come in a fixed chronological order. Nonetheless, people do often try to establish a specification before the technology is entirely ‘baked’, in order to forestall expensive dead-ends (the GSM mobile phone standard, for example). Likewise, a fully fledged open source alternative can take some time to emerge (e.g. mozilla / firefox in the web browser space).

The interventions themselves can be done by any stakeholder; be they vendors, buyers, sector representatives such as JISC or anyone else. Who benefits from which intervention is a highly complex and contextualised issue, however. Take, for example, the case of VLEs:

The technology commodification process as applied to VLEs

When VLEs were heading from innovation projects to the mainstream, stakeholders of all kinds got together to agree open standards in the IMS consortium. No-one controlled the whole space, and no-one wanted to develop an expensive system that didn’t work with third party tools. Predictably, implementations of the agreed standards varied, and didn’t interoperate straight off, which created a niche for tools such as Reload and course genie which allowed people to do some custom integration; a glossing over the peculiarities of different content standard implementations. Good for them, but not optimal for buyers or the big vle vendors.

In the mean time, some VLE vendors smelled an opportunity to control the technology, and get rich off the (license) rent. Plenty of individual buyers as well as buyer consortia took a conscious and informed decision to go along with such a near monopoly. Predictability and stability (i.e. fast commodification) were weighed against the danger of vendor lock-in in the future, and stability won.

When the inevitable vendor lock-in started to bite, another intervention became interesting for smaller tool vendors and individual buyers: reverse engineering. This intervention is clearer in other technologies such Windows file and print server protocols (the Samba project) and the PC hardware platform (the early ‘IBM clones’). Nonetheless, a tool vendor such as HarvestRoad made a business of freeing content that was caught in proprietary formats in closed VLEs. Good for them, not optimal for big platform vendors.

Lastly, when the control of a technology by a platform becomes too oppressive, the obvious answer these days is to construct an open source competitor. Like Linux in operating systems, Firefox in browsers or Apache in webservers, Moodle (and Sakai, atutor, .LRN and a host of others) have achieved a more even balance of interests in the VLE market.

The same broad pattern can be seen in many other technologies, but often with very particular differences. In blogging software, for example, open source packages have been pretty dominant from the start, and one package even went effectively proprietary later on (Movable Type).

Likewise, the shifting interests of stakeholders can be very interesting in technologies that have not been fully commodified yet. Yahoo for example, has not been an especially strong proponent of open specifications and APIs in the web search domain until it found itself the underdog to Google’s dominance. Google clearly doesn’t feel the need to espouse open specifications there, since it owns the search space.

But it doesn’t own mobile phone operating systems or social networking, so it is busy throwing its weight behind open specification initiatives there. Just as they are busy reverse engineering some of Microsoft’s technologies in domains that have already commodified such as office file formats.

From the perspective of technology users and sector representatives, it pays to consider each technology in a particular context before choosing the means to commodify it as advantageously but above all as quickly as possible. In the end, why spend time fighting over technology that is predictable, boring and ubiquitous when you can build brand new cool stuff on top of it?

Semantic tech finds its niches and gets productive

Tuesday, June 3rd, 2008

Rather than the computer science foundation, the annual semantic technology conference in San Jose focusses on the practical applications of the approach. Visitor numbers are growing at a ‘double digit’ clip, and vendors are starting to include big names such as Oracle. We take a look.

It seems that we’re through the trough of disillusionment about the fact that the semantic web as outlined by the Tim Berners Lee in 1999 has not exactly materialised (yet). It’s not news that we do not all have intelligent agents that can seek out all kinds of data on the ‘net, and integrate it to satisfy our specific queries and desires. What we do have is a couple of interesting and productive approaches to the mixing and matching of disparate information that hint at a slope of enlightenment, heading to a plateau of productivity.

Easily the clearest example from the conference is the case of Blue Shield of California, a sizeable health care insurer in the US. They faced the familiar issue of a pile of legacy applications with custom connections, that were required to do things they were never designed to do. As a matter of fact, customer and policy data (core data, clearly) were spread over two systems of different vintage, making a unified view very difficult.

In contrast to what you might expect, the solution they built leaves the data in the existing databases- nothing is replicated in a separate store. Instead, the integration takes place in a ’semantic layer’. In essence, that means that users can ask pretty complex and detailed questions of any information that is held in any system, in terms of a set of models of the business. These queries end up at the same old systems, where they get mapped from semantic web query form into good old SQL database queries.

This kind of approach doesn’t look all that different from the Enterprise Service Bus (ESB) in principle, but takes takes the insulation of service consumers from the details of service providers rather further. Service consumers in a semantic layer have just one API for any kind of data (the W3C’s SPARQL query language) and one datamodel (RDF, though output in XML or JSON is common). Most importantly, the meaning of the data is modelled in a set of ontologies, not in the documentation of the service providers, or the heads of their developers.

While the Blue Shield of California case was done by BBN, other vendors that exhibited in San Jose have similar approaches, often built on open source components. The most eye catching of those components (and also used by BBN) is netkernel: the overachieving offspring of the web and unix. It’s not necessarily semantic tech per se, but more of a language agnostic application environment that competes with J2EE.

Away from the enterprise, and more towards the webby (and educational!) end of things, the state of semantic technology becomes less clear. There are big web apps such as the Twine social network where the triples are working very much in the background, or powerset, where it is much more in your face, but to little apparent effect.

Much less polished, but much, much more interesting is dbpedia.org- an attempt to put all public domain knowledge in a triple store. Yes, that includes wikipedia, but also the US census and much more. DBpedia is accessible via a number of interfaces, including SPARQL. The result is the closest thing yet to a live instance of the semantic web as originally conceived, where it really is possible to ask questions like “give me all football players with number 11 shirts from clubs with stadiums with more than 40000 seats, who were born in a country with more than 10M inhabitants“. Because of the inherent flexibility of a triple store and with the power of the SPARQL interface, dbpedia could end up powering all kinds of other web applications and user interfaces.

Nearer to a real semantic web, though, is Yahoo’s well publicised move to start supporting relevant standards. While the effect isn’t yet so obvious as semantic integration in the enterprise or dbpedia. it could end up being significant, for the simple reason that it focusses on the organisational model. It does that by processing data in various ’semantic web light’ formats that are embedded in webpages in the structuring and presentation of search results. If you’d want to present a nice set of handles on your site’s content in a yahoo search results page -links to maps, contact info, schedules etc- it’s time to start using RDFa or microformats.

Beyond the specifics of semantic standards or technologies of this point in time, though, lies the increasing demands for such tools. The volume and heterogeneity of data is increasing rapidly, not least because means of fishing structured data out of unstructured data are improving. At the same time, the format of structured data (its syntax) is much less of an issue than it once was, as is the means of shifting that data around. What remains is making sense of that data, and that requires semantic technology.

Resources

The semantic conference site gives an overview, but not any presentations, alas.

The California Blue Shield architecture was built with BBN’s Asio tool suite

More about the netkernel resource oriented computing platform can be found on the 1060 website

Twine is still in private beta, but will open up in overhauled form in the autumn.

Powerset is wikipedia with added semantic sauce.

DBpedia is the community effort to gather all public domain knowledge in a triple store. There’s a page that outlines all ways of accessing it over the web.

Yahoo’s SearchMonkey is the project to utilise semweb light specs in search results.

Recycling webcontent with DITA

Wednesday, May 23rd, 2007

Lots of places and even people have a pile of potentially useful content sitting in a retired CMS or VLE. Or have content that needs to work on a site as much as a pdf or a booklet. Or want to use that great open stuff from the OU, but with a tweak in that paragraph and in the college’s colours, please.

The problem is as old as the hills, of course, and the traditional answer in e-learning land has been to use one of the flavours of IMS Content Packaging. Which works well enough, but only at a level above the actual content itself. That is, it’ll happily zip up webcontent, provide a navigation structure to it and allow the content to be exchanged between one VLE and another. But it won’t say anything about what the webcontent itself looks like. Nor does packaging really help with systems that were never designed to be compliant with IMS Content Packaging (or METS, or MPEG 21 DID, or IETF Atom etc, etc.).

In other sectors and some learning content vendors, another answer has been the use of single source authoring. The big idea behind that one is to separate content from presentation: if every system knows what all parts of a document mean, than the form could be varied at will. Compare the use of styles in MS Word. If you religiously mark everything as either one of three heading levels or one type of text, changing the appearance of even a book length document is a matter of seconds. In single source content systems that can be scaled up to include not just appearance, but complete information types such as brochures, online help, e-learning courses etc.

The problem with the approach is that you need to agree on the meaning of parts. Beyond a simple core of a handful of elements such as ‘paragraph’ and ‘title’, that quickly leads to heaps of elements with no obvious relevance to what you want to do, but still lacking the two or three elements that you really need. What people think are meaningful content parts simply differs per purpose and community. Hence the fact that a single source mark-up language such as the Text Encoding Initiative (TEI) currently has 125 classes with 487 elements.

The spec

The Darwin Information Typing Architecture (DITA) specification comes out of the same tradition and has a similar application area, but with a twist: it uses specialisation. That means that it starts with a very simple core element set, but stops there. If you need to have any more elements, you can define your own specialisations of existing elements. So if the ‘task’ that you associate with a ‘topic’ is of a particular kind, you can define the particularity relative to the existing ‘task’ and incorporate it into your content.

Normally, just adding an element of your own devising is only useful for your own applications. Anyone else’s applications will at best ignore such an element, or, more likely, reject your document. Not so in DITA land. Even if my application has never heard of your specialised ‘task’, it at least knows about the more general ‘task’, and will happily treat your ‘task’ in those more general terms.

Though DITA is an open OASIS specification, it was developed in IBM as a solution for their pretty vast software documentation needs. They’ve also contributed the useful open source toolkit for processing content into and out of DITA (Sourceforge), with comprehensive documentation, of course.

That toolkit demonstrates the immediate advantage of specialisation: it saves an awful lot of time, because you can re-use as much code as possible. This works both in the input and output stage. For example, a number of transforms already exist in the toolkit to take docbook, html or other input, and transform it into DITA. Tweaking those to accept the html from any random content management system is not very difficult, and once that’s done, all the myriad existing output formats immediately become available. What’s more, any future output formats (e.g. for a new Wiki or VLE format) will be immediately useable once someone, somewhere makes a DITA to new format transform available.

Moreover, later changes and tweaks to your own element specialisations don’t necessarily require re-engineering all tools or transforms. Hence that Darwin moniker. You can evolve datamodels, rather than set them in stone and pray they won’t change.

The catch

All of this means that it quickly becomes more attractive to use DITA than make a set of custom transforms from scratch. But DITA isn’t magic, and there are some catches. One is simply that some assembly is required. Whatever legacy content you have lying around, some tweakery is needed in order to get it into DITA, and out again without losing to much of the original structural meaning.

Also, the spec itself was designed for software documentation. Though several people are taking a long, hard look at specialising it for educational applications (ADL, Edutech Wiki and OASIS itself), that’s not proven yet. Longer, non-screenful types of information have been done, but might not offer enough for those with, say, an existing docbook workflow.

The technology for the toolkit is of a robust, if pedestrian variety. All the elements and specialisations are in Document Type Definitions (DTDs) –a decidly retro XML technology– though you can use the hipper XMLSchema or RelaxNG as well. The toolkit itself is also rather dependent on extensive path hackery. High volume. real time content transformation is therefore probably best done with a new tool set.

Those tool issues are independent of the architecture itself, though. The one tool that would be difficult to remove is XSL Transforms, and that is pretty fundamental. Though ‘proper’ semantic web technology might have offered a far more powerful means to manipulate content meaning in theory, the more limited, but directly implementable XSLTs give it a distinct practical edge.

Finally, direct content authoring and editing in DITA XML poses the same problem that all structural content systems suffer from. Authors want to use MS Office, and couldn’t care less about consistent meaningful document structuring, while the Word format is a bit of a pig to transform and it is very difficult to extract a meaningful structure from something that was randomly styled.

Three types of solution exist for this issue: one is to use a dedicated XML editor meant for the non-angle bracket crowd. Something like XMLMind’s editor is pretty impressive and free to boot, but may only work for dedicated content authors simply because it is not MS Word. You can use MS Word with templates either directly with an plug-in, or with some post-processing via OpenOffice (much like ICE does). Those templates makes Word behave differently from normal, though, which authors may not appreciate.

Perhaps it is, therefore, best to go with the web oriented simplicity, and transform orientation of DITA, and use a Wiki. Wiki formats are so simple that mapping a style to a content structure is pretty safe and robust, and not too alien and complex for most users.

SCORM getting ready to leave ADL home

Friday, March 23rd, 2007

Not a new idea, but one that took a few decisive steps this week in a series of meetings in London: the moving of the SCORM educational content format out of the US Department of Defense’s ADL initiative, and into something that reflects the current SCORM user community. The parents will continue to cough up some fees, and provide a home if necessary, but would rather SCORM found its own way now.

The ’something that reflects the current SCORM user community’ is, of course, the crucial bit. A prospectus for such a something (LETSI- Learning, Education & Training Systems Interoperability) was introduced to coalesce thinking about the issue at the meetings. That prospectus tentatively assumed that LETSI would be a new, stand-alone organisation. Another acronym to put on the pile, if you like.

This is not universally popular, for the obvious reason that the education technology interoperability sphere is plenty fragmented as it is. Australia’s DEST (Department for Education, Science and Training) makes that point eloquently. In the London event, offers from existing organisations such as IMS to look after a LETSI like set-up where dismissed by the observation that not one of the existing organisations is truly representative of the community that now uses SCORM.

The other point of controversy is whether the organisation is to do more than just SCORM. As the DEST paper indicates, there is no concrete indication of what such work might look like in the prospectus, and the meetings didn’t clarify that further either.

After the meetings, both issues are still relatively open. What was decided was that a strategic and operational charter is to be drawn up by June, for ratification come September, in a meeting parallel to the ISO JTSC 1 SC36 (the educational technology arm of the most formal standards body) quarterly in Toronto. Contributions of time and/or money are solicited- there’ll be more on the adl community site about that.

It’ll be in that process that the heft of LETSI (or some other, better name) and its scope will be decided. It could be an open annual meeting and forum to tweak SCORM and nothing more, or it could be a full bells-and-whistles, big dollar membership organisation that looks at everything to do with education and interoperability, worldwide, etc, etc, etc.

It seems unlikely that we’ll fully escape another acronym, regrettably, but, on the bright side, you could argue that it is in effect a direct swap out of ADL for not-LETSI. That is, from the point of view of the education community, the total acronym population remains constant, but the ecosystem shifted slightly.