Innovation Networks

July 23rd, 2012

Realising benefits from applying ICT in post-compulsory education might reasonably be described as innovation, and networks of people and organisations provide an interesting means to achieve this aim. JISC and CETIS have been involved with this for many years and, collectively, we have learned many lessons.

I recently set out to think about both innovation and innovation networks in a more structured way, being aware that these are complex topics with considerable existing literature, and to try to capture some of the "what it means for us" in the form of an essay. This is available in PDF and DOC formats.

innovation

Exploratory Data Analysis

May 18th, 2012

It doesn't take much to trigger me into a rant about the weaknesses of reports on data and "dashboards" purporting to be "analytics" or "business intelligence". Lots of pie charts and line graphs with added bling are as the proverbial red rag to a bull.

Until recently my response was to demand more rigorous statistics: hypothesis testing, confidence limits, tests for reverse causality (while recognising that causality is a slippery concept in complex systems). Having recently spent some time thinking about using data analysis to gain actionable insights, particularly in the setting of an educational institution, it has become clear to me that this response is too shallow. It embeds an assumption of a linear process: ask a question, operationalise it in terms of data and statistics, and crunch some numbers. As my previous post indicates, I don't suppose all questions are approachable. Actually, thinking back to the ways I've done a little text and data mining in the past, it wasn't quite like this either.

The label "exploratory data analysis" captures the antithesis to the linear process. It was popularised in statistical circles by John W Tukey in the early 1960s and he used it as a title for a highly influential book. Tukey was trying to challenge a statistical community that was very focused on hypothesis testing and other forms of "confirmatory data analysis". He argued that statisticians should do both, approaching data with flexibility and an open frame of mind, and he saw having a well-stocked toolkit of graphical methods as being essential for exploration (Tukey was responsible for inventing a number of plot types that are now widely used).

Tukey read a paper entitled "The Technical Tools of Statistics" at the 125th Anniversary Meeting of the American Statistical Association in 1964. It anticipated the development of computational tools (e.g. R and RapidMiner), is well worth a read and has timeless gems like:

"Some of my friends felt that I should be very explicit in warning you of how much time and money can be wasted on computing, how much clarity and insight can be lost in great stacks of computer output. In fact, I ask you to remember only two points:

  1. The tool that is so dull that you cannot cut yourself on it is not likely to be sharp enough to be either useful or helpful.
  2. Most uses of the classical tools of statistics have been, are, and will be, made by those who know not what they do."

There is a correspondence between the open-minded and flexible approach to exploratory data analysis that Tukey advocated and the Grounded Theory (GT) Method of the social sciences. To me, as a non-social scientist, GT seems to be trying a bit too hard to be a Methodology (academic disputes and all) but the premise of using both inductive and deductive reasoning and going in to a research question free of the prejudice of a hypothesis that you intend to test (prove? how often is data analysed to find a justification for a prejudice?) is appealing.

Although GT is really focussed on qualitative research, some of the practical methods that the GT originators and practitioners have proposed might be applicable to data captured in IT systems and for practitioners of analytics. I quite like the dictum of "no talk" (see the Wikipedia entry for an explanation).

My take home, then, is something like: if we are serious about analytics we need to be thinking about exploratory data analysis and confirmatory data analysis and the label "analytics" is certainly inappropriate if neither is occurring. For exploratory data analysis we need: visualisation tools, an open mind and an inquisitive nature.
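As a small illustration of the exploratory side, the sketch below computes a five-number summary and flags outliers using the 1.5 x IQR "fences" that underlie the box plot, one of the plot types associated with Tukey. The login counts are hypothetical, invented purely for illustration, and the quartile convention (Python's "inclusive" method) is one of several in common use.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def tukey_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical weekly login counts for one course (not real data).
logins = [12, 15, 14, 13, 16, 15, 14, 48, 13, 15]

summary = five_number_summary(logins)
outliers = tukey_outliers(logins)
print("five-number summary:", summary)
print("values worth a closer look:", outliers)
```

The point of the exercise is not the numbers themselves but the question they prompt: what happened in the week that stands apart from the rest? That question can then be followed up with confirmatory methods.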

analytics

A Poem for Analytics

May 18th, 2012

There are many traps for the unwary in the practice of analytics, which I take to be the process of developing actionable insights through problem definition and the application of statistical models. The technical traps are most obvious but the epistemological traps are better disguised.

That these traps exist and are seemingly not recognised in the commercial and corporate rhetoric around analytics worries the more philosophically-minded; Virginia Tech's Gardner Campbell has shared some clear and well-received thoughts on the potential for damaging reductionism in Learning Analytics. I particularly like Anne Zelenka's blogged reaction to Gardner's LAK12 MOOC (I believe there is a recording but Elluminate recordings don't seem to play on Linux) and my colleague Sheila has also blogged on the topic.

I don't see reduction as being the issue per se, but careless reductionism, and failing to remember that our models are surrogates for what might be, does worry me. Analytics does give us power for "myth busting" and a means to reduce the degree to which anecdote, prejudice and the opinion of the powerful determine action, but let us be very wary indeed.

This all reminded me of the following poem by my favourite poet and mythographer, Robert Graves. Let us be slow.

In Broken Images

He is quick, thinking in clear images;
I am slow, thinking in broken images.

He becomes dull, trusting to his clear images;
I become sharp, mistrusting my broken images,

Trusting his images, he assumes their relevance;
Mistrusting my images, I question their relevance.

Assuming their relevance, he assumes the fact,
Questioning their relevance, I question the fact.

When the fact fails him, he questions his senses;
When the fact fails me, I approve my senses.

He continues quick and dull in his clear images;
I continue slow and sharp in my broken images.

He in a new confusion of his understanding;
I in a new understanding of my confusion.

Robert Graves

analytics

Making Sense of “Analytics”

May 2nd, 2012

There is currently a growing interest in increasing the degree to which data from various sources can be put to use by organisations to be more effective and a growing number of strategies for doing this. The term “analytics” is frequently being applied to descriptions of these situations but often without clarity as to what the word is intended to mean. This makes it difficult to make sense of what is happening, to decide what to appropriate from other sectors, and to make creative leaps forward in exploring how to adopt analytics.

I have just completed a public draft of a paper entitled "Making Sense of Analytics: a framework for thinking about analytics" [link removed - please visit our publications site to access the final versions] in an attempt to help anyone who is grappling with these questions in relation to post-compulsory education (as I am). It does so by:

  • considering the definition of “analytics”;
  • outlining analytics in relation to research management, teaching and learning or whole-institution strategy and operational concerns;
  • describing some of the key characteristics of analytics (the Framework).

The Framework is intended to support critical evaluation of examples of analytics, whether from commerce/industry or the research community, without resorting to definition of application or product categories. The intention behind this approach is to avoid discussion of "what it is" and to focus on "what it does" and "how it does it".

This is a draft. Please feel free to comment via this blog or directly to me. A revised version will be published in June.

This paper is the first of a series that CETIS is producing and commissioning. These will be emerging during the coming months and collected together in a unified online resource in July/August. This is referred to briefly by Sheila MacNeill in her recent post "Learning Analytics, where do you stand?"

analytics

UK Government Open Standards Consultation - CETIS Response

May 2nd, 2012

Earlier this year the UK Government Cabinet Office published what I thought was a rather good set of proposals for the role of open standards in government IT. They describe it as a "formal public consultation on the definition and mandation of open standards for software interoperability, data and document formats in government IT." There are naturally points where we have critical comments but the direction of travel is broadly one that CETIS supports. The topic of mandation is, however, one to be approached with a great deal of caution in our view.

Our full response, which should be read alongside the consultation document (which includes the questions), is available for your information.

The consultation has now been extended to June 4th 2012 following the revelation of a conflict of interest; the chair of a public consultation meeting in April was found to be also working for Microsoft. This is the latest in a long series of concerns about Microsoft lobbying reported in Computer Weekly and elsewhere. I am actually encouraged by the Cabinet Office response both to FoI requests linked to meetings with Microsoft and to this recent revelation; they do seem to be trying to do the right thing.

standards

Analytics and Big Data - Reflections from the Teradata Universe Conference 2012

April 27th, 2012

As part of our current work on investigating trends in analytics and in contextualising it to post-compulsory education - which we are calling our Analytics Reconnoitre - I attended the Teradata Universe Conference recently. Teradata Universe is very much not an academic conference; this was a trip to the far side of the moon, to the land of corporate IT, grey-suits galore and a dress code...

Before giving some general impressions and then following with some more in-depth reflections and arising thoughts, I should be clear about the terms "analytics" and "big data".

My working definition for Analytics, which I will explain in more detail in a forthcoming white paper and associated blog post is:
"Analytics is the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data."

I am interpreting Big Data as being data that is at such a scale that conventional databases (single server relational databases) can no longer be used.

Teradata has a 30 year history of selling and supporting Enterprise Data Warehouses so it should not have been a surprise that infrastructure figured in the conference. What was surprising was the degree to which infrastructure (and infrastructural projects) figured compared to applications and analytical techniques. There were some presentations in which brief case studies outlined applications but I did not hear any reference to algorithmic, methodological, etc development nor indeed any reference to any existing techniques from the data mining (a.k.a. "knowledge discovery in databases") repertoire.

My overall impression is that the corporate world is generally grappling with pretty fundamental data management issues and generally focused on reporting and descriptive statistics rather than inferential and predictive methods. I don't believe this is due to complacency but simply to the reality of where they are now. As the saying goes "if I was going there, I wouldn't start here".

The Case for "Data Driven Decisions"

Erik Brynjolfsson, Director of the MIT Center for Digital Business, gave an interesting talk entitled "Strength in Numbers: How do Data-Driven Decision-Making Practices affect Performance?"

The phrase "data driven decisions" raises my hackles since it implies automation and the elimination of the human component. This is not an approach to strive for. Stephen Brobst, Teradata CTO, touched on this issue in the last plenary of the conference when he asserted that "Success = Science + Art" and backed up the assertion with examples. Whereas my objections to data driven decisions revolve around the way I anticipate such an approach would lead to staff alienation and significant disruption to the effective working of an organisation, Brobst was referring to the pitfall of incremental improvement leading to missed opportunities for breakthrough innovation.

As an example of a case where incremental improvement found a locally optimal solution but a globally sub-optimal one, Brobst cited actuarial practice in car insurance. Conventionally, risk estimation uses features of the car, the driver's driving history and location and over time the fit between these parameters and statistical risk has been honed to a fine point. It turns out that credit risk data is actually a substantially better fit to car accident risk, a fact that was first exploited by Progressive Insurance back in 1996.

Rather than "data driven decisions", I advocate human decisions supported by the use of good tools to provide us with data-derived insights. Paul Miller argues the same case against just letting the data speak for itself on his "cloud of data" blog.

This is, I should add, something Brynjolfsson and co-workers also advocate; they are only adopting terminology from the wider business world. See, for example, an article in The Futurist (Brynjolfsson, Erik and McAfee, Andrew, "Thriving in the Automated Economy", The Futurist, March-April 2012). In this article, Brynjolfsson and McAfee make the case for partnering humans and machines throughout the world of work and leisure. They cite an interesting example of the current best chess "player" in the world, which is 2 amateur American chess players using 3 computers. They go on to make some specific recommendations to try to make sure that we avoid some socio-economic pathologies that might arise from a humans vs technology race (as opposed to humans with machines), although not everyone will find all the recommendations ethically acceptable.

To return to the topic of Brynjolfsson's talk: it is expanded in a paper of the same title (Brynjolfsson, Erik, Hitt, Lorin and Kim, Heekyung, "Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance", April 2011). The abstract:
"We examine whether performance is higher in firms that emphasize decisionmaking based on data and business analytics (which we term a data-driven decisionmaking approach or DDD). Using detailed survey data on the business practices and information technology investments of 179 large publicly traded firms, we find that firms that adopt DDD have output and productivity that is 5-6% higher than what would be expected given their other investments and information technology usage. Using instrumental variables methods, we find evidence that these effects do not appear to be due to reverse causality. Furthermore, the relationship between DDD and performance also appears in other performance measures such as asset utilization, return on equity and market value. Our results provide some of the first large scale data on the direct connection between data-driven decisionmaking and firm performance."

This is an important piece of research, adding to a relatively small existing body - which shows correlation between high levels of analytics use and factors such as growth (see the paper) - and one which I have no doubt will be followed up. They have taken a thorough approach to the statistics of correlation and tested for reverse causation. The limitations of the conclusion are clear from the abstract, however: "large publicly traded firms". What of smaller firms? Furthermore, business sector (industry) is treated as a "control" but my hunch is that the 5-6% figure conceals some really interesting variation. The study also fails to establish mechanism, i.e. to demonstrate what it is about the context of firm A and the interventions undertaken that leads to enhanced productivity etc. These kinds of issues with evaluation in the social sciences are the subject of writings by Nick Tilley and Ray Pawson (see for example, "Realistic Evaluation: An Overview") which I hold in high regard. My hope is that future research will attend to these issues. For now we must settle for less complete, but still useful, knowledge.

I expect that as our Analytics Reconnoitre proceeds we will return to this and related research to explore further whether any kind of business case for data-driven decisions can be robustly made for Higher or Further Education, or whether we need to gather more evidence by doing. I suspect the latter to be the case and that for now we will have to resort to arguments on the basis of analogy and plausibility of benefits.

Zeitgeist: Data Scientists

"Data Scientist" is a term which seems to be capturing the imagination in the corporate big data and analytics community but which has not been much used in our community.

A facetious definition of data scientist is "a business analyst who lives in California". Stephen Brobst gave his distinctions between data scientist and business analyst in his talk. His characterisation of a business analyst is someone who: is interested in understanding the answers to a business question; uses BI tools with filters to generate reports. A data scientist, on the other hand, is someone who: wants to know what the question should be; embodies a combination of curiosity, data gathering skills, statistical and modelling expertise and strong communication skills. Brobst argues that the working environment for a data scientist should allow them to self-provision data, rather than having to rely on what is formally supported in the organisation, to enable them to be inquisitive and creative.

Michael Rappa from the Institute for Advanced Analytics doesn't mention curiosity but offers a similar conception of the skill-set for a data scientist in an interview in Forbes magazine. The Guardian Data Blog has also reported on various views of what comprises a data scientist in March 2012, following the Strata Conference.

While it can be a sign of hype for new terminology to be spawned, the distinctions being drawn by Brobst and others are appealing to me because they are putting space between mainstream practice of business analysis and some arguably more effective practices. As universities and colleges move forward, we should be cautious of adopting the prevailing view from industry - the established business analyst role with a focus on reporting and descriptive statistics - and missing out on a set of more effective practices. Our lack of baked-in BI culture might actually be a benefit if it allows us to more quickly adopt the data scientist perspective alongside necessary management reporting. Furthermore, our IT environment is such that self-provisioning is more tractable.

Experimentation, Culture and HiPPOs

Like most stereotypes, the HiPPO is founded on reality; this is decision-making based on the Highest Paid Person's Opinion. While it is likely that UK universities and colleges are some cultural distance from the world of corporate America that stimulated the coining of "HiPPO", we are certainly not immune from decision-making on the basis of management intuition, and anecdote suggests that many HEIs are falling into a more autocratic and executive style of management in response to a changing financial regime. As a matter of pride, though, academia really should try to be more evidence-based.

Avinash Kaushik (Digital Marketing Evangelist at Google) talked of HiPPOs and data driven decision making (sic) culture back in 2006 yet in 2012 these issues are still main stage items at Teradata 2012. Cultural inertia. In addition to proposing seven steps to becoming more data-driven, Kaushik's posting draws the kind of distinctions between reporting and analysis that accord with the business analyst vs data scientist distinctions, above.

Stephen Brobst's talk - "Experimentation is the Key to Business Success" - took a slightly different approach to challenging the HiPPO principle. Starting from an observation that business culture expects its leadership to have the answers to important and difficult questions, something even argumentative academics can still be found to do, Brobst argued for experimentation to acquire the data necessary for informed decision-making. He gained a further nod from me by asserting that the experiment should be designed on the basis of theorisation about mechanism (see earlier reference to the work of Tilley and Pawson).

Procter and Gamble's approach to pricing a new product by establishing price elasticity through a set of trial markets with different price points is one example. It is hard to see this being tractable for fee-setting in most normal courses in most universities but maybe not for all and it becomes a lot more realistic with large-scale distance education. Initiatives like Coursera have the opportunity to build-out for-fee services with much better intelligence on pricing than mainstream HE can dream of.
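To make "establishing price elasticity" concrete, here is a minimal sketch using the midpoint (arc) elasticity formula between adjacent price points. The prices and enrolment figures are hypothetical, invented for illustration; they are not drawn from the talk or from any real trial, and a real study would need to control for differences between the trial markets.

```python
def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity of demand between two price points."""
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


# Hypothetical trial markets: (fee in GBP, enrolments). Not real data.
trials = sorted([(100, 500), (120, 450), (150, 300)])

elasticities = []
for (p1, q1), (p2, q2) in zip(trials, trials[1:]):
    e = arc_elasticity(p1, q1, p2, q2)
    elasticities.append(e)
    # Revenue rises with price while demand is inelastic (|e| < 1)
    # and falls once demand becomes elastic (|e| > 1).
    print(f"{p1} -> {p2}: elasticity {e:.2f}, "
          f"revenue {p1 * q1} -> {p2 * q2}")
```

With these made-up figures, demand is inelastic over the first step and elastic over the second, which is the kind of pattern that would suggest a revenue-maximising fee somewhere between the upper two price points.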

Big Data and Nanodata Velocity

There is quite a lot of talk about Big Data - data that is at such a scale that conventional databases can no longer be used - but I am somewhat sceptical that the quantity of talk is merited. One presenter at Teradata Universe actually proclaimed that big data was largely an urban myth but this was not the predominant impression; others boasted about how many petabytes of data they had (1PB = 1,000TB = 1,000,000GB). There seems to be an unwarranted implication that big data is necessary for gaining insights. While it is clear that more data points improve statistical significance and that, if you have a high volume of transactions/interactions, even small % improvements can have significant bottom line value (e.g. a 0.1% increase in purchase completion at Amazon), there remains a great deal of opportunity to be more analytical in the way decisions are made using smaller scale data sources. The absence of big data in universities and colleges is an asset, not an impediment.
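The relationship between volume and statistical significance can be sketched with a two-proportion z statistic. The example assumes the Amazon figure means a rise of 0.1 percentage points in purchase completion, and the baseline rate (5.0%) and group sizes are hypothetical values chosen only to show the scaling.

```python
import math


def z_for_difference(p_a, p_b, n):
    """Unpooled two-proportion z statistic for two groups of n each."""
    se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    return (p_b - p_a) / se


# Hypothetical completion rates: 5.0% before, 5.1% after. Not real data.
before, after = 0.050, 0.051

z_small = z_for_difference(before, after, n=10_000)
z_large = z_for_difference(before, after, n=10_000_000)

# |z| > 1.96 corresponds to significance at the conventional 5% level.
print(f"n = 10,000 per group:     z = {z_small:.2f}")
print(f"n = 10,000,000 per group: z = {z_large:.2f}")
```

A difference this small is indistinguishable from noise at ten thousand observations per group but unmistakable at ten million. The converse also holds: the larger effects that matter to most institutional decisions do not need anything like that volume to be detected.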

Erik Brynjolfsson chose the term "nanodata" to draw attention to the fine-grained components of most Big Data stores. Virtually all technology-mediated interactions are capable of capturing such "nanodata" and many do. The availability of nanodata is, of course, one of the key drivers of innovation in analytics. Brynjolfsson also pointed to data "velocity", i.e. the near-real-time availability of nanodata.

The insights gained from using Google search terms to understand influenza are a fairly well-known example of using the "digital exhaust" of our collective activities to short-cut traditional epidemiological approaches (although I do not suggest it should replace them). Brynjolfsson cited a similar approach used in work with former co-worker Lynn Wu on house prices and sales (pdf), which anticipated official figures rather well. The US Federal Reserve Bank, we were told, was envious.

It has taken a long time to start to realise the vision of Cybersyn. Yet still our national and institutional decision-making relies on slow-moving and broadly obsolete data; low velocity information is tolerated when maybe it should not be. In some cases the opportunities from more near-real-time data may be neglected low-hanging fruit, and it doesn't necessarily have to be Big Data. Maybe there should be talk of Fast Data?

Data Visualisation

Stephen Few, author and educator on the topic of "visual business intelligence", gave both a keynote and a workshop that could be very concisely summarised as a call to: 1) take more account of how human perception works when visualising data; 2) make more use of visualisation for sense-making. Stephen Brobst (Teradata CTO) made the latter point too: that data scientists use data visualisation tools for exploration, not just for communication.

Few gave an accessible account of visual perception as applied to data visualisation with some clear examples and reference to cognitive psychology. His "Perceptual Edge" website/blog covers a great deal of this - see for example "Tapping the Power of Visual Perception" (pdf) - as does his accessible book, "Now You See It". I will not repeat that information here.

His argument that "visual reasoning" is powerful is easily demonstrated by comparing what can be immediately understood from the right kind of graphical presentation with tabulation of the data. He also made the point that visual reasoning usually happens transparently (subconsciously) and hence that we need to guard against visualisation techniques that mislead, confuse or overwhelm.

I did feel that he advocated visual reasoning beyond the point at which it is reliable by itself. For example, I find parallel coordinates quite difficult. I would have liked to see more emphasis on visualising the results of statistical tests on the data (e.g. correlation, statistical significance) particularly as I am a firm believer that we should know of the strength of an inference before deciding on action. Is that correlation really significant? Are those events really independent in time?
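One way to put a number on "is that correlation really significant?" without leaning on distributional assumptions is a permutation test: shuffle one variable many times and count how often chance alone produces a correlation at least as strong as the one observed. The sketch below is a minimal version of that idea; the attendance and mark figures are hypothetical, and the function names are illustrative rather than taken from any library.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical attendance (%) and final mark (%) for ten students.
attendance = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
marks = [48, 52, 50, 58, 61, 59, 66, 70, 68, 75]

r = pearson_r(attendance, marks)
p = permutation_p_value(attendance, marks)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

Note that this test treats the observations as exchangeable, so it does not address the second question about independence in time; data with trends or autocorrelation would need a different resampling scheme.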

Few's second key point - about the use of data visualisation for sense-making - began with claims that the BI industry has largely failed to support it. He summarised the typical pathway for data as: collect > clean > transform > integrate > store > report. At this point there is a wall, Few claims, that blocks a productive sense-making pathway: explore > analyse > communicate > monitor > predict.

Visualisation tools tend to have been created with before-the-wall use cases in mind, to be about the plot in the report. I rather agree with Few's criticism that such tool vendors tend to err towards a "bling your graph" feature-set or flashy dashboards but there is hope in the form of tools such as Tibco Spotfire and Tableau, while Open Source aficionados or the budget-less can use ggobi for complex data visualisation, Octave or R (among others). The problem with all of these is complexity; the challenge to visualisation tool developers is to create more accessible tools for sense-making. Without this kind of advance it requires too much skill acquisition to move beyond reporting to real analytics and that limits the number of people doing analytics that an organisation can sustain.

It is worth noting that "Spotfire Personal" is free for one year and that "Tableau Public" is free and intended to let data-bloggers et al publish to their public web servers, although I have not yet tried them.

Analytics & Enterprise Architecture

The presentation by Adam Gade (CIO of Maersk, the shipping company) was ostensibly about their use of data but it could equally have been entitled "Maersk's Experiences with Enterprise Architecture". Although at no point did Gade utter the words "Enterprise Architecture" (EA), many of the issues he raised have appeared in talks at the JISC Enterprise Architecture Practice Group: governance, senior management buy-in, selection of high-value targets, tactical application, ... etc. It is interesting to note that Adam Gade has a marketing and sales background - not the norm for a CIO - yet seems to have been rather successful; maybe he could sell the idea internally?

The link between EA and Analytics is not one which has been widely made (in my experience and on the basis of Google search results) but I think it is an important one which I will talk of a little more in a forthcoming blog post, along with an exploration of the Zachman Framework in the context of an analytics project. It is also worth noting that one of the enthusiastic adopters of our ArchiMate (TM) modelling tool, "Archi", is Progressive Insurance which established a reputation as a leader in putting analytics to work in the US insurance industry (see, for example the book Analytics at Work, which I recommend, and the summary from Accenture, pdf).

Adam Gade also talked of the importance of "continuous delivery", i.e. that analytics or any other IT-based projects start demonstrating benefits early rather than only after the "D-Day". I've come across a similar idea - "time to value" - being argued for as being more tactically important than return on investment (RoI). RoI is, I think, a rather over-used concept and a rather poor choice if you do not have good baseline cost models, which seems to be the case in F/HEIs. Modest investments returning tangible benefits quickly seems like a more pragmatic approach than big ideas.

Conclusions - Thoughts on What this Means for Post-compulsory Education

For all that the general perception is that universities and colleges are relatively undeveloped in terms of putting business intelligence and analytics to good use, I think there are some important "but ..." points to make. The first "but" is that we shouldn't measure ourselves against the most effective users from the commercial sector. The second is that the absence of entrenched practices means that there should be less inertia to adopting the most modern practices. Third, we don't have data at the scale that forces us to acquire new infrastructure.

My overall impression is that there is opportunity if we make our own path, learning from (but not following) others. Here are my current thoughts on this path:

Learn from the Enterprise Architecture pioneers in F/HE

Analytics and EA are intrinsically related and the organisational soft issues in adopting EA in F/HE have many similarities to those for adopting analytics. One resonant message from the EA early adopters, which can be adapted for analytics, was "use just enough EA".

Don't get hung up on Big Data

While Big Data is a relevant technology trend, the possession of big data is not a pre-requisite for making effective use of analytics. The fact that we do not have Big Data is a freedom not a limitation.

Don't focus on IT infrastructure (or tools)

Avoid the temptation (and sales pitches) to focus on IT infrastructure as a means to get going with analytics. While good tools are necessary, they are not the right place to start.

Develop a culture of being evidence-based

The success of analytics depends on people being prepared to critically engage with evidence based on data (including its potential weaknesses or bias, and avoiding being over-trusting of numbers) and to take action on the analysis rather than being slaves to anecdote and the HiPPO. This should ideally start from the senior management. "In God we trust, all others bring data" (probably mis-attributed to W. Edwards Deming).

Experiment with being more analytical at craft-scale

Rather than thinking in terms of infrastructure or major initiatives, get some practical value with the infrastructure you have. Invest in someone with "data scientist" skills as master crafts-person and give them access to all data but don't neglect the value of developing apprentices and of developing wider appreciation of the capabilities and limitations of analytics.

Avoid replicating the "analytics = reporting" pitfall

While the corporate sector fights its way out of that hole, let us avoid following them into it.

Ask questions that people can relate to and that have efficiency or effectiveness implications

Challenge custom and practice or anecdote on matters such as: "do we assess too much?", "are our assessment instruments effective and efficient?", "could we reduce heating costs with different use of estate?", "could research groups like mine gain greater REF impact through publishing in OA journals?", "how important is word of mouth or Twitter reputation in recruiting hard-working students?", "can we use analytics to better model our costs?"

Look for opportunities to exploit near-real-time data

Are decisions being made on old data, or no changes being made because the data is essentially obsolete? Can the "digital exhaust" of day-to-day activity be harnessed as a proxy for a measure of real interest in near-real-time?

Secure access to sector data

Sector organisations have a role to play in making sure that F/HEIs have access to the kind of external data needed to make the most of analytics. This might be open data or provisioned as a sector shared service. The data might be geospatial, socio-economic or sector-specific. JISC, HESA, TheIA, LSIS and others have roles to play.

Be open-minded about "analytics"

The emerging opportunities for analytics lie at the intersection of practices and technologies. Different communities are converging and we need to be thinking about creative borrowing and blurring of boundaries between web analytics, BI, learning analytics, bibliometrics, data mining, ... etc. Take a wide view.

Collaborate with others to learn by doing

We don't yet know the pathway for F/HE and there is much to be gained from sharing experiences in dealing with both the "soft" organisational issues and the challenge of selecting and using the right technical tools. While we may be competing for students or research funds, we will all fail to make the most of analytics and to effectively navigate the rapids of environmental factors if we fail to collaborate; competitive advantage is to be had from how analytics is applied but that can only occur if capability exists.

analytics

New Draft British Standard - Exchanging Course Related Information

February 29th, 2012

The two parts of a draft British Standard (BS), "BS 8581 - Exchanging course related information – Course advertising profile" have recently been released for public comment on the British Standards Institution "Draft Review" website.

This standard is heavily based on the XCRI-CAP 1.2 specification, which has been developed and piloted over the past few years with support from JISC and CETIS, and would create a British Standard that is consistent with the European Standard "Metadata for Learning Opportunities - Advertising" (EN 15982, also to be adopted as BS) but extends it and provides more detail suited to UK application.

The two parts for public review, which closes on April 30th 2012, are:

Registration is required to access the drafts and comment.

Background information and details of implementations of XCRI may be found on the XCRI-CAP website.

standards

EdTech Blogs - a visualisation playground

February 21st, 2012

During the CETIS Conference today (Feb 22nd), I showed a few graphs, plots and other visualisations that show the results of text mining around 7500 blog posts, mostly from 2011 and into early 2012. These were crawled by the RWTH Aachen University "Mediabase".

There are far too many to show here and each of three analyses has its separate auto-generated output, which is linked to below. Each of these outlines key aspects of the method and headline statistics. I am quite aware that it is bad practice just to publish a load of visualisations without either an explicit or implicit story. If this bothers you, you might want to stop now, or visit my short piece "East and West: two worlds of technology enhanced learning", which uses the first method outlined below but is not such a "bag of parts". If you want to weave your own story... read on!

Stage 1: Dominant Themes

The starting point is simply to look at the dominant themes in blog posts from 2011 and early 2012 through the lens of frequent terms used. Common words with little significance (stop words) are removed and similar words are aggregated (e.g. learn, learner, learning). This set of blog posts is then split into two sets: those from CETIS and those from a broadly representative set of Ed Tech blogs. The frequent terms are then filtered into those that are statistically more significant in the CETIS set and those that are statistically more significant in the Ed Tech set.

The results of doing this are: "Comparison: CETIS Blogging vs EdTech Bloggers Generally (Jan 2011-Feb 2012)"

Co-occurrence Pattern - Ed Tech Blogger Frequent Terms. (see the "results" link above for explanation and more...)
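
For readers who want a feel for the Stage 1 method without reading the R code, here is a minimal sketch in Python. It is not the code used for the reports: the stop-word list, the crude suffix-stripping in `crude_stem`, the two-proportion z-test and the thresholds in `distinctive_terms` are all assumptions chosen for illustration, and the names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real analysis needs a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with", "are"}


def crude_stem(word):
    """Very rough suffix stripping so that learn/learner/learning aggregate."""
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def term_counts(posts):
    """Count stemmed, non-stop-word terms over a list of post texts."""
    counts = Counter()
    for post in posts:
        for word in re.findall(r"[a-z]+", post.lower()):
            if word not in STOP_WORDS:
                counts[crude_stem(word)] += 1
    return counts


def two_proportion_z(count_a, total_a, count_b, total_b):
    """z statistic for the difference between a term's share in set A and set B."""
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return 0.0 if se == 0 else (count_a / total_a - count_b / total_b) / se


def distinctive_terms(posts_a, posts_b, min_count=2, z_threshold=1.96):
    """Return (terms over-represented in A, terms over-represented in B)."""
    counts_a, counts_b = term_counts(posts_a), term_counts(posts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sets of posts must contain at least one term")
    more_a, more_b = [], []
    for term in set(counts_a) | set(counts_b):
        # Skip terms too rare overall to say anything about.
        if counts_a[term] + counts_b[term] < min_count:
            continue
        z = two_proportion_z(counts_a[term], total_a, counts_b[term], total_b)
        if z >= z_threshold:
            more_a.append(term)
        elif z <= -z_threshold:
            more_b.append(term)
    return sorted(more_a), sorted(more_b)
```

Calling `distinctive_terms(cetis_posts, edtech_posts)` with two lists of post texts gives the two filtered term lists. Testing many terms at once inflates the number of false positives, so a serious analysis would tighten the threshold or correct for multiple comparisons.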

Stage 2: Emerging and Declining Themes

Stage 2a: Finding Rising and Falling Terms

In this case, I home in on CETIS blogs only, but go back further in time: to January 2009. The blog posts are split into two sets: one contains posts from the last 6 months and the other contains posts since the end of January 2009. The distribution of terms appearing in each set is compared to find those which are statistically significant in the change, taking into account the sample size. This process identifies four classes of term: terms that appear anew in recent months, terms that rose from very low frequencies, those that rose from moderate or higher frequencies and those that fell (or vanished).

The results of doing this are: "Rising and Falling Terms - CETIS Blogs Jan 31 2012". This has a VERY LARGE number of plots, many of which can be skipped over but are of use when trying to dig deeper. This auto-generated report also contains links to the relevant blog posts and ratings for "novelty" and "subjectivity".

Significant Falling Terms
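
The four-way classification described above can be sketched roughly as follows. This is an illustration in Python rather than the R code behind the report: it assumes the second set is treated as an earlier baseline, it assumes a two-proportion z-test as the significance test, and the `low_rate` cut-off separating "very low" from "moderate or higher" frequencies is a made-up value. The function and class names are hypothetical.

```python
import math
from collections import Counter


def z_score(recent, n_recent, earlier, n_earlier):
    """z statistic for the change in a term's share, allowing for sample size."""
    pooled = (recent + earlier) / (n_recent + n_earlier)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_recent + 1 / n_earlier))
    return 0.0 if se == 0 else (recent / n_recent - earlier / n_earlier) / se


def classify_terms(recent_counts, earlier_counts, z_threshold=1.96, low_rate=0.01):
    """Sort significantly changed terms into the four classes of Stage 2a."""
    recent_counts, earlier_counts = Counter(recent_counts), Counter(earlier_counts)
    n_recent = sum(recent_counts.values())
    n_earlier = sum(earlier_counts.values())
    classes = {"new": [], "rose_from_low": [], "rose_from_established": [], "fell": []}
    for term in sorted(set(recent_counts) | set(earlier_counts)):
        recent, earlier = recent_counts[term], earlier_counts[term]
        z = z_score(recent, n_recent, earlier, n_earlier)
        if abs(z) < z_threshold:
            continue  # change is not statistically significant
        if z < 0:
            classes["fell"].append(term)  # includes terms that vanished
        elif earlier == 0:
            classes["new"].append(term)
        elif earlier / n_earlier < low_rate:
            classes["rose_from_low"].append(term)
        else:
            classes["rose_from_established"].append(term)
    return classes
```

The inputs are term-frequency tables (term to count) for the recent and the earlier posts, built after stop-word removal and aggregation of similar words.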

Stage 2b: Visualising Changes Over Time

Various terms were chosen from Stage 2a and the changes over time rendered using the (in)famous "bubble chart". Although these should not be taken too seriously, since the quantity of data per time step is rather small, they allow for quite a lot of experimentation with a range of related factors: term frequency, number of documents containing the term, positive/negative sentiment in posts containing the term. Four separate charts were created for CETIS blogs from 2009-2012: Rising, Established, Falling and Familiar (dominant terms from Stage 1). The dominant non-CETIS terms are also available, but only for 2011.

Final Words

Due to some problems with the blog crawler, a number of blogs could not be processed or had incompletely extracted postings so this is not truly representative. The results are not expected to change dramatically but there will be some terms appearing and some disappearing when these issues are fixed. This posting will be altered and the various auto-generated reports will be re-generated in due course.

The R code used, and results from using the same methods on conference abstracts/papers, are available from my GitHub. This site also includes some notes on the technicalities of the methods used (i.e. separate from the way these were actually coded).

Uncategorized, telmap

The Network of Society of Scholars (Fiction)

February 20th, 2012

As preparation for the session "Emerging Reality: Making sense of new models of learning organisations" at this week's CETIS Conference (which is a session hosted by the TELMap Project), I have created the following scenario to try to make real some plausible drivers/issues/etc. The session will be debating the plausibility of these and other issues and hence their potential as shapers of future learning organisations. I will emend this posting once the outcomes of the workshop are published.

The scenario is pure fiction, an informal speculation about something that might happen by around 2020-5.

The Scenario

What is a “Society of Scholars”?

A Society of Scholars is based in one or more large old houses with a combination of study-bedrooms, communal cooking and social spaces, a library and a central seminar. Students and some of the Fellows live there while many Fellows live with their families and study there.

There are no fixed courses but a framework within which depth and breadth of scholarship is guided and measured. This framework is validated by an established University, which awards the degree and provides QA (all for a fee). The Network of Societies of Scholars™ has additional ethical codes and strict membership rules.

The physical co-location is a central part of the Society, combined with the wider (virtual) network of peers.

Genesis

Societies of Scholars sprang out of an initial “wild card” experiment where a small group of progressive academics with experience of inquiry-based learning pooled their redundancy payments from one of many rounds of staff-culling. A few sold their houses.

Their idea was to strip out the accumulation of both central services and formality of teaching and learning setting and to get back to basics while reducing cost and being able to do more of what they enjoy: thinking and talking. In doing this they hoped to attract students who were otherwise being asked to pay ever higher fees to endure ever more “commoditised” offerings and suffer poor employment prospects. The promise of high wages to pay off high debt is elusive for many who follow the conventional route. Graduate employment and student satisfaction are worse for those who opt for the newer “no frills degree course” offerings, which have cut costs without re-inventing the educational experience.

For several years they struggled to attract students, but gradually a few gifted students who managed to develop ultra-high web reputations started to attract more applications. The turning point was the winning of an international prize for work on “Smart Cities”, which led to a media frenzy in 2018. This triggered a spate of endowments of new Societies by successful entrepreneurs and the establishment of satellite societies to Cambridge and Oxford Universities in the UK and ETH Zurich in Switzerland with others quickly following (all recognising the threat but also the early-mover opportunity).

Character of a Society of Scholars

Societies are highly reputation conscious, as are the individuals within them. They are highly effective at using the web (what we called “new media” in the noughties) and at media management generally.

With the exception of assessment, Fellows and Students undertake essentially the same kind of activities; the Students strive to emulate the attitude and work of the Fellows. Both divide their time between private study, informal and formal discussion. Collaboration works. There is no “Fellows teach Students”; all teach each other through the medium of the seminar. All consider “teaching the world” to be an important (but not dominating) part of what they do.

The selection process plays a key role in shaping the character of the Society. Students are admitted NOT primarily on the basis of examination grades but on evidence of self-discipline, self-awareness and especially self-directed intellectual activity.

Course and Assessment

There are no specified courses and all Students follow a unique pathway of their own. Fellows offer guidance and almost all Students piece together a collection of topics that are identifiable (e.g. similar to a conference theme, a textbook, etc). There is no fixed minimum or maximum period of study.

Societies typically focus on 3-4 disciplines but always adopt a multi-disciplinary perspective; for example, computer science, electronic engineering, built environment and social theory was the combination that led to the “Smart Cities” prize.

Online resources are exploited to the fullest extent. Free or cheap MOOCs (massive open online courses, especially the form pioneered by Stanford University and udacity.com) are combined with the for-fee examinations offered alongside them.

Wikipedia is considered to be a “has been”; Society members (across the Network) and others collaborate on DIY textbooks using a system built on top of “git” (permitting multiple versions, derivatives, etc.; see GitHub for a "social coding" example) and a decentralised network of small servers. While being widely useful, this activity is also a valued learning activity with the side effect of promoting coherence in the study pathway.

Assessment is complicated primarily by the idiosyncrasy of all pathways but also by the need to connect achievement to the breadth/depth framework. An award is typically evidenced by a mixture of: externally taught and examined modules; public examinations of the University of London; a patchwork of personal work (a “portfolio”); contributions to the DIY textbooks; seminar performance.

Demand and Expansion

Societies of Scholars are niche occupiers in a much wider higher education landscape. Demand is no more than 5% and supply only about 3% in 2025. There is a feeling that graduates of the Societies are the “new elite”.

While some politicians call for the massification of the Society concept, society at large recognises that they need a special kind of student: more of an intellectual entrepreneur. The rise of the Society of Scholars has, however, started to change the way society understands (and answers) questions like: “what is the purpose of education?”; “how does learning happen?”... The long-term effect of this change on the face of education is not known yet (2025).

Employers in particular have understood what Societies offer and, while graduate unemployment for those following a conventional route to a degree remains close to 2012 levels, Society graduates are highly employable. Employers value: creativity, good communication skills, media-savvy people, multi-disciplinary thinking, self-motivation, intellectual flexibility, collaborative and community-oriented lifestyle.

The Drivers/Issues

This is a summary of some of the implicit or explicit assumed drivers/issues embedded in the scenario and which determine the plausibility of it (or alternatives). They are intentionally phrased as statements that could be disagreed with, argued for, ...

  1. Physical co-location and (especially intimate) face-to-face interactions will continue to be seen to be an essential aspect of high quality education. Students who can afford (or otherwise access) this will generally do so. Employers will value awards arising from courses containing it more highly than those that do not. Telegraph newspaper article.
  2. Graduate unemployment will be an issue for years to come. Effective undergraduates will find ways to distinguish themselves. HESA Statistics
  3. Wikipedia (and similar centralised "web commons" services) are unsustainable in their current form. As the demand from users rises and the support from contributors and sponsors wanes (it becomes less cool to be a Wikipedian) a point of unsustainability is reached. One option is to monetise but another is to "go feral" and transition to peer-to-peer or decentralised approaches. Digital Trends article.
  4. Universities and colleges will increase the supply of course and educational components, disaggregated from "the course", "the programme" and "the institutional offering". Examinations, Award Granting and Quality Assurance are all potentially independent marketable offerings. David Willetts article on the BBC (see "Flexible Learning")
  5. Cheap large-scale online courses are capable of replacing a significant percentage of conventional teaching time. The “Introduction to AI” course demonstrated this: see http://is.gd/JOseb.
  6. Employers are conservative when it comes to education. While employers bemoan narrow knowledge of graduates, poor "soft skills", etc, their shortlisting criteria continue to favour candidates with conventional degree titles and high grades from research-intensive universities. They will generally fail to take advantage of rich portfolio evidence.

jiscobs, telmap

The Stanford “Introduction to AI” Course - the sign of a disruptive innovation?

February 8th, 2012

Over on the JISC Observatory website a recent interview with Seb Schmoller has just been published in which he talks about his experiences - from the perspective of an online distance educator - of the recent large scale open online course "Introduction to AI" run in association with Stanford University. As the interview unfolded it occurred to me that the aspects of the course that had struck Seb as being of potentially profound importance fitted the criteria for a "low end disruptive innovation" in the terminology of innovation theorist Clayton M Christensen. Low end disruption refers to the way apparently well-run businesses can be disrupted by newcomers with cheaper but good-enough offerings that focus on core customer needs and often make use of generic off-the-shelf technologies.

Interesting stuff to ponder...

(interview on the JISC Observatory site)

Uncategorized