SNA session at CETIS 12

I attended the SNA session at the CETIS conference hosted by Lorna, Sheila, Tony and Martin. Before the session I had blogged about some of the questions I had on SNA, and although I think I have more new questions than answers, things feel much clearer now. My mind is still going over the conversations we had at the session, but these are the main themes and some early thoughts I came away with.

What are the quick wins?
At the start of the session Sheila asked the question ‘What are the quick wins?’. While Tony and Martin’s presentations were excellent, I think it is hard for people who don’t have their head regularly in this space to replicate the techniques quickly. Lorna said that although she understood what was happening in the SNA examples, there was some ‘secret magic’ that she couldn’t replicate when doing it for herself. Tony agreed that when you work in this area for a while you start to develop a workflow and understand some of the quirks of the software. I could relate to Lorna’s dilemma, as it took me a few hours of using Gephi just to know exactly when I needed to force quit the program and start all over again.

So, for people who want to find out useful information about social networks but don’t have the time to get into the secret magic of SNA, can we develop quick and simple tools that answer quick and simple questions?

The crossover between data driven visualisations and SNA
The session helped me make a clear distinction between data driven journalism and SNA. While there is a crossover between the two, the reasons for doing them are quite different: SNA is a way to study social networks, while data driven visualisations are a way to convey a story to an audience. Making that distinction helped me get to grips with the ‘why is it worth doing this?’ question.

Data Validation
Martin made the point that when he was playing with PROD data to create visualisations, he found it was a great way of validating the data itself, as he managed to spot errors and feed them back to Wilbert and myself.

Lies, Damned Lies and Pretty Pictures
Amber Thomas did a fantastic presentation; if you missed the session, it is available here. Amber had really thought about the ‘How is this useful?’ question, and I felt lots of pieces of the puzzle click into place during the presentation. I really recommend spending the time to go through the slides.

Thanks to Sheila, Lorna, Amber, Tony and Martin for an interesting session.

Standards used in JISC programmes and projects over time

Today I took part in an introduction to R workshop held at The University of Manchester. R is a software environment for statistics, and while it does all sorts of interesting things that are beyond my ability, one thing I can grasp and enjoy is exploring all the packages available for R. These packages extend R’s capabilities and let you do all sorts of cool things in a couple of lines of code.

The target I set myself was to use JISC CETIS Project Directory data and find a way of visualising standards used in JISC funded projects and programmes over time. I found a Google Visualisation package and, using this, I was surprised at how easy it was to generate an output; the hardest bits were manipulating the data (and thinking about how to structure it). Although my output from the day is incomplete, I thought I’d write up my experience while it is fresh in my mind.

First I needed a dataset of projects, start dates, standards and programmes. I got the results in CSV format by using the sparqlproxy web service that I use in this tutorial, and stole and edited a query from Martin.

SPARQL:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jisc: <http://www.rkbexplorer.com/ontologies/jisc#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX prod: <http://prod.cetis.ac.uk/vocab/>
SELECT DISTINCT ?projectID ?Project ?Programme ?Strand ?Standards ?Comments ?StartDate ?EndDate
WHERE {
?projectID a doap:Project .
?projectID prod:programme ?Programme .
?projectID jisc:start-date ?StartDate .
?projectID jisc:end-date ?EndDate .
OPTIONAL { ?projectID prod:strand ?Strand } .
# FILTER regex(?Strand, "^open education", "i") .
?projectID jisc:short-name ?Project .
?techRelation doap:Project ?projectID .
?techRelation prod:technology ?TechnologyID .
FILTER regex(str(?TechnologyID), "^http://prod.cetis.ac.uk/standard/") .
?TechnologyID rdfs:label ?Standards .
OPTIONAL { ?techRelation prod:comment ?Comments } .
}
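
If you want to pull the results straight into R rather than saving the CSV by hand, something along these lines should work. This is a minimal sketch: both URLs are placeholders (substitute the ones from the tutorial linked above), the parameter names may differ between proxies, and I’m assuming the query above is stored in a string called query.

# Fetch the query results as CSV via a sparqlproxy-style web service
proxy <- "http://example.org/sparqlproxy.php"   # placeholder - use the real proxy URL
endpoint <- "http://example.org/sparql"         # placeholder - use the PROD endpoint
url <- paste0(proxy,
              "?query=", URLencode(query, reserved = TRUE),
              "&service-uri=", URLencode(endpoint, reserved = TRUE),
              "&output=csv")                    # parameter names may vary by proxy
prod_raw <- read.csv(url, stringsAsFactors = FALSE)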

From this I created a pivot table of all the standards and how many times they appeared in projects and programmes for each year (using the project start date). After importing this into R, it took two lines to grab the googleVis package and plot this as a Google Visualisation Chart.
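For anyone reproducing the data manipulation, here is a rough sketch of the pivot step in R, assuming the raw results sit in prod_raw with the column names from the query above and ISO-style start dates:

# Take the year from each project's start date
prod_raw$Year <- as.numeric(substr(prod_raw$StartDate, 1, 4))
# Count distinct projects and programmes per standard per year
prod_csv <- aggregate(cbind(Projects = projectID, Programmes = Programme) ~ Standards + Year,
                      data = prod_raw, FUN = function(x) length(unique(x)))

With prod_csv in that shape, the plotting lines are: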

library("googleVis")
# Build a motion chart keyed on standard names, animated over years
M <- gvisMotionChart(data = prod_csv, idvar = "Standards", timevar = "Year", chartid = "Standards")
plot(M)  # opens the interactive chart in the browser

This gives you the ‘Hans Rosling’ style motion chart. I can’t get this to embed in my WordPress blog, but you can click the diagram to view the interactive version. The higher up a standard is, the more projects it is in, and the further across it goes, the more programmes it spans.
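
If, like me, you are fighting the embed, one thing worth knowing is that the googleVis object carries its own HTML, so you can write the chart fragment out and host it somewhere clickable. A sketch (the filename is made up):

# Write the chart's HTML/JavaScript fragment out to a file for hosting
cat(M$html$chart, file = "standards_chart.html")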

Google Visualisation Chart

Some things it made me think about:

  1. Data from PROD is inconsistent. Standards can be spelt differently, and some programmes/projects might have had more time spent on inputting related standards than others.

  2. How useful is it? This was extremely easy to do, but is it worth doing? I feel it has value for me because it’s made me think about the way JISC CETIS staff use PROD and the sort of data we input. Would this be of value to anybody else? It was interesting, though, to see the high number of projects across three programmes that involved XCRI in 2008.

  3. Do we need all that data? There are a lot of standards represented in the visualisation. Do we need them all? Can we concentrate on subsets of this data (see the sketch below)?
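
On that last point, filtering the pivot table before plotting is cheap to try. A sketch (the standards named here are just examples):

# Keep a hand-picked subset of standards and rebuild the chart
prod_subset <- subset(prod_csv, Standards %in% c("XCRI", "RSS"))
M2 <- gvisMotionChart(data = prod_subset, idvar = "Standards", timevar = "Year", chartid = "StandardsSubset")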