DataJConf Debrief

August 5, 2017

DataJConf at UCD

A month ago we held the first European Data and Computational Journalism Conference in Dublin, Ireland. This is a long overdue post about that event.

The conference idea started as all good ideas do, in the pub and with a tweet. It was at a social event during ICWSM '16 where I was first introduced to Bahareh Heravi, a data journalism lecturer from UCD. We talked briefly about the things we’re doing in Cardiff with the CompDJ MSc, and she spoke about her plans to introduce something similar in Dublin. A long time passed, and she got in touch over Twitter to ask if I was interested in organising a conference with her, to cover Data and Computational Journalism. Always keen to say yes to things that aren’t technically part of my day-to-day job and that will cause me a lot of work, I jumped straight in, dragging m’colleague Glyn along for the ride.

We spent several weeks having Skype calls to discuss and plan the conference, getting a website together, releasing the call for papers, organising the programme committee, managing the reviews, selecting talks, creating the programme, and then getting tickets on sale for the conference. It was a bit of a mad rush, but by June we were starting to see tickets sold, and had an excellent line up of speakers for the day. All I had to do then was sit back and wait to see if people turned up. Bahareh had less of an easy time, as she was hosting the thing, so spent many hundreds of hours organising the logistics of the event, the catering, lanyards, bags, souvenirs and all the other things that go into making a successful conference - a huge amount of work for which we are truly grateful!

When we initially spoke about the conference, we wanted to make sure we had a mix of industry and academia, and that it really was a mix. Bahareh had had a disappointing time at another DataJ conference where an academic track was included, but kept totally separate from the industry track, which resulted in a lack of discussion between the two groups of participants. This was something we were determined to avoid at all costs. We were also unsure whether there was an appetite for this sort of conference. Our initial aim was that if about 50 people turned up, we’d count it as a success. In the end, we had just over 100 people through the doors, which was amazing, and there was a real mix of people from academia and industry. There was a diverse set of talks on a range of topics, and it was really nice to see industry types asking questions of the academics, and vice versa. We also avoided the dreaded ‘all-male’ lineup, with the majority of talks given by women. The proceedings from the conference are now available, if you’re interested.

The conference was followed by a couple of half-day workshops: an introduction to Data Journalism, and an Unconference, both of which were received very well.

All in all, a really successful event. I met a lot of interesting people and made some good contacts for the future. There were a lot of interesting discussions and I came home full of ideas for things to introduce within our teaching and research.

It was such a good time, we’re doing it all over again. DataJConf 2018 will be held here in Cardiff. So I guess this time it’ll be Glyn’s turn to do all the running around organising things…

CompDJ Team Selfie

Post-Graduation Thoughts

August 4, 2017

Last month I took part in my first graduation ceremony as part of the academic procession. This is the bit where staff members from the school(s) that are graduating get dressed up in their silly robes, ‘process’ into the graduation ceremony, and sit on stage for an hour or so clapping as all their students stride across the stage to shake the VC’s hand and graduate from their degree.

It’s a lot of fun, because who doesn’t like dressing up in silly robes and a hat? But it’s also good for the students. I think it shows them that we genuinely care about the fact they’re graduating, and it’s nice for them to see familiar faces up there on the stage celebrating their hard work. I know I enjoyed that part of my own graduation, so I’m happy now to be able to take part myself. I actually went to two ceremonies this year: the ceremony for the School of Computer Science & Informatics, and the ceremony for the School of Journalism, Media and Cultural Studies.

In both ceremonies I was really happy to see a number of students that I know and have taught. In the COMSC ceremony there were a lot of MSc students from the various programmes who took my Web Apps and Visualisation modules in my second year of lecturing. There were also a couple of students whose dissertation projects I supervised, and a few undergraduates I’ve worked with on summer projects. Then the JOMEC ceremony was the first that students from the MSc in Computational Journalism have attended, which was really nice. I felt a real sense of pride as they read out the name of the degree programme I helped create, and even more as the students strode across the stage.

It’s really pleasing to see students you’ve taught start making their way in the world. Even more so when you see them creating great work and doing interesting things in ‘the Industry’. Take one of our first students, Charles, who’s followed a successful stint at Trinity Mirror with a move to go push things forward at The Bureau Local. Or one of his colleagues, Nikita, who’s working at one of the first data journalism outfits in India. Or last year’s grad Niko, who after a successful Google News Lab fellowship at The Guardian last year is now working on their vis team. This year’s students are at it before they’ve even finished: Jess is busy on a GNL fellowship for Trinity Mirror, Laura is on an internship at The Telegraph and Haluka is doing the same at The Financial Times. Four of this year’s students already have job offers, with Rae having left for the US to go run the Springfield bureau of The Daily Line.

It’s a bit weird, knowing that a few years ago we had an idea that we needed a course to train people to do a thing, and now there are people out there doing just that thing. It’s exciting, and I’m looking forward to seeing more of it.

Thoughts from the CEI Learning & Teaching Conference 2017

July 5, 2017

Yesterday was the annual ‘Learning & Teaching Conference’ of the Cardiff University Centre for Education Innovation. This year the theme was ‘To Tech or not to Tech, is that the Question?’. It’s the first time I’ve attended this conference, and I thought I’d get some of my thoughts from the day down in pixels.

(Reading this back later, I realise what a tangent this went on. From “I’m going to review #CUCEI17” to “there are some teachers out there that could do better but I don’t know how to help them”. That was quite a diversion, for which I apologise. For an actual summary of the conference there’s a great storify of the main discussions during the conference which I think sums it all up very nicely.)

It was a very interesting and at times thought-provoking conference. I felt the main thematic question was answered fairly early on, perhaps even before we entered the room. I think most of us realise it’s not about the tech, it’s about the teaching, and the tech is just one tool in our toolkit that helps us do that effectively. The answer to the question is therefore ‘no’, and the real question is: ‘what tech, and how much?’.

The keynote was an interesting look at how technology can be embraced by a whole institution, but I think what really caught my attention during this talk (with one eye on our new building with MATHS) was the lecture theatres that are arranged for group work:

Learning Spaces

How fantastic would it be to be able to teach in that space? You could do so many activities beyond just standing at the front ‘lecturing’. I’m becoming more and more vehement in my belief that a traditional ‘didactic’ lecture is the wrong way to teach most topics in Computer Science, and is actually harmful to our students’ ability to learn and think independently. Breaking the link between the idea of a ‘lecture theatre’ and a ‘lecture’ would be a good start towards changing the way both staff and students think about these things we call ‘lectures’. One of the frequent comments I’ve heard (and apologies to whoever said it, as I can’t remember who it was - possibly Vince or Dafydd) is “wouldn’t it be great to have lecture theatres that don’t have a ‘front’”. I couldn’t agree more.

But of course, this, a lecture theatre with a funky design, isn’t technology. Yes, the group tables can be tech-enabled, with power and interactive displays and all sorts of other gadgets and gizmos, but really we’re just talking here about rethinking our way of teaching to a more interactive, collaborative (and collectivist?) paradigm.

This is where I think the problem comes in. Show a lecture theatre like that to a room full of academics who have all managed to carve out the time to attend a conference on education innovation and of course they’re all going to start thinking ‘Wow, the things I could do with that’. We’re the same people who are already trying to innovate in our teaching. We used lecture capture as soon as they put webcams in our classrooms (or even before). We’ve tried out all the polls and live Q&A systems during lectures. We’re creating long-lasting communities for our students in Slack, asking them to text in questions during lectures, and open-sourcing our lecture materials. We’ve already moved past an “I’ll stand at the front and talk, you sit there and listen” model of teaching. Some of us have stopped ‘lecturing’ entirely, are fully committed to a flipped learning model and now spend all our contact time interacting with students, working on activities or problems, and really delivering ‘value’. In this regard the conference was preaching to the choir. Yes, it was helping those of us keen on innovation to discover new tools for our toolkit, talk to other like-minded teachers, and validate our own approaches, but it wasn’t really attempting to answer the big problem in our own work: how do we convince everyone else to change with us?

Because the big problem isn’t with the people who are trying to innovate. The problem is the academic who doesn’t want to do any of that stuff. The academic who thinks “well, I learnt through traditional lectures, so that’ll be fine for all these students”. I actually had a colleague say to me the other day “I learnt from a book, so I don’t need to make videos for my students. They can just read the book like me”. Such colleagues generally don’t realise the benefits that can be had by moving the passive learning to non-contact time and creating an active learning environment within their lectures.

We have stuffed universities with the kind of academics who don’t realise that they’re there because they’re the sort of academically minded studious individual who could have taught themselves off some notes scribbled on the underside of a table if they had to. Anyone who’s dragged themselves through three years of undergraduate education, a year of masters, three to five years of a PhD, and a probable multitude of short-term RA contracts on many different topics before finally reaching the relative stability of a lectureship is going to be the kind of person who can learn things themselves and quickly in whatever circumstance. They don’t know what it is to be the person who struggles in class, or who finds things difficult, or who just doesn’t respond well to a fifty minute ‘lecture’.

We have a situation where people who have no difficulty in learning are having to teach.

And I find that that means they have no desire to try to do things differently, because the way they’re doing things “worked for them”. So how do we communicate to these individuals that ‘it worked for me’ is not a valid argument? How do we show them that there is a better way? That they can actually engage with students in a more meaningful fashion? How do we make them understand that

Just reading from a fucking powerpoint is not education.

Those are the questions I want answered next.

Catching a Bug

June 12, 2017

I’m doing some data analysis, and I just caught a showstopper of a bug. Want to see it? Here’s the code as it was before:

new_index = [LIKERT[value] for value in LIKERT.keys() if value in data_counts.index]

and here’s a simple fix for the code:

new_index = [LIKERT[value] for value in data_counts.index]

Doesn’t look like much of a problem, but it completely changed the way my data was analysed. Both lines are creating a new index for a pandas dataframe. I have a dataframe that is indexed:

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

and I want to replace the index with the correct names from a likert scale that these values refer to:

['N/A', 'Disagree Strongly', 'Disagree', 'Neither Agree nor Disagree', 'Agree', 'Agree Strongly']

so I create a dictionary that maps from keys in the first index, to values for the new index:

LIKERT = {
    0.0: 'N/A',
    1.0: 'Disagree Strongly',
    2.0: 'Disagree',
    3.0: 'Neither Agree nor Disagree',
    4.0: 'Agree',
    5.0: 'Agree Strongly',
}

I then use a little list comprehension that adds the correct new value to the new index if its key is in the old index. If the key isn’t there, it gets skipped:

new_index = [LIKERT[value] for value in LIKERT.keys() if value in data_counts.index]

All fine, right? Sure, if the index is always in numerical order. Which it isn’t. Using this code, if the index is in the wrong order, you can get ‘5’ being replaced with ‘Disagree Strongly’ (or any of the values other than ‘Agree Strongly’) and suddenly your analysis is completely wrong.

The second line fixes this by looping through the index, not the dictionary, and so creates the new index in the correct order.
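To make the failure concrete, here is a small reproduction with made-up responses. It assumes (an assumption, not something stated above) that data_counts comes from pandas’ value_counts(), which orders its index by frequency rather than by value, so the index is not in numerical order.

```python
import pandas as pd

# The Likert lookup from this post: numeric codes -> labels
LIKERT = {
    0.0: 'N/A',
    1.0: 'Disagree Strongly',
    2.0: 'Disagree',
    3.0: 'Neither Agree nor Disagree',
    4.0: 'Agree',
    5.0: 'Agree Strongly',
}

# Made-up responses. value_counts() sorts by frequency (most common first),
# so the resulting index is [5.0, 4.0, 1.0] rather than numerical order.
responses = pd.Series([5.0, 5.0, 5.0, 4.0, 4.0, 1.0])
data_counts = responses.value_counts()

# Buggy: walks the dictionary in key order, ignoring the order of the index
buggy_index = [LIKERT[value] for value in LIKERT.keys() if value in data_counts.index]

# Fixed: walks the index itself, so each label lines up with its own code
fixed_index = [LIKERT[value] for value in data_counts.index]

for code, bad, good in zip(data_counts.index, buggy_index, fixed_index):
    print(f'{code}: buggy={bad!r}, fixed={good!r}')
```

With the first version the 5.0 row ends up labelled 'Disagree Strongly', which is exactly the mix-up described above; the second version labels it 'Agree Strongly'.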

A better fix is actually to use the .rename() method, which can rename the index of a dataframe (or the column names) using a dictionary as a lookup, like so:

data.rename(index=LIKERT, inplace=True)

Any labels present in the index but not in the lookup are left alone, and keys in the lookup but not in the index are ignored. The result is exactly what I need: all my ‘5s’ replaced with ‘Agree Strongly’ and so on.
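A quick check of that behaviour, using a made-up dataframe whose index is out of numerical order, has no 3.0 row, and contains a stray code 9.0 with no entry in the lookup:

```python
import pandas as pd

LIKERT = {
    0.0: 'N/A',
    1.0: 'Disagree Strongly',
    2.0: 'Disagree',
    3.0: 'Neither Agree nor Disagree',
    4.0: 'Agree',
    5.0: 'Agree Strongly',
}

# Made-up counts: index out of numerical order, no 3.0 row, and a stray 9.0 code
data = pd.DataFrame({'count': [3, 2, 1, 1]}, index=[5.0, 4.0, 1.0, 9.0])

# Each label is looked up individually, so the order of the index doesn't matter;
# 9.0 has no entry in LIKERT and is left unchanged, and unused keys are ignored
data.rename(index=LIKERT, inplace=True)

print(data)
```

The counts stay attached to their own rows (3 under 'Agree Strongly'), and the 9.0 row keeps its numeric label, which makes unexpected codes easy to spot.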

So I guess the lesson learnt here is RTFM, and don’t try to be clever and re-invent functionality that already exists.

Weeknotes - 29th May 2017

June 4, 2017

Monday 29th May


Tuesday 30th May


Finished my visualisation coursework marking today. Generally really good quality across the board, and a really enjoyable set of work to mark. As time goes on, I’m liking this visualisation course more and more. It’s fun to teach as it’s an interesting and quite subjective field, which is unusual in a ‘normal’ Computer Science course. There’s lots of scope for discussion and argument, and plenty of chances for students to get stuck into some data analysis and communication and really show off their skills. I get the feeling the mark distribution skewed a little higher than last year, but I haven’t checked that yet.

Also met with the last student who has expressed an interest in our CUROP project for this summer, so we’ll be able to make a decision on that soon and get that project rolling. I also met with another of our CompDJ students about their dissertation project - they’re looking to build a bot that will write articles automatically about particular topics. A very ambitious project, but one that looks to be really interesting.

The other major task on Tuesday was a Skype call with the rest of the organising committee of DataJConf to finalise the accepted talks and sort out the schedule. We had a really great set of submissions, with a good mix from industry and academia. Our programme committee did a great job of reviewing them, so it was a fairly simple task to conduct a quick meta-review of the papers, decide where our cut-off point was and then take the top 8 papers forward to the conference. Sadly, the fact that we only have a one-day main track this year meant we had to lose some very good submissions, but I’m hopeful those authors will still come along and pitch their discussion topics for the Unconference on the day after DataJConf (we invited them to do so in their notification emails). The schedule is now online, and it looks like it’s going to be a really good day. Tickets are selling, and the attendee numbers are ticking up. We were supposed to decide this week which room to go for, the big room or the bigger room, but we put that off to see how numbers look in a week’s time. It’s a bit of a gamble: if we need to switch from the room we already have booked, there’s always a chance the other room will be unavailable by the time we make our minds up. But who doesn’t like a bit of risk in their conference planning?

Wednesday 31st May

A day in which very little was accomplished towards my own goals, but which had to be done. Most of the morning was taken up with a meeting with my counterpart in undergraduate operations, the school manager, and various faces from college about our generally low survey response rates in the School, and how we might do better at communicating with students to foster a culture that encourages these response rates to improve. One of the key points we came up with was that while we’re very good at listening to students as a school, and then acting upon their feedback, we’re pretty rubbish at communicating those actions and changes back to the students. The outcome of this discussion was a need to empower the operations teams for postgrads and undergrads to do more with the various surveys and module feedback questionnaires, to bring actions and recommendations to the teaching and learning quality committee and boards of studies, and to work with the comms team to make sure students know that what they tell us is listened to and acted upon, and is therefore quite important. Essentially it’s about a culture change within the school, and we all know how easy that is, right…

Also had some interesting discussions with my Head of School today about a number of projects I’ve got going on at the moment. I already wrote about trying to coordinate the large number of new programme / programme change approvals that we have happening within the school, but we also discussed a couple of other projects. One, looking at end-of-module feedback, has been going on for a while but is close to being ready for launch. The other was around module review, and how I want to improve that process by moving to a git-based approach, which will allow better oversight and review of module changes and data collection. I’ll talk more about that as the project develops.

At lunchtime we had our first official meeting with Stuart, our third-year student who is working with us for the summer on our Education Innovation chatbot project. He seems to have really hit the ground running and is getting stuck in to building code and designing solutions. Really great to see, and it looks highly likely that we’re going to have something working to test with students in the Autumn.

The afternoon was taken up with an Academic Approval Event in the School of Modern Languages. I was on the panel as the internal member from outside the college. It’s the second approval event I’ve done, and was a fairly pleasant experience. The programme we were looking at was well thought out, and would clearly be a benefit to the school in question. There were the usual typos and small inconsistencies in the documentation, and we had some recommendations that might improve the student experience, particularly around assessment, where there were a lot of essays that might be replaced with some more interesting types of assessment. Overall though it looks good, and I hope they make a success of it.

While all that was going on, we were hosting a hackday over in Bute, a collaboration with The Bureau Local. A team came over from The Bristol Cable and, along with our students, spent the day looking at voter numbers within local constituencies. I wrote a tiny write-up over at the CompDJ blog, but I was a bit annoyed I couldn’t get more involved, what with everything else that was going on that day. Hopefully I’ll be able to get stuck in at the next one, as I’m sure this won’t be the last hackday-type event the Bureau organises.

Thursday 1st June

Today was spent interviewing students for another of our summer projects, looking at Journalism Education. We’ve been carrying out a data collection experiment since last summer looking at the skill requirements of the media industry as exposed through job advertising and mailing list postings, and now we’re looking to back that up through a qualitative analysis of journalism school educators and their syllabi. We had 12 students from a range of schools express an interest in working on this project with us, and choosing between them is a very hard task indeed. Luckily m’colleague is leading this project, so most of that particular burden falls on him. Hopefully we’ll have someone in place very soon and we can get the third of our summer projects up and running.

Friday 2nd June


Sunday 4th June

My ‘Friday’ was spent working on some analysis of module evaluation feedback. As I mentioned in Tuesday’s notes, we need to do more and better with the feedback given to us by students. I’ve been working for a while on creating some simple dashboards that transform the quite poor output of the module evaluation system into something that is firstly a little more usable by module leaders, but that also looks more like the survey dashboards (NSS, PTES, etc) that we are used to dealing with.

Module Evaluation Dashboards - WiP

The idea is that consistency between the types of visualisations and analysis used will reduce the cognitive burden when trying to assess the feedback and compare across surveys. I’m now starting to put together a system that will create individual dashboards for lecturers and module leaders, and that will also allow comparison between modules on the various programmes and years of our degree schemes, and allow comparison to the school as a whole. With any luck I’ll be able to present this at the next TLAQC and we can start to deliver these to lecturers and operations teams to help them understand what the students are telling them. Today was mostly refactoring my existing analysis code that takes the raw survey data and converts it into percent agree/disagree scores as per the NSS dashboards, and collects the data across the different groupings (years/programmes) of modules.
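As a rough sketch of that conversion step (not the actual dashboard code), assume the raw export has one row per answer, with hypothetical 'module', 'year' and 'response' columns coded 0 (N/A) to 5 (Agree Strongly). Following the NSS-style convention, ‘Agree’ and ‘Agree Strongly’ count as agreement, the two disagree options count as disagreement, and N/A answers are excluded from both.

```python
import pandas as pd

# Hypothetical raw export: one row per student answer to a single question,
# coded 0 = N/A, 1 = Disagree Strongly ... 5 = Agree Strongly
raw = pd.DataFrame({
    'module':   ['Module A'] * 4 + ['Module B'] * 3,
    'year':     [1, 1, 1, 1, 2, 2, 2],
    'response': [5.0, 4.0, 2.0, 0.0, 3.0, 1.0, 4.0],
})

AGREE = [4.0, 5.0]     # 'Agree', 'Agree Strongly'
DISAGREE = [1.0, 2.0]  # 'Disagree Strongly', 'Disagree'

# Drop N/A answers so they don't count towards either percentage
valid = raw[raw['response'] != 0.0].copy()
valid['agree'] = valid['response'].isin(AGREE)
valid['disagree'] = valid['response'].isin(DISAGREE)


def summarise(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Percent agree/disagree per group; the mean of a boolean column is a proportion."""
    grouped = df.groupby(by)
    return pd.DataFrame({
        'n': grouped.size(),
        'pct_agree': 100 * grouped['agree'].mean(),
        'pct_disagree': 100 * grouped['disagree'].mean(),
    })


by_module = summarise(valid, 'module')  # one row per module
by_year = summarise(valid, 'year')      # one row per year of study
school = 100 * valid[['agree', 'disagree']].mean()  # whole-school comparison point

print(by_module)
print(by_year)
print(school)
```

Neutral answers count in the denominator but in neither percentage, so agree and disagree need not sum to 100.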

Weeknotes - 22nd May 2017

May 28, 2017

Strange week this week: coming back from holiday, lots of time spent catching up, arranging meetings and organising more meetings for next week.

Monday 22.05.17

Most of Monday morning was spent dealing with all the emails I’d received while away last week. The usual mix of requests for information from admin, queries from current, potential and past students, and a number of things relating to projects that are either about to start or were supposed to have started by now! It took an absolute age to crack through it all. The apocryphal tale of the colleague who just ‘deleted everything’ on their return from holiday, on the assumption that anything important would be chased up, loomed large in my mind as I replied to my fiftieth message. In a world where ‘responsiveness to communication’ is one of the questions in any number of student feedback surveys, I just don’t think that path is the right one to take.

Monday afternoon saw Glyn and me working on our talk for Wednesday, taking the usual divide-and-conquer approach to put together something interesting (we hoped) for the ‘Investigating (with) Big Data’ symposium being held by the Digital Cultures Network.

Tuesday 23.05.17

Another morning of marking. I mentioned last week how pleased I was with the quality of the submissions this year, and it has held up through this latest batch too. The students really seem to have engaged with the module, have thought about what the data says and the message they want to communicate, and have then brought the technical skills to the table to implement their solution. I’m really pleased with how it’s gone. Over halfway through the marking now. It’s supposed to be done by the end of this week, but with two days of training courses and a very busy Wednesday, that’s just not going to happen. I have supplied the necessary apologies to the admin staff and I’m fairly sure they’re not going to hurt me too much.

The afternoon was taken up with meetings with m’colleague, potential CUROP students, and a couple of our MSc CompDJ students who are beginning to think about their dissertation projects for this summer. One of the things Glyn and I discussed was our lack of self-promotion around the activities we do as the ‘Computational and Data Journalism’ team. In the last couple of months we’ve scored research project funding, student project funding, international workshop funding and our students have landed a number of prestigious summer internships, and we’re really not doing a good enough job at shouting about this activity. I’ve resolved to drive this forward a bit better, so came up with a list of potential items for promotion, and I’ll be trying to push those out over the summer, and then keep things ticking over during term time next year.

There was also some movement on the Untappd data project front, as I was finally pushed into responding to my co-investigators with some plans on how to progress from last year’s ICWSM conference paper to a fuller journal paper submission. This is one of those side projects that it’s a real shame to not have more time for, as I think we have a lot of interesting things that we can do, but are all lacking the time to really get stuck in to the analysis. Hopefully we’ll be able to push things forward over summer and get something delivered.

Wednesday 24.05.17

Wednesday started with my first catch up meeting with the DoT for a couple of weeks. I’ve been deputy DoT since September(ish), and we’ve probably not had enough of these meetings. The plan is to make them more regular in the future, and that will probably help with keeping all the plates spinning, as I’m now working on a lot of different projects for the School. We discussed the programme approval process, as we have a number of new programmes in the pipeline as well as some changes to existing programmes going on, and we need to make sure we keep everything coherent. I’ve been tasked with setting up some meetings with the key proposers and the usual suspects within the school to make sure there’s enough coordination going on.

In the afternoon, it was over to John Percival Building to give a talk as part of the ‘Investigating (with) Big Data’ symposium. This was a double-hander with m’colleague, and we’d chosen to discuss some issues around large data investigations within news media. Glyn started by presenting some of the more recent large-scale collaborative data investigations that have been carried out by news orgs. I followed that up with a discussion around data openness, transparency, and some of the technical issues that are holding back data journalism. I think the talk went well; people seemed interested and receptive to the ideas we presented.

Sadly I couldn’t hang around for the rest of the symposium as I’d double-booked myself for the afternoon, having agreed to go to a briefing for exam board chairs being held over in main building. There are a few new people taking on the exam board chair role within the school, and although I’m not one of them it was ‘decided’ (no idea who by) that I should also attend the briefing, as I’m probably going to be one of the people called upon to step in if the usual chair isn’t available. It was a fairly dull but not entirely useless presentation on the process of getting ready for, and dealing with the aftermath of, an exam board. It ticks the boxes though, so now I’m trained and can step into that particular set of shoes if necessary.

Thursday 25.05.17 & Friday 26.05.17

Days 2 and 3 of the ‘Leading Teaching Teams’ training programme that I’d managed to score a place on. This part of the programme was run by the Leadership Foundation for Higher Education, and was probably one of the best training courses I’ve been on so far. I spent a long time reflecting on the way I work, and it really delivered some useful insights. We did a lot of self-assessment and analysis of how our individual approaches may or may not be helpful in managing teams, and I’m looking forward to putting some of the ideas into practice.

As with many of these training courses, one of the added benefits was being able to spend time with colleagues from across the University. It’s always fascinating to find out how others work and to hear about common problems or issues across different schools and colleges, and how they’ve been solved (or not!) in different ways. It’s also nice to get an opportunity to discuss things and to hear that others feel the same way. There was a lot of discussion and dissatisfaction expressed over the 2 days about the increased corporatisation and commodification of Higher Education. I’d love to tell you that we’ve solved that particular issue, but sadly not. Many did get righteously angry about it though. I suspect a broader societal change is needed to fix it, and all we can do at this level is keep pushing for that change.

Weeknotes - 14th May 2017

May 15, 2017

A very good week this week, in that I was only in work for 3 days, but still accomplished a lot. Through some convenient meetings I’ve managed to get a whole mess of projects lined up for the next year or so, and I came out of the week on Friday very eager to get cracking on things.

Monday 8.5.17

This week started with our Assessment and Feedback focused teaching ‘away day’, which wasn’t really an away day because we didn’t go anywhere, but which was incredibly useful nonetheless. Put together by myself, Andrew (DoT) and Helen (A&F lead), the event was attended by a good number of teaching staff within the school and allowed us to spend the day thinking about our teaching practice and the way that we do things within COMSC.

We took a look at the upcoming Cardiff University commitments and principles around assessment and feedback, and considered how well our assessment lines up with some of the ideas within this draft of the document. A surprising amount of assessment within the school is some variation of ‘build this project in language X using paradigm Y and assess how well it performs in terms of property Z, then write a report on it’, and it turns out that trying to work out how that corresponds to a 4,000-word essay is quite a challenge. Discussion around this topic also highlighted how the National Software Academy have done a good job of using larger projects as assessment for a number of different modules, something that we could do more of in the BSc Computer Science, as currently there are a lot of (too many?) larger assessments within each module. Combining these makes a lot of sense: for instance, why not have a software project in the first year that gets assessed for both the ‘OO Development in Java’ and the ‘Developing Quality Software’ modules, rather than a separate project in each module?

In a session that I chaired we looked at Learning Outcomes of our modules - with a particular focus on how well they match with assessment or are assessable. We also looked at trying to get a handle on the year level learning outcomes for BSc Computer Science to make sure they are up to date and relevant.

The final session of the morning saw us covering exam feedback, and how we can provide this to students in a useful fashion. The afternoon saw some discussion around a few different projects that aim to help give visibility to the workload that assessment gives to both staff and students. One project from a team in the University seemed almost useful, but focused too much on deadlines, with little regard to start dates, duration and effort. So, as a tool to help prevent deadline bunching it was great, but to actually monitor workload it was less than great. George is working on a project as part of the Cardiff Futures project that promises to deliver what I think is needed (essentially automating the creation of the coursework timetables that we delivered at the beginning of this year), and hopefully that will be taken up by central University, as effective communication of this information is a key part of helping students and staff manage their workload.
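To illustrate the difference between the two views, here is a hypothetical sketch (not the University tool or George’s project, whose details aren’t described here). Each made-up coursework has a set date, a deadline and an estimated effort in hours, and the effort is assumed to be spread evenly across the weeks it is open. A deadline-only view sees two hand-ins in the same week; the workload view shows the load building up over the preceding month.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical coursework records: (name, date set, deadline, estimated hours of effort)
COURSEWORKS = [
    ('CW1', date(2017, 10, 2), date(2017, 10, 30), 40),
    ('CW2', date(2017, 10, 16), date(2017, 10, 30), 20),
]


def week_of(day: date) -> date:
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_hours(items) -> dict:
    """Spread each coursework's estimated hours evenly over the weeks it is open."""
    load = defaultdict(float)
    for _name, set_date, deadline, hours in items:
        weeks = []
        week = week_of(set_date)
        while week < deadline:
            weeks.append(week)
            week += timedelta(weeks=1)
        if not weeks:  # no week starts before the deadline: count it in the week it was set
            weeks = [week_of(set_date)]
        for w in weeks:
            load[w] += hours / len(weeks)
    return dict(sorted(load.items()))


# Deadline-only view: how many hand-ins fall in each week
deadlines_per_week = Counter(week_of(deadline) for _, _, deadline, _ in COURSEWORKS)
# Workload view: estimated hours of work in each week
weekly = weekly_hours(COURSEWORKS)

print(deadlines_per_week)
print(weekly)
```

Spreading effort evenly is itself an assumption; students often back-load their work, so a more realistic model might weight the final weeks more heavily.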

Tuesday 9.5.17

After a fairly involved and in-depth Monday, Tuesday was a day of catching up on admin and sorting things out before my week off next week. First up was my PDR. This was my first PDR, as I came off probation a few months early last September because I was fed up with not getting paid enough. I thought it had been a pretty decent year, and Andrew seemed to agree. We agreed some interesting objectives for the next year, which are basically things I’ve been wanting to do for the last few months, and I’m looking forward to getting stuck into all of them over the next 12 months.

In the afternoon I met with a few students who are interested in our summer CUROP project doing some analysis and visualisation of the Creative Cardiff data, which was fun as it’s always good to meet with interested and engaged students who are keen to get involved with research projects. Still got a couple more students to meet with, but hopefully we’ll have someone for this project relatively soon.

I also met up with m’colleague on Tuesday afternoon and we did some more planning for the next few months. The textbook we’re writing is coming along, and we’ve identified an opportunity to get some excellent input into the book from the attendees of the Data and Computational Journalism Conference we’re organising in Dublin. We discussed the upcoming intake of students for the next academic year, and the progress our current students are making on their dissertation project pitches. We also solidified our publication plans for the next six months: with a couple of decent journal papers in the pipeline alongside a couple of conference presentations, it’s looking like a strong finish to the year.

Wednesday 10.5.17

I took a day off today to have a sneaky date with my wife for her birthday (which isn’t really until later this month). We went off to the theatre to watch OmiDaze doing Romeo and Juliet, which was very enjoyable, then to a bar in the Bay for lunch, which was very tasty (food) and very average (beer).

Thursday 11.5.17

Worked from home today, ploughing through the CMT212 marking. The quality this year is incredibly high, and I’m very pleased with how the students have analysed and visualised their data. A few students have done their data analysis in R and a lot in Python, and the majority of the visualisation so far has been some very good D3 code. If the standard holds up across the whole cohort when I get round to marking the rest, I’ll be very happy indeed.

Friday 12.5.17

Nada. Day off again (two in one week!), packing and preparing for the week off. A mad day rushing around with Arthur collecting parcels, packing bags, and trying to optimise the fitting of bags into the car boot so that we could get both ourselves and the luggage in the car at the same time.

Scraping the Assembly

November 2, 2016

M’colleague is currently teaching a first-semester module on Data Journalism to the students on our MSc in Computational and Data Journalism. As part of this, they need to do some sort of data project. One of the students is looking at the expenses of Welsh Assembly Members. These are all freely available online, but not in an easy-to-manipulate form. According to the Assembly, they’d be happy to give the data out as a spreadsheet if we submitted an FOI request.

To me, this seems quite stupid. The information is all online and freely accessible. You’ve admitted you’re willing to give it out to anyone who submits an FOI request. So why not just make the raw data available to download? This doesn’t sound like helpful Open Government to me. Anyway, for whatever reason, they’ve chosen not to, and we can’t be bothered to wait around for an FOI to come back. It’s much quicker and easier to build a scraper! We’ll just use Selenium to drive a web browser, submit a search, page through all the results collecting the details, then dump it all out to CSV. Simple.

Scraping AM expenses

I built this as a quick hack this morning. It took about an hour or so, and it shows. The code is not robust in any way, but it works. You can ask it for data from any year (or a number of years) and it’ll happily sit there churning its way through the results and spitting them out as both .csv and .json.
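The Selenium side of the scraper lives in the GitHub repo and isn’t reproduced here. As a minimal sketch of just the page-through-and-dump part, assuming a hypothetical `fetch_page(year, page)` function that stands in for the browser work and returns a list of expense records (dicts) for one results page, or an empty list once the results run out, the collection loop and the .csv/.json output could look like this using only the standard library. The field names and sample pages are made up.

```python
import csv
import json
import os
import tempfile
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def scrape_years(years: Iterable[int],
                 fetch_page: Callable[[int, int], List[Record]]) -> List[Record]:
    """Collect every results page for each requested year.

    fetch_page(year, page) is a hypothetical stand-in for the Selenium part
    (submit the search, move to page N, read the table); it is assumed to
    return an empty list once the results run out.
    """
    records: List[Record] = []
    for year in years:
        page = 1
        while True:
            rows = fetch_page(year, page)
            if not rows:
                break
            records.extend(rows)
            page += 1
    return records


def dump(records: List[Record], stem: str) -> None:
    """Write the same records to <stem>.csv and <stem>.json.

    Assumes every record has the same fields as the first one.
    """
    if not records:
        return
    with open(stem + ".csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
    with open(stem + ".json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)


# Made-up pages standing in for the real search results
FAKE_PAGES = {
    (2016, 1): [{"member": "Member A", "category": "Travel", "amount": "12.50"}],
    (2016, 2): [{"member": "Member B", "category": "Office", "amount": "7.00"}],
}
rows = scrape_years([2016, 2017], lambda y, p: FAKE_PAGES.get((y, p), []))

with tempfile.TemporaryDirectory() as tmp:
    stem = os.path.join(tmp, "am_expenses")
    dump(rows, stem)
    with open(stem + ".json", encoding="utf-8") as f:
        round_trip = json.load(f)
```

Keeping the browser-driving code behind a single function like this makes the paging and output logic easy to check without launching a browser at all.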

All the code is available on GitHub and it’s under an MIT Licence. Have fun 😉

Atom Plugins for Web Development

October 5, 2016

I’ve had a number of students in my web-dev module asking me what plugins I’m using in my text editor, so I thought I’d dash off a quick blog post on the plugins I find useful day-to-day. (Actually, most people normally ask me ‘how did you do that thing where you typed one word and suddenly had a whole HTML page?’ The answer is that I used a plugin, so ‘what plugins do you use?’ is really the question they should be asking…)

I’m using Atom as my text editor. It’s free, open source, and generally reliable. If you’re a student on my web-dev course you’re stuck using Sublime Text in the lab for now. I’m pretty sure most of the Atom plugins I use have either direct Sublime equivalents, or similarly functioning alternatives.

There’s a guide to Atom packages here and one for Sublime Text here.

A quick Google search for ‘best atom packages web developer’ will probably get you a far more comprehensive list than this, but here’s my current pick of useful plugins anyway:


Emmet is essential for anyone writing any amount of HTML. It’s the magic package that lets me type ‘html:5’ in a blank document, hit the shortcut keys (CTRL + E in my setup), and suddenly have a simple boilerplate HTML page.

Emmet auto completion

It’s ace. Not only that, but it can write loads of HTML for you, and all you have to do is write a CSS selector for that HTML:

html css Selector expansion

Great stuff. The documentation is here.
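To give a feel for what ‘write a CSS selector, get the HTML’ means, here’s a toy Python sketch that expands a tiny subset of Emmet-style abbreviations: tag names, `#id`, `.class`, the child operator `>` and repetition `*n`. It’s a hypothetical illustration of the idea only, nowhere near Emmet’s real implementation or full syntax (no siblings, grouping, numbering or text content).

```python
import re

# One element: optional tag, any #id/.class parts, optional *count
ELEMENT = re.compile(r"^([a-z][a-z0-9]*)?((?:[#.][\w-]+)*)(?:\*(\d+))?$")


def parse(part: str):
    """Turn e.g. 'li.item*3' into ('li', ' class="item"', 3)."""
    m = ELEMENT.match(part)
    if not m:
        raise ValueError(f"unsupported abbreviation: {part!r}")
    tag = m.group(1) or "div"  # a bare .class or #id becomes a div
    ids = re.findall(r"#([\w-]+)", m.group(2))
    classes = re.findall(r"\.([\w-]+)", m.group(2))
    attrs = ""
    if ids:
        attrs += f' id="{ids[-1]}"'
    if classes:
        joined = " ".join(classes)
        attrs += f' class="{joined}"'
    return tag, attrs, int(m.group(3) or 1)


def expand(abbrev: str, depth: int = 0) -> str:
    """Expand a chain of child (>) elements with optional repeats into HTML."""
    head, _, rest = abbrev.partition(">")
    tag, attrs, count = parse(head)
    pad = "  " * depth
    out = []
    for _ in range(count):
        if rest:
            inner = expand(rest, depth + 1)
            out.append(f"{pad}<{tag}{attrs}>\n{inner}\n{pad}</{tag}>")
        else:
            out.append(f"{pad}<{tag}{attrs}></{tag}>")
    return "\n".join(out)


print(expand("ul#nav>li.item*3"))
# <ul id="nav">
#   <li class="item"></li>
#   <li class="item"></li>
#   <li class="item"></li>
# </ul>
```

The real plugin does the same kind of thing with a much richer grammar, which is why typing a short selector and hitting the shortcut feels like magic.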


This one will tidy up your code automatically, fixing the indentation, spacing and so on. It can even be set to tidy your code every time you save a file. Awesome, huh? Imagine being set a coursework where some of the marks depended on not writing code that looks like it was written by a five-year-old who’s addicted to hitting the tab key, then finding out that there’s software to strap that five-year-old’s thumbs to his hands so he can’t hit the tab key. Awesome.

Beautiful tidy code
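As a rough idea of what the indentation part of a beautifier does, here is a toy Python re-indenter for HTML. It assumes well-formed markup, puts every tag and text run on its own line (real beautifiers are much smarter about inline elements and `<pre>` blocks), and treats a hypothetical fixed list of void elements as non-nesting. It’s an illustration, not how any particular Atom package works.

```python
import re

# Elements that never have a closing tag, so they don't increase nesting
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# A tag, or a run of text between tags
TOKENS = re.compile(r"<[^>]+>|[^<]+")


def tag_name(token: str) -> str:
    m = re.match(r"</?\s*([a-zA-Z][\w-]*)", token)
    return m.group(1).lower() if m else ""


def reindent(html: str, indent: str = "  ") -> str:
    depth = 0
    lines = []
    for raw in TOKENS.findall(html):
        token = raw.strip()
        if not token:
            continue  # drop whitespace-only runs (stray tabs, newlines)
        if token.startswith("</"):
            depth = max(depth - 1, 0)  # closing tag: step back out first
            lines.append(indent * depth + token)
        elif (token.startswith("<") and not token.startswith("<!")
              and not token.endswith("/>") and tag_name(token) not in VOID):
            lines.append(indent * depth + token)
            depth += 1  # opening tag: children go one level deeper
        else:
            # text, void elements, comments and <!DOCTYPE>
            lines.append(indent * depth + token)
    return "\n".join(lines)


print(reindent("<ul>\n\t\t\t<li>One</li>\n<li>Two</li></ul>"))
```

The key idea is just a depth counter: opening tags push the indentation in, closing tags pull it back out, and all the random tab-key abuse in between is thrown away.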


This one adds a colour picker right into Atom. Just hit CMD-SHIFT-C and choose your colours!

Colour picker

Another useful colour related plugin you may want to look at is Pigments, which can highlight colours in your projects, and gather them all together so you can see your palette.
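The ‘gather your palette’ idea is easy to sketch. Assuming only hex colours (`#rgb` and `#rrggbb`) in a stylesheet, a regular expression plus a counter gives you a rough palette. This is a hypothetical illustration of the concept, not how Pigments is implemented; real tools handle many more colour formats.

```python
import re
from collections import Counter

# #rgb or #rrggbb followed by a word boundary. Caveat: an ID selector that
# happens to be valid hex (e.g. #bad) will be picked up too.
HEX = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalise(colour: str) -> str:
    """Lower-case, and expand #abc to #aabbcc so equal colours match."""
    c = colour.lower()
    if len(c) == 4:
        c = "#" + "".join(ch * 2 for ch in c[1:])
    return c


def palette(css: str) -> Counter:
    """Count how often each colour appears in a stylesheet."""
    return Counter(normalise(m) for m in HEX.findall(css))


css = """
body { color: #333; background: #FFFFFF; }
a    { color: #333333; border-bottom: 1px solid #fff; }
"""
print(palette(css))
```

Normalising first matters: `#333` and `#333333` are the same colour, and you want them counted as one entry in the palette.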


My last recommendation is linter. This plugin will automatically check your code for errors. You’ll need to install linters for whatever language you want to check, like linter-tidy, linter-csslint, linter-pylint and linter-jshint.

Linter finds errors in your code

So there we go – a few recommendations to get you started. Found anything else interesting? Let me know!

Software Sustainability Institute – Research Data Visualisation Workshop

August 1, 2016

Last week I gave a talk and delivered a hands-on session at the Software Sustainability Institute’s ‘Research Data Visualisation Workshop’, which was held at the University of Manchester. It was a really engaging event, with a lot of good discussion on the issues surrounding data visualisation.

Professor Jessie Kennedy from Edinburgh Napier University gave a great keynote on some key design principles in visualisation, including a number of studies I hadn’t seen before but will definitely be including in my teaching in future.

I gave a talk on ‘Human Science Visualisation’ which focused on a few key issues. Firstly, I tried to illustrate the importance of interactivity in complex visualisations. I then talked about how we as academic researchers need to publish our interactive visualisations for posterity, and how we should press academic publishers to help us communicate our data to readers. Finally, I wanted to point people towards the excellent visualisation work being done by data journalists, as newsrooms are an excellent source of ideas and tips for data visualisation. The slides for my talk are here. It was the first time I’d spoken about visualisation outside of the classroom, and it was a really fun talk to give.

We also had two great talks from Dr Christina Bergmann and Dr Andy South, focusing on issues of biological visualisation and mapping respectively. All the talks generated some good discussion both in the room and online, which was fantastic to see.

In the afternoon I led a hands-on session on visualising data using d3. This was the first time I’d taught a session using d3 v4, which made things slightly interesting. I’m not fully up to speed with all the areas of the API that have changed, so getting the live coding right first time was a bit tricky, but I think I managed. Interestingly, I feel that the changes made to the .data(), .enter(), .exit() update cycle, as discussed in Mike Bostock’s “What Makes Software Good”, make a lot more sense from a teaching perspective. The addition of .merge() in particular helps a great deal. As you might expect from a d3 workshop that lasted a mere three hours, I’m not entirely convinced that everybody ‘got’ it, but I think most went away satisfied.
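d3 itself is JavaScript; what follows is a small Python model of what a keyed data join in the update cycle computes: which data need new elements (enter), which existing elements get new data (update), and which elements are left over (exit). The final step shows why merging enter with update helps: once combined, both groups can be styled in one place. The dict standing in for the page and the `circle-…` element names are hypothetical; this models the idea and is not d3’s API.

```python
from typing import Any, Callable, Dict, List, Tuple

Datum = Dict[str, Any]


def data_join(elements: Dict[Any, str], data: List[Datum],
              key: Callable[[Datum], Any]):
    """Split a keyed join into enter, update and exit groups."""
    enter: List[Datum] = []
    update: List[Tuple[str, Datum]] = []
    seen = set()
    for datum in data:
        k = key(datum)
        seen.add(k)
        if k in elements:
            update.append((elements[k], datum))  # existing element, new datum
        else:
            enter.append(datum)                  # needs a new element
    exit_ = [el for k, el in elements.items() if k not in seen]  # leftovers
    return enter, update, exit_


def update_pattern(elements, data, key):
    """Run the general update pattern; return (new key->element map, removed)."""
    enter, update, exit_ = data_join(elements, data, key)
    created = [(f"circle-{key(d)}", d) for d in enter]  # hypothetical naming
    merged = created + update  # the .merge() step: handle both groups together
    return {key(d): el for el, d in merged}, exit_


on_page = {"a": "circle-a", "b": "circle-b"}
new_data = [{"id": "b", "r": 5}, {"id": "c", "r": 3}]
enter, update, exit_ = data_join(on_page, new_data, key=lambda d: d["id"])
# enter  -> [{'id': 'c', 'r': 3}]
# update -> [('circle-b', {'id': 'b', 'r': 5})]
# exit   -> ['circle-a']
```

Before merge, anything shared by new and existing elements (position, radius, colour) had to be set twice, once on each group; merging first means it is written once, which is a lot easier to explain to a room of people meeting d3 for the first time.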

Overall it was a very successful workshop. Raniere Silva did an excellent job putting it together and running the day, and I really enjoyed it. I’m looking forward to seeing what other people thought about it too.