Weeknotes - 14th May 2017

Monday, May 15, 2017

A very good week this week, in that I was only in work for 3 days, but still accomplished a lot. Through some convenient meetings I’ve managed to get a whole mess of projects lined up for the next year or so, and I came out of the week on Friday very eager to get cracking on things.

Monday 8.5.17

This week started with our Assessment and Feedback focused teaching ‘away day’, which wasn’t really an away day because we didn’t go anywhere, but which was incredibly useful nonetheless. Put together by myself, Andrew (DoT) and Helen (A&F lead), the event was attended by a good number of teaching staff within the school and allowed us to spend the day thinking about our teaching practice and the way that we do things within COMSC.

We took a look at the upcoming Cardiff University commitments and principles around assessment and feedback, and considered how well our assessment lines up with some of the ideas within this draft of the document. A surprising amount of assessment within the school is some variation of ‘build this project in language X using paradigm Y and assess how well it performs in terms of property Z, then write a report on it’, and it turns out that trying to work out how well that corresponds with a 4,000 word essay is quite a challenge. Discussion around this topic also highlighted how the National Software Academy have done a good job of using larger projects as assessment for a number of different modules, something that we could do more of in the BSc Computer Science, as currently there are a lot of (too many?) larger assessments within each module. Combining these makes a lot of sense - for instance, why not have a software project in the first year that gets assessed for both the ‘OO Development in Java’ and the ‘Developing Quality Software’ modules, rather than a separate project in each module?

In a session that I chaired we looked at Learning Outcomes of our modules - with a particular focus on how well they match with assessment or are assessable. We also looked at trying to get a handle on the year level learning outcomes for BSc Computer Science to make sure they are up to date and relevant.

The final session of the morning saw us covering exam feedback, and how we can provide this to students in a useful fashion. The afternoon saw some discussion around a few different projects that aim to help give visibility to the workload that assessment gives to both staff and students. One project from a team in the University seemed almost useful, but focused too much on deadlines, with little regard for start dates, duration and effort. So, as a tool to help prevent deadline bunching it was great, but to actually monitor workload it was less than great. George is working on a project as part of Cardiff Futures that promises to deliver what I think is needed (essentially automating the creation of the coursework timetables that we delivered at the beginning of this year), and hopefully that will be taken up by central University, as effective communication of this information is a key part of helping students and staff manage their workload.

Tuesday 9.5.17

After a fairly involved and in-depth Monday, Tuesday was a day of playing catch up with admin and sorting things out before my week off next week. First things first was my PDR. This was my first PDR, having come off probation last September a few months early because I was fed up of not getting paid enough. I thought it had been a pretty decent year, and Andrew seemed to agree. I agreed some interesting objectives for the next year that were basically things I’ve been wanting to do for the last few months, and are all things I’m looking forward to getting stuck into over the next 12 months.

In the afternoon I met with a few students who are interested in our summer CUROP project doing some analysis and visualisation of the Creative Cardiff data, which was fun as it’s always good to meet with interested and engaged students who are keen to get involved with research projects. Still got a couple more students to meet with, but hopefully we’ll have someone for this project relatively soon.

I also met up with m’colleague on Tuesday afternoon and we did some more planning for the next few months. The textbook we’re writing is coming along, and we’ve identified an opportunity to get some excellent input to the book from the attendees of the Data and Computational Journalism Conference we’re organising in Dublin. We discussed the upcoming intake of students for the next academic year, and the progress our current students are making on their dissertation project pitches. We also solidified our publication plans for the next six months - with a couple of decent journal papers in the pipeline alongside a couple of decent conference presentations it’s looking like a strong finish to the year.

Wednesday 10.5.17

I took a day off today to have a sneaky date with my wife for her birthday (which isn’t really until later this month). We went off to the theatre to watch OmiDaze doing Romeo and Juliet, which was very enjoyable, then to a bar in the bay for lunch which was very tasty (food) and very average (beer).

Thursday 11.5.17

Worked from home today ploughing through the CMT212 marking. The quality this year is incredibly high, and I’m very pleased with how the students have analysed and visualised their data. I’ve had a few students submitting data analysis in R, a lot of Python, and then the majority of the visualisation so far has been some very good D3 code. If the standard keeps up across the whole cohort when I get round to marking the rest I’ll be very happy indeed.

Friday 12.5.17

Nada. Day off again (2 in 1 week!) packing and preparing for the week off. Mad day rushing around with Arthur collecting parcels, packing bags, and trying to optimise the fitting of bags into the car boot so that we could get both ourselves and the luggage in the car at the same time.

Scraping the Assembly

Wednesday, Nov 2, 2016

M’colleague is currently teaching a first-semester module on Data Journalism to the students on our MSc in Computational and Data Journalism. As part of this, they need to do some sort of data project. One of the students is looking at the expenses of Welsh Assembly Members. These are all freely available online, but not in an easy-to-manipulate form. According to the Assembly they’d be happy to give the data out as a spreadsheet, if we submitted an FOI.

To me, this seems quite stupid. The information is all online and freely accessible. You’ve admitted you’re willing to give it out to anyone who submits an FOI. So why not just make the raw data available to download? This does not sound like a helpful Open Government to me. Anyway, for whatever reason, they’ve chosen not to, and we can’t be bothered to wait around for an FOI to come back. It’s much quicker and easier to build a scraper! We’ll just use Selenium to drive a web browser, submit a search, page through all the results collecting the details, then dump it all out to CSV. Simple.

Scraping AM expenses

I built this as a quick hack this morning. It took about an hour or so, and it shows. The code is not robust in any way, but it works. You can ask it for data from any year (or a number of years) and it’ll happily sit there churning its way through the results and spitting them out as both .csv and .json.
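The collect-then-dump step works roughly like this. This is a minimal sketch rather than the actual scraper code: `fetch_pages` is a hypothetical stand-in for the Selenium paging loop, and the field names are illustrative, not the Assembly’s actual columns.

```python
import csv
import json

# Hypothetical stand-in for the Selenium paging loop: the real scraper
# drives a browser, submits the search form, and yields one dict per
# expense row from each results page.
def fetch_pages():
    yield [{"member": "A. Member", "date": "2016-04-01",
            "category": "Travel", "amount": "12.50"}]

def scrape(csv_path, json_path):
    """Collect every row from every results page, then dump to CSV and JSON."""
    rows = [row for page in fetch_pages() for row in page]
    fieldnames = ["member", "date", "category", "amount"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(json_path, "w") as f:
        json.dump(rows, f, indent=2)
    return rows
```

Keeping the browser-driving loop separate from the dumping code is also what makes the quick hack easy to patch when the site changes: only `fetch_pages` needs touching.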

All the code is available on Github and it’s under an MIT Licence. Have fun 😉

Atom Plugins for Web Development

Wednesday, Oct 5, 2016

I’ve had a number of students in my web-dev module asking me what plugins I’m using in my text editor, so I thought I’d dash off a quick blog post on the plugins I find useful day-to-day. (Actually, most people are normally asking me ‘how did you do that thing where you typed one word and suddenly you had a whole HTML page?’ The answer is I used a plugin, so ‘what plugins do you use?’ is really the question they should be asking…)

I’m using Atom as my text editor. It’s free, open source, and generally reliable. If you’re a student on my web-dev course you’re stuck using Sublime Text in the lab for now. I’m pretty sure most of the Atom plugins I use have either direct Sublime equivalents, or similarly functioning alternatives.

There’s a guide to Atom packages here, and one for Sublime Text here.

A quick google for ‘best atom packages web developer’ will probably get you to a far more comprehensive list than this, but here’s my current pick of useful plugins anyway:

emmet

This is essential for anyone writing any amount of HTML. This is the magic package that allows me to write ‘html:5’ in a blank document, hit the shortcut keys (CTRL + E in my setup), and suddenly have a simple boilerplate HTML page.

emmet auto completion

It’s ace. Not only that, but it can write loads of HTML for you, and all you have to do is write a CSS selector for that HTML:

html css Selector expansion

Great stuff. The documentation is here.

atom-beautify

This will tidy up your code automatically, fixing the indentation and spacing etc. It can even be set to automatically tidy your code every time you save a file. Awesome huh? Imagine being set a coursework where some of the marks were dependent on not writing code that looks like it was written by a five-year-old child who’s addicted to hitting the tab key, then finding out that there’s software to strap that five-year-old’s thumbs to his hands so he can’t hit that tab key. Awesome.

Beautiful tidy code

color-picker

This one adds a colour picker right into Atom. Just CMD-SHIFT-C and choose your colours!

Colour picker

Another useful colour related plugin you may want to look at is Pigments, which can highlight colours in your projects, and gather them all together so you can see your palette.

linter

My last recommendation is linter. This plugin will automatically check your code for errors. You’ll need to install linters for whatever language you want to check, like linter-tidy, linter-csslint, linter-pylint and linter-jshint.

Linter finds errors in your code

So there we go – a few recommendations to get you started. Found anything else interesting? Let me know!

Sustainable Software Institute – Research Data Visualisation Workshop

Monday, Aug 1, 2016

Last week I gave a talk and delivered a hands-on session at the Sustainable Software Institute’s ‘Research Data Visualisation Workshop’ which was held at Manchester University. It was a really engaging event, with a lot of good discussion on the issues surrounding data visualisation.

Professor Jessie Kennedy from Edinburgh Napier University gave a great keynote looking at some key design principles in visualisation, including a number of studies I hadn’t seen before but will definitely be including in my teaching in future.

I gave a talk on ‘Human Science Visualisation’ which really focused on a couple of key issues. Firstly, I tried to illustrate the importance of interactivity in complex visualisations. I then talked about how we as academic researchers need to publish our interactive visualisations for posterity, and how we should press academic publishers to help us communicate our data to readers. Finally, I wanted to point people towards the excellent visualisation work being done by data journalists, and that the newsrooms are an excellent source of ideas and tips for data visualisation. The slides for my talk are here. It’s the first time I’ve spoken about visualisation outside of the classroom, and it was a really fun talk to give.

We also had two great talks from Dr Christina Bergmann and Dr Andy South, focusing on issues of biological visualisation and mapping respectively. All the talks generated some good discussion both in the room and online, which was fantastic to see.

In the afternoon I led a hands-on session looking at visualising data using d3. This was the first time I’d taught a session using d3 v4, which made things slightly interesting. I’m not fully up to speed with all the areas of the API that have changed, so getting the live coding right first time was a bit tricky, but I think I managed. Interestingly, I feel that the changes made to the .data(), .exit(), .enter(), update cycle as discussed in Mike’s “What Makes Software Good” make a lot more sense from a teaching perspective. The addition of .merge() in particular helps a great deal. As you might expect from a d3 workshop that lasted a mere three hours, I’m not entirely convinced that everybody ‘got’ it, but I think most went away satisfied.

Overall it was a very successful workshop. Raniere Silva did an excellent job putting it together and running the day, and I really enjoyed it. I’m looking forward to seeing what other people thought about it too.

Quick Update...

Monday, Jul 13, 2015

Been a bit quiet here recently. It’s been a very busy few months. I’ve got a few projects and thoughts that I’ll be posting more on in the next couple of weeks, but I figured it was worth a quick update on what’s been going on, and what I’ve been up to.

MSc Computational Journalism

We have finished the taught part of the MSc, and we’re getting well into the dissertation phase for the first cohort of our students. It’s been a really good first year, and I’ll be posting a debrief and some thoughts on the next year sometime over summer.

BarDiff

I’ve launched a data dashboard thing for beer drinking in Cardiff. Powered by Untappd checkins, it’s providing (I think) a fairly interesting overview of the city. I’ve got some ideas for some better visualisations, but for now it’s nicely ticking over. Plus it’s getting some decent interaction on the social medias.

Academia

The usual ticking over of academia continues - journal reviews, conference reviews, a book chapter to write, paper deadlines coming and going. It’s the same old same old….

Teaching

I’ve started on my teaching qualification (PgCUTL). The first module portfolio was submitted a couple of weeks ago, and results are due any day now (fingers crossed). I’ve also got a few thoughts on the recently announced TEF that I’ll be putting up soon, and some things on employability…

and finally…

The reason I’ve not posted in a while:

Arthur!

My son, Arthur James Chorley-Jones was born on 13th May 2015. He’s amazing, I think he’s the best thing that has ever happened, and since he’s been around there has not been a huge amount of time for blogging, side-projects, and other such things. Which is ace.

Accessing and Scraping MyFitnessPal Data with Python

Thursday, Feb 5, 2015

Interesting news this morning that MyFitnessPal has been bought by Under Armour for $475 million. I’ve used MFP for many years now, and it was pretty instrumental in helping me lose all the excess PhD weight that I’d put on, and then in maintaining a healthy(ish) lifestyle since 2010.

News of an acquisition always has me slightly worried though - not for someone else having access to my data, as I’ve made my peace with the fact that using a free service generally means that it’s me that’s being sold. Giving away my data is the cost of doing business. Rather, it worries me that I may lose access to all the data I’ve collected. I have no idea what Under Armour intend for the service in the long run, and while it’s likely that MFP will continue with business as usual for the foreseeable, it’s always worth having a backup of your data.

A few years ago, I wrote a couple of Python scripts to back up data from MFP and then extract the food and exercise info from the raw HTML. These scripts use Python and Beautiful Soup to log in to MFP, then go back through your diary history and save all the raw HTML pages, essentially scraping your data.

I came to run them this morning and found they needed a couple of changes to deal with site updates. I’ve made the necessary updates and the full code for all the scripts is available on GitHub. It’s not great, but it works. The code is Python 2 and requires BeautifulSoup and Matplotlib (if you want to use generate_plots.py).
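The ‘go back through your diary history’ part boils down to generating one diary-page URL per day and fetching each page in turn. Here’s a minimal sketch of that iteration; the URL pattern is a hypothetical stand-in, as the real diary URL and its parameters may differ.

```python
from datetime import date, timedelta

# Hypothetical URL pattern - the real MFP diary URL may differ.
DIARY_URL = "https://www.myfitnesspal.com/food/diary/{user}?date={day}"

def diary_urls(user, start, end):
    """Yield one diary-page URL per day, walking backwards from end to start."""
    day = end
    while day >= start:
        yield DIARY_URL.format(user=user, day=day.isoformat())
        day -= timedelta(days=1)
```

Each fetched page would then be saved to disk as raw HTML, so the (slower, more brittle) parsing step can be re-run later without hitting the site again.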

Personality and Places

Tuesday, Jan 27, 2015

Our paper examining the link between individual personality and the places people visit has just been published in Computers in Human Behavior. It’s open access, so you can go read it for free, right now.

In an experiment we ran previously, we asked users of Foursquare to take a personality test and give us access to their checkin history. The personality test gives us a measure of how each person scores for five different factors: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. The checkin history lists all the places they’ve ever checked in to using Foursquare. Because a couple of hundred people took part in the experiment, we ended up with a large number of individual personalities that we could link to over a hundred thousand venues. In total, this represents a pretty staggering half a million Foursquare checkins that we have personality data associated with.

Our first step with this data has been to see if there are any links between personality factors and the places people choose to visit, and we found some interesting connections.

One of our main findings is that the use of Foursquare for recording checkins seems to correlate well with Conscientiousness. The more conscientious a user is, the more likely they are to have checked in at more places and to have visited more venues. This could be because people with a high Conscientiousness score tend to be quite organised and disciplined, and so are more likely to remember to check in at every place they visit.

The opposite is true for Neuroticism: the more neurotic an individual is, the fewer places they have visited. Neuroticism is associated with negative feelings, and a tendency to be less social, which could then translate into people going to fewer places, and so checking in less. This is expressed again when we look at only those venues classed as ‘social’ (i.e. somewhere you would go to hang out with friends). The more neurotic someone is, the fewer ‘social’ venues they have been to.

Surprisingly, we have found no link between Extraversion and the number of social venues visited. It may be expected that extraverts (who are very social in their nature) may go to more social venues. However, the data does not support this. In fact, we find no link between Extraversion and any aspect of Foursquare checkins that we have examined so far.

The personality factor of Openness is related to feelings of creativity and artistic expression, and a willingness to experience new things. It is interesting to find that there is a link between Openness and the average distance travelled between checkins: **the more Open an individual is, the further they tend to have travelled**. This could be an expression of an Open individual’s desire to experience new things exposing itself through wider travel, and a larger geographic spread of checkins. However, we do not find any link between Openness and the number of different categories visited by a user. We do not see a desire for new experiences express itself in the range and diversity of places visited.

Ultimately, this data could be incredibly useful in improving venue recommendation systems. Current systems use many different information ‘cues’ to recommend to a user a place they might like to visit. These cues include things such as where they have been in the past, where their friends have been, or where is popular nearby. Perhaps by including aspects of an individual’s personality (so including aspects of why they might visit somewhere) we can increase the usefulness of these recommendations.

There is still a lot of analysis to be done on this data, and both myself and Nyala Noe are busy churning through it to discover other links between personality and the places people visit. As we find more interesting connections, I’ll post more here.

NHS Hackday 2015

Monday, Jan 26, 2015

This weekend I took part in an incredibly successful NHS hackday, hosted at Cardiff University and organised by Anne Marie Cunningham and James Morgan. We went as a team from the MSc in Computational Journalism, with myself and Glyn attending along with Pooja, Nikita, Annalisa and Charles. At the last minute I recruited a couple of ringers as well, dragging along ~~Rhys Priestland~~ Dr William Wilberforce Webberley from Comsc and Dr Matthew Williams, previously of this parish. Annalisa also brought along Dan Hewitt, so in total we had a large and diverse team.

The hackday

This was the first NHS hackday I’d attended, but I believe it’s the second event held in Cardiff, so Anne Marie and the team have it down to a fine art. The whole weekend seemed to go pretty smoothly (barring a couple of misunderstandings on our part regarding the pitch sessions!). It was certainly one of the most well organised events that I’ve attended, with all the necessary ingredients for successful coding: much power, many wifi and plenty of food, snacks and coffee. Anne Marie and the team deserve much recognition and thanks for their hard work. I’m definitely in for next year.

The quality of the projects created at the hackday was incredibly high across the board, which was great to see. One of my favourites used an Oculus Rift virtual reality headset to create a zombie ‘game’ that could be used to test people’s peripheral vision. Another standout was a system for logging and visualising the ANGEL factors describing a patient’s health situation. It was really pleasing to see these rank highly with the judges too, coming in third and second in the overall rankings. Other great projects brought an old Open Source project back to life, created a system for managing groups walking the Wales Coast path, and created automatic notification systems for healthcare processes. Overall it was a really interesting mix of projects, many of which have clear potential to become useful products within or alongside the NHS. As Matt commented in the pub afterwards, it’s probably the first hackday we’ve been to where several of the projects have clear original IP with commercial potential.

Our project

We had decided before the event that we wanted to build some visualisations of health data across Wales, something like nhsmaps.co.uk, but working with local health boards and local authorities in Wales. We split into two teams for the implementation: ‘the data team’ who were responsible for sourcing, processing and inputting data, and the ‘interface team’ who built the front-end and the visualisations.

Progress was good, with Matthew and William quickly defining a schema for describing data so that the data team could add multiple data sets and have the front-end automatically pick them up and be able to visualise them. The CompJ students worked to find and extract data, adding them to the GitHub repository with the correct metadata. Meanwhile, I pulled a bunch of D3 code together for some simple visualisations.
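A descriptor along these lines shows the idea (all field names here are hypothetical, not our repository’s actual schema): each dataset carries enough metadata for the front-end to find the data file and know how to render it, and a loader can validate new contributions before they break the site.

```python
import json

# Hypothetical example of a dataset descriptor; real field names may differ.
descriptor = json.loads("""
{
  "title": "Empty properties by local authority",
  "source": "StatsWales",
  "resolution": "local_authority",
  "data_file": "empty_properties.csv",
  "value_column": "empty_count"
}
""")

REQUIRED = {"title", "source", "resolution", "data_file", "value_column"}

def validate(desc):
    """Check a descriptor carries every field the front-end expects."""
    missing = REQUIRED - desc.keys()
    if missing:
        raise ValueError("descriptor missing fields: %s" % sorted(missing))
    return desc
```

Keeping this contract small is what let the data team add datasets independently of the interface team all weekend.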

By the end of the weekend we had established a fairly decent system. It’s able to visualise a few different types of data, at different resolutions, is mostly mobile friendly, and most importantly is easily extensible and adaptable. It’s online now on our GitHub Pages site, and all the code and documentation is also in the GitHub repository.

We’ll continue development for a while to improve the usability and code quality, and hopefully we’ll find a community willing to take the code base on and keep improving what could be a fairly useful resource for understanding the health of Wales.

Debrief

We didn’t win any of the prizes, which is understandable. Our project was really focused on the public understanding of the NHS and health, not on solving a particular need within (or for users of) the NHS. We knew this going in to the weekend, and we’d taken the decision that it was more important to work on a project related to the course, so that the students could experience some of the tools and technologies they’ll be using as the course progresses, than to do something more closely aligned with the brief that would perhaps have been less relevant to the students’ work.

I need to thank Will and Matt for coming and helping the team. Without Matt wrangling the data team and showing them how to create JSON metadata descriptors we probably wouldn’t have anywhere near as many example datasets as we do. Similarly, without Will’s hard work on the front end interface, the project wouldn’t look nearly as good as it does, or have anywhere near the functionality. His last-minute addition of localStorage for personal datasets was a triumph. (Sadly though he does lose some coder points for user agent sniffing to decide whether to show a mobile interface :-D.) They were both a massive help, and we couldn’t have done it without them.

Also, of course, I need to congratulate the CompJ students, who gave up their weekend to trawl through datasets, pull figures off websites and out of PDFs, and create the lovely, easy-to-process .csv files we needed. It was a great effort from them, and I’m looking forward to our next Team CompJ hackday outing.

One thing that sadly did stand out was a lack of participation from Comsc undergraduate students, with only one or two attending. Rob Davies stopped by on Saturday, and both Will and I discussed with him what we can do to increase participation in these events. Hopefully we’ll make some progress on that front in time for the next hackday.

Media

There’s some great photos from the event on Flickr, courtesy of Paul Clarke (Saturday and Sunday).

Elsewhere…

We got a lovely write-up about our project from Dyfrig Williams of the Good Practice Exchange at the Wales Audit Office. Dyfrig also curated a great storify of the weekend.

Hemavault labs have done a round-up of the projects here

CompJ Labs - Postcodes

Monday, Jan 5, 2015

I’ve thrown up a post on compj.cs.cf.ac.uk about some work we’ve done looking at empty properties in Cardiff. I’ll add further posts over the coming weeks about other data analysis we’ve done on this topic, and new work looking at the Welsh Assembly.

Extended Mind Crowdsourcing

Monday, Dec 15, 2014

Update 13/01/15: the paper containing the research described below is currently available from the HICSS website

This post is one I’m cross-posting both here and on the MobiSoc blog. Here, because it’s my personal translation of one of our latest research papers, and there because it’s a very good paper mostly written and driven by Roger Whitaker, so deserves an ‘official’ blog post!

A lot of use is made of Crowdsourcing in both business and academia. Business likes it because it allows simple tasks to be outsourced for a small cost. Researchers like it because it allows the gathering of large amounts of data from participants, again for minimal cost. (For an example of this, see our TweetCues work (paper here), where we paid Twitter users to take a simple survey and massively increased our sample size for a few dollars). As technology is developing, we can apply crowdsourcing to new problems; particularly those concerned with collective human behaviour and culture.

Crowdsourcing

The traditional definition of crowdsourcing involves several things:

  1. a clearly defined crowd

  2. a task with a clear goal

  3. clear recompense received by the crowd

  4. an identified owner of the task

  5. an online process

The combination of all these things allows us to complete a large set of simple tasks in a short time and often for a reduced cost. It also provides access to global labour markets for users who may not previously have been able to access these resources.

Participatory Computing

Participatory computing is a related concept to crowdsourcing, based around the idea that the resources and data of computing devices can be shared and used to complete tasks. As with crowdsourcing, these tasks are often large, complex and data-driven, but capable of being broken down into smaller chunks that can be distributed to separate computing devices in order to complete the larger task. BOINC is a clear example of this class of participatory computing.

Participatory computing

Extended Mind Crowdsourcing

The extended mind hypothesis describes the way that humans extend their thinking beyond the internal mind, to use external objects. For instance, a person using a notebook to record a memory uses the ‘extended mind’ to record the memory; the internal mind simply recalls that the memory is located in the notebook, an object that is external to the individual.

Extended mind crowdsourcing takes crowdsourcing and participatory computing a step further by including the extended mind hypothesis, to allow us to describe systems that use the extended mind of participants, as represented by their devices and objects, in order to add implicit as well as explicit human computation for collective discovery.

Extended mind crowdsourcing

What this means is that we can crowdsource the collection of data and completion of tasks using both individual users, their devices, and the extended mind that the two items together represent. Thus by accessing the information stored within a smartphone or similar personal device, and the wider internet services that the device can connect to, we can access the extended mind of a participant and thus learn more about his or her behaviour and individual characteristics. In essence, extended mind crowdsourcing captures the way in which humans undertake and respond to daily activity. In this sense it supports observation of human life and our interpretation of and response to the environment. By including social networks and social media communication within the extended mind, it is clear that while an individual extended mind may represent a single individual human, it is also possible to represent a group, such as a network or a collective using extended mind crowdsourcing.

By combining the ideas of social computing, crowdsourcing, and the extended mind, we are able to access and aggregate the data that is created through our use of technology. This allows us to extend ideas of human cognition into the physical world, in a less formal and structured way than when using other forms of human computational systems. The reduced focus on task-driven systems allows EMC to be directed at the solving of loosely defined problems, and those problems where we have no initial expectations of solutions or findings.

This is a new way of thinking about the systems we create in order to solve problems using computational systems focused on humans, but it has the potential to be a powerful tool in our research toolbox. We are presenting this new Extended Mind Crowdsourcing idea this week at HICSS.