Archive for October, 2013

Playing with Google Fusion – prison overcrowding

On Friday I saw a past student’s blog post about using Google Fusion (GF) to create choropleth maps, and as we didn’t have a data session on Friday, I thought I would give it a go and try to learn something by myself over the weekend.

It started out well. Having decided to throw caution to the wind and abandon the tutorials I’d seen online that used US boundaries and US data (all far easier to find, better documented, etc.), I found a dataset I was interested in: England & Wales prison population figures for last month, i.e. September 2013. And I figured out what I wanted to do. To my total lack of surprise, it turns out a fair few prisons are currently overcrowded.

Wrong format, stupid

Now, helpfully (not), it was in Word format, and it took me just a few minutes to learn how to do something that has genuinely irritated me for some time and that I never knew was possible. How do you turn a Word document into an Excel spreadsheet?

Here you go:

  1. Open Word document
  2. File -> Save as… HTML
  3. Open Excel
  4. File -> Open the file you just saved

Ridiculously simple.
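For anyone who would rather skip Excel altogether, the HTML file that Word saves can also be read with a few lines of Python. This is a rough sketch using only the standard library, assuming the tables in the saved file use ordinary <table>/<tr>/<td> tags with closing tags; `TableExtractor` and `html_tables_to_csv` are names I’ve made up, and nested tables aren’t handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped into rows and tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace, since saved HTML often wraps lines inside cells
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None and self.tables:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)


def html_tables_to_csv(html_text):
    """Return one CSV string per <table> found in html_text."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    outputs = []
    for table in parser.tables:
        buf = io.StringIO()
        csv.writer(buf).writerows(table)
        outputs.append(buf.getvalue())
    return outputs
```

You would pass it the contents of the saved .htm file and write each returned string out as a .csv file, which Excel can open.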

Gathering data

So I had the data in the right format, but I wanted to show it visually ‘by county’ rather than by prison. (To be honest, this was a terrible piece of decision-making because it serves zero purpose, but I wanted to see if I could do it. Now, after spending hours on it, I’ve decided to go back to where I was at the start; more on that in a minute.)

I scraped the Wikipedia table of where the prisons in England and Wales are located into a Google Spreadsheet, using the importHTML function. Then I merged the two tables together in Excel in a really complicated and time-consuming way, which I probably could have done automatically. [I copied & pasted the location data in as an extra column and deleted the ones that didn’t match up… it took me a while.] I then added up all the figures for each county and calculated the % of ‘operational capacity’ taken up.
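The merge-and-add-up step above could probably be automated along these lines. This is a minimal Python sketch, assuming the prison names are spelled identically in both tables and that each prison row has been turned into a dict; the keys ‘prison’, ‘population’ and ‘capacity’ are hypothetical, with ‘capacity’ standing for whichever capacity column you settle on. Instead of deleting rows that don’t match, it hands them back so you can check them.

```python
from collections import defaultdict


def occupancy_by_county(prisons, locations):
    """Sum population and capacity per county and work out % full.

    prisons:   list of dicts with hypothetical keys 'prison', 'population', 'capacity'
    locations: dict mapping prison name -> county (e.g. from the Wikipedia table)

    Returns (by_county, unmatched): by_county maps county ->
    (total_population, total_capacity, percent_full); unmatched lists prison
    names with no county, instead of silently deleting them.
    """
    totals = defaultdict(lambda: [0, 0])  # county -> [population, capacity]
    unmatched = []
    for row in prisons:
        county = locations.get(row["prison"].strip())
        if county is None:
            unmatched.append(row["prison"])
            continue
        totals[county][0] += row["population"]
        totals[county][1] += row["capacity"]
    by_county = {
        county: (pop, cap, round(100 * pop / cap, 1))
        for county, (pop, cap) in totals.items()
        if cap > 0  # skip counties with no recorded capacity to avoid dividing by zero
    }
    return by_county, unmatched
```

Keeping the unmatched names visible is the main design choice here: it makes spelling differences between the two sources obvious rather than losing prisons from the totals.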

But what does it all mean?

The column headings in the prison dataset aren’t immediately obvious. I had to do some searching to figure out what everything meant and, out of five or six columns, decide which ones I was going to use; or rather, which ones are the most accurate for finding out the extent of overcrowding in prisons. For example, there’s no point in comparing ‘operational capacity’ against CNA (Certified Normal Accommodation), because that doesn’t take into account cells that can’t currently be used. So I guess my point here is: make sure you know which figures you need to use.

Finding the borders

I must have spent about three to four hours just trying to find any way of getting what I wanted – county border data in a format that could be geocoded by Google Fusion. I cannot believe that there is not a file out there containing the data I need. I refuse to believe it. But I did try everything I could think of, and I’m still not there. I tried downloading shapefiles, converting those to KMLs, I tried importing KMLs directly, I tried merging other tables with geocoded data in… ARGH. Nothing works. I have found the files I thought I could use but I…couldn’t. I don’t know why.

When I eventually figure this all out, I’m going to make sure I keep a copy for future use. No way I’m letting it go!


Having failed in my quest to find borders that work with the county names I have, I have given up on the idea of doing it by border altogether. And in any case, it’s not really a true picture of the prison situation… Some areas have several prisons, some have only one, and because their figures are all added together, the county totals don’t make much sense.
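I don’t know for sure why the boundary files wouldn’t line up with my data, but if part of the problem is names that don’t quite match (‘Swansea’ vs ‘City and County of Swansea’, say), a normalised comparison key can help. A rough Python sketch; the list of words to ignore is a guess that would need tuning for a real boundary file, and the function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical filler words to ignore when comparing names; tune for your boundary file.
_DROP_WORDS = {"city", "and", "county", "of", "the", "borough", "council"}


def normalise_area_name(name):
    """Reduce an area name to a comparison key, e.g. 'City & County of Swansea' -> 'swansea'."""
    # Strip accents, lowercase, and treat '&' as 'and' before splitting into words
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = text.lower().replace("&", " and ")
    words = re.findall(r"[a-z]+", text)
    return " ".join(w for w in words if w not in _DROP_WORDS)


def match_names(my_names, boundary_names):
    """Map each of my area names to a boundary-file name with the same key, or None."""
    index = {normalise_area_name(b): b for b in boundary_names}
    return {n: index.get(normalise_area_name(n)) for n in my_names}
```

Anything that comes back as None still needs checking by hand, but at least the list of problem names is short and explicit.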

But this is as far as I’ve got with my map:

Prison Population by County Map

Remember that this is *all prisons in counties*, NOT individual prisons. The map tells you where there’s overcrowding in prison/s. The markers are as follows:

  • Green: 0-50% full
  • Yellow: 50-100% full
  • Red dots: 100-150% full
  • Red markers: 150%+ full
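Turning each county’s percentage into one of those four markers is just a set of thresholds. A sketch in Python, with one assumption worth flagging: the ranges above share their end points (50, 100, 150), so here each lower bound is treated as inclusive, meaning exactly 50% counts as yellow and exactly 100% as a red dot. The function name is hypothetical.

```python
def marker_for(percent_full):
    """Map % of capacity used to a marker style.

    Assumption: lower bounds are inclusive, because the legend's ranges
    overlap at 50, 100 and 150.
    """
    if percent_full < 50:
        return "green"
    if percent_full < 100:
        return "yellow"
    if percent_full < 150:
        return "red dot"
    return "red marker"
```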

So you can see that the total capacity in Swansea & Bedfordshire has been exceeded by 50% or more. As it happens, both counties only have one prison, so those two are genuinely quite bad. The numbers are fairly small (we’re talking around 400 prisoners), but they are still over capacity.

Lessons learned

  • Make sure you figure out the right way to visualise something before you embark on something that will take you hours to do
  • Use something easy
  • I need to figure out how to get borders to work

What next?

I’m not happy that the info is *markers* by county (so they’re super vague) instead of boundary-lines with colour, but I think the best thing for me to do is to take the data back to how it was, scrape the actual prison addresses from somewhere, and plot them exactly where they are. I think that will give a much clearer picture of what’s going on. When I’ve done that I’ll publish and link it properly so you can have a play around.




Do we have a better way of measuring engagement online?

There are loads of tools that help you measure social media/community engagement online right now. Clearly this is something that people are becoming more and more concerned about; who is looking at, reading about, writing about our product? Who is sharing our news stories? Why?

I just can’t help but notice that so much of it is about numbers.

Klout scores

Klout (supposedly) assesses your engagement and gives you a number, ‘ranking’ how much you influence people in your circle, i.e. how much you can get them to engage with things you write about or point out online. I’ll be honest: I just don’t understand the reasoning behind pinning a number to your ‘value’ as an engaging user. What does the number 46 tell me about you? Bugger-all. Are Klout scores merely another vanity tool, a way to tell you you’re great… or a way to tell other people that you’re great?*


I use Hootsuite a lot (though I am going to move to for various reasons) to schedule, since Tweetdeck was bought by Twitter and fell off a very high cliff. I like Hootsuite – I’m a free user, and I get detailed analytics about which articles do well, where people click from, etc – but again my issue is that these are all numerical values. They don’t tell me much about the intention behind the action.

Did someone click my link by accident? Did they click it because they saw someone else retweet it? How many degrees of separation is this person from me? Are they glad they clicked it, or are they annoyed because it wasn’t what they expected or wanted? The answers to those questions would perhaps be more interesting. It would be useful to know whether people click directly from my page or from other people’s retweets, because if people click through retweets, then a retweet – through people you already know – is arguably more valuable than the number of followers you have. Similarly, it’d be much easier to ‘do social media right’ if I knew how successful I was at pointing people to what they wanted to read. What’s the point in having followers who don’t click links or engage with you?

Why use quantitative data to measure actions?

It’s easy. I get it. But really, what does having 50,000 followers tell me about someone? That they’ve been on the site for long enough, that they are a nice person, that they are useful? Perhaps. But that doesn’t mean that someone with 500 followers isn’t also all of those things.

The key thing is: engagement has intention and meaning beyond numbers. I wonder how many people click on Daily Mail links because they don’t like it and they want to hate-read. How many people tweet along with the #XFactor hashtag but love to hate it? Numbers just don’t cut it in these instances.

What about Facebook? Facebook Page Insights gives us stats about users ‘talking about’ a page and users ‘engaging with’ a page (both in terms of who has seen it and who has liked, commented on or shared it). I have no idea whether people are talking about my page in a positive way. If they are talking negatively, there’s not a lot I can do to resolve that person’s negative perception of my page. (In fact, I have yet to find a way of actually seeing where people are talking about my page, but that’s a different issue.)

My answer to all of these questions would be that analysing engagement in a way that takes into account intention or meaning would be brilliantly useful for social media/communities editors.

Of course, this analysis is already done – by people who are employed by brands or PR companies, to monitor social media for mentions of the brand. Is there a brand with so much action on social media that they cannot cope? I don’t know. But it would be a lot easier to automate it.

I’m not arguing that current analytical tools are not useful or should not be used. Not by a long shot. Some of them are really sophisticated (and I’m aware as a free/non-corporate user, I don’t have full access to them) and great for measuring your success on social media. I would just be wary of drawing too many absolute conclusions about what those figures really mean when we have so little information about what’s behind them.

Automating qualitative analysis of engagement

People are currently trying to fix this very problem with comments on articles. In fact, recently, there was a Hack Day on ‘re-imagining comments’ which a team from The Times won. The ideas to come out of that are definitely interesting. These take into account either an extra layer of ‘moderation’ whereby communities or staff label people as useful/experts, or an analysis of the words used in the comment.

So when you comment “this is great, I agree”, it’s flagged up as a positive comment; “terrible article” is clearly a negative one. It’s definitely a move closer to the kind of thing I’m talking about, but there are some obvious issues especially when it comes to exact words that could mean something very different out of context. Instead of searching for words or phrases in isolation, the whole sentence/paragraph needs to be contextually analysed. Now it gets difficult… right?
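To show why word-matching on its own struggles, here’s a deliberately naive keyword scorer in Python. It isn’t how the Hack Day entries worked (I don’t know their internals); the word lists and the function name are made up purely for illustration.

```python
import re

# Hypothetical word lists, purely for illustration
POSITIVE = {"great", "agree", "brilliant", "useful"}
NEGATIVE = {"terrible", "awful", "disagree", "useless"}


def keyword_sentiment(comment):
    """Score a comment by counting positive minus negative words, ignoring context."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

It gets “this is great, I agree” and “terrible article” right, but “not great” comes back positive, which is exactly the context problem: the scorer sees the word, not the sentence.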

I like coding now and again, but I’ve never touched social media APIs and I wouldn’t have a clue about where to start. Maybe the next Hack Day could be about re-imagining analytical tools?

If I’ve missed an important analytical tool which does take into account semantics, feel free to tweet or leave a comment. I’m very interested in trying it out!

*If you’ve found a use for Klout, again…tweet me, because I cannot for the life of me understand how it’s useful



Who moves the goalposts when the badgers aren’t around?

I wrote a really dreary post yesterday about the course and I feel like in doing so I kind of did it and myself a disservice. It wasn’t talking about the course in a bad light; more my struggles and concerns about certain modules. But I’ve decided I can look at these issues from a different perspective.

Today Paul Bradshaw talked to us about the BASIC principles of writing online so I am going to do my utmost to stick to them.

This post is about goals. No badgers, sorry.

I was reading this piece by the cartoonist Scott Adams about how to succeed. The key is failing, over and over. He also says he doesn’t believe in ‘following your passion’ or setting yourself goals.

I like goals. I’m constantly making them to try to challenge myself, then doing them, then trying something harder.

Goals help me to focus my energy and time. And importantly they aren’t static – I move my goalposts (like badgers, apparently) to adapt to what I’m learning and doing.

End of term goals…

  • Get better and more confident at doing maths (online course?) – measure by ability to cope with data etc
  • Set up my own method of learning Media Law
  • Take part in whole of PA week and write up notes
  • Set up work experience at two different places
  • Find 3-5 tools I can use to get the most out of Twitter
  • Experiment with Twitter, document experiments if interesting/useful
  • Play around with datasets and visualisations – see what I can come up with
  • Come up with 10 valid, doable ideas for the group blog

A couple of these could be quite challenging. I am so excited about 70-80% of the course but there’s that small fraction that I just find too difficult or intimidating right now, and I think having some positive pointers and goals will help me.


Sifting through crap, making a map

[Rhyming unintentional but I think it works.]

Hoxton Ward

The lovely ward of Hoxton

Patch problems

I’ve been given the ward Hoxton in the borough of Hackney, which you would think would be an excellent place to find things to investigate. No such luck. For the last week and a half I’ve been really struggling to get to grips with the area.

It’s not that nothing is going on; it’s more that there’s no contention, and everything really interesting is just out of my reach! There’s a mooted CPO (compulsory purchase order) in my area, for example, and as far as I can gather (more on that later), all the residents are perfectly happy to move. Go just one road north of my ward, and you’ll find hundreds of people who are livid with Hackney Council, who don’t want to move, who are unhappy with where they’ve moved to, etc. Perhaps I am being too ambitious here.

Sifting through crap

I’ve spent many, many, many hours on the Hackney Council website this week desperately trying to find leads, people I can contact, etc. The problem is that I half-suspect council websites were specifically designed to obfuscate important information from the public. I feel like I’ve read every single document on there (though naturally, I haven’t, and I don’t think anyone would ever have the time – or inclination – to) and I feel like I’ve lived in Hackney forever, such is my increasing familiarity. [For the unaware: I’ve never lived in Hackney!]

What I did realise quite early on was that, as I was (attempting to) source things online, I wasn’t entirely sure where things were, whether they were in my ward or not, and consequently whether I could cover them. So many potentially amazing stories have passed me by because they just aren’t in my area, and it keeps completely throwing me. I decided that it would be much more helpful to have a map with my ward boundary overlaid on it, which I could use as a reference point. And if I could plot key organisations onto it, then so much the better.
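The ‘is this in my ward or not?’ question is a classic point-in-polygon test. A minimal ray-casting sketch in Python, assuming the ward boundary is available as a list of (longitude, latitude) points (for example from a KML file) and treating coordinates as flat x/y, which should be fine at ward scale; the function name and any coordinates you’d feed it are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside polygon?

    polygon is a list of (lon, lat) vertices in order; the last vertex joins
    back to the first. Coordinates are treated as flat x/y. Points lying
    exactly on an edge may be classed either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips inside/outside
    return inside
```

Running every organisation’s coordinates through a check like this would give a quick ‘in Hoxton / not in Hoxton’ column for the spreadsheet.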

I actually stumbled upon what I needed much later. I was half-looking at a list of community groups and organisations in Hackney, which could be exported in the form of a .xls spreadsheet. It suddenly occurred to me that I could experiment with using this data. What would make the most sense for me to do? Ah. That map I was thinking of.

Making a map

This is fairly geeky, but I’ve wanted to make my own map for a while now; I just never had a specific project to work on. Being given Hoxton was a brilliant excuse to fiddle around with Google Maps Engine, and it’s actually far, far easier than I thought it would be. With a little Googling and a little trial and error, I managed to mark a boundary around Hoxton on the map and import my spreadsheet into it, which, to my surprise, actually worked. I still can’t quite believe how little time it took; the most time-consuming part of the whole process was finding the data, and I’d only stumbled across that by accident on my internet travels. Strange, huh?

Incidentally one other thing worth checking out is the Hackney shape file. James Ball mentioned shape files yesterday and I thought ‘huh, that sounds like exactly what I was trying to do, AND SO MUCH EASIER’. So I Googled Hackney shape files. Lo and behold, some lovely man by the name of Ændrew Rininsland had realised that this might be useful too, so he made a .kmz file with all the Hackney ward shape files in it. I can’t seem to find a use beyond temporarily overlaying Google Maps with ward shapes (ie I can’t search for anything else when that’s on), but perhaps somebody else might be able to use it.
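A .kmz is a zipped KML file, so the ward shapes inside it can also be pulled out with Python’s standard library. A sketch, assuming the archive contains at least one .kml file using the standard KML 2.2 namespace and that each ward is a Placemark with a single Polygon (multi-part wards or older KML namespaces would need more work); the function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def read_kmz_polygons(kmz_bytes):
    """Return {placemark name: [(lon, lat), ...]} using each Placemark's outer boundary."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as zf:
        # A KMZ is a zip archive; take the first .kml file inside it (often doc.kml)
        kml_name = next(n for n in zf.namelist() if n.lower().endswith(".kml"))
        root = ET.fromstring(zf.read(kml_name))
    shapes = {}
    for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = pm.findtext("kml:name", default="(unnamed)", namespaces=KML_NS)
        coords = pm.find(".//kml:outerBoundaryIs/kml:LinearRing/kml:coordinates", KML_NS)
        if coords is None or not coords.text:
            continue  # not a polygon placemark
        points = []
        for triple in coords.text.split():
            lon, lat = triple.split(",")[:2]  # KML order is lon,lat[,alt]
            points.append((float(lon), float(lat)))
        shapes[name] = points
    return shapes
```

The resulting lists of points could then be used with a point-in-polygon check to test which ward an address falls in, without needing the overlay switched on in Google Maps.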

Hoxton community groups

My slightly faulty Hoxton map

Dirty, dirty data

The initial map I’ve made (see above) is made from ‘raw’ data I haven’t bothered verifying or organising into something coherent that I can easily reference. The eagle-eyed among you will notice that I’ve drawn the boundary wrong: I had to flip back and forth between Google Maps Engine (GME) and the map of the ward at the top of this page, and I couldn’t read the road names. That’s fixed on my map now!

My next tasks: take out the layer of raw data because it’s absolutely useless to me right now; clean the spreadsheet I have; verify addresses and organisations. Then, I want to import the clean data and find some way of simultaneously building up a system of links, feeds, events, etc from this.
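For the cleaning step, a first pass might just tidy whitespace, pull out postcodes and drop obvious duplicates. A Python sketch, assuming the spreadsheet rows have been read into dicts with hypothetical ‘name’ and ‘address’ keys; the postcode pattern is a rough approximation of the UK format, good enough for tidying rather than validation.

```python
import re

# Rough UK postcode pattern (outward code + inward code); an approximation, not a validator
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")


def clean_rows(rows):
    """Tidy a list of dicts with hypothetical 'name' and 'address' keys.

    Collapses whitespace, extracts a normalised postcode, skips rows with no
    name, and drops rows that repeat an earlier (name, postcode) pair.
    """
    seen = set()
    cleaned = []
    for row in rows:
        name = " ".join(row.get("name", "").split())
        address = " ".join(row.get("address", "").split())
        match = POSTCODE_RE.search(address.upper())
        postcode = f"{match.group(1)} {match.group(2)}" if match else ""
        key = (name.lower(), postcode)
        if not name or key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "address": address, "postcode": postcode})
    return cleaned
```

Having a tidy postcode column would also make the addresses easier to verify one by one before re-importing them.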

What do you think so far? What other tools do you think I could make use of as I build up a rich digital picture of Hoxton?
