Archive for October, 2013
On Friday I saw a past student’s blog post about using Google Fusion Tables to create choropleth maps. We didn’t have a data session on Friday, so I thought I would give it a go and try to learn something by myself over the weekend.
It started out well. Having decided to throw caution to the wind and abandon the tutorials I’d seen online that used US boundaries and US data (all far easier to find, better documented, etc.), I found a dataset I was interested in: England & Wales prison population figures for last month, ie September 2013. And I figured out what I wanted to do. To my total lack of surprise, it turns out a fair few prisons are currently overcrowded.
Wrong format, stupid
Now, helpfully (not), it was in Word format, and it took me just a few minutes to learn something that has genuinely irritated me for some time and that I never knew was possible. How do you turn a Word document into an Excel spreadsheet?
Here you go:
- Open Word document
- File -> Save as… HTML
- Open Excel
- File -> Open the file you just saved
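The trick seems to work because Excel can open an HTML file and read the tables in it. For anyone who would rather script the same step, here is a minimal sketch using only Python’s standard library. It assumes the data sits in simple, non-nested HTML tables like the ones Word writes when you save as HTML; `TableExtractor` and `html_tables_to_csv` are illustrative names of my own, not part of Word or Excel.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into one list per <tr>.

    Assumes simple, non-nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: lists of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace and line breaks (Word's HTML is full of them)
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside a cell, including text inside nested <p>/<span> tags
        if self._cell is not None:
            self._cell.append(data)


def html_tables_to_csv(html_text):
    """Return every table row found in html_text as a CSV string."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

The resulting CSV opens directly in Excel. Word’s HTML export is not always UTF-8, so when reading the saved file you may need to pass a different `encoding` to `open()`.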
So I had the data in the right format, but I wanted to show it visually by county rather than by prison. (To be honest, this was a terrible decision because it serves zero purpose, but I wanted to see if I could do it. Now, after spending hours on it, I’ve decided to go back to where I was at the start – more on that in a minute.)
I scraped the Wikipedia table listing where the prisons in England and Wales are located, using the IMPORTHTML function in a Google Spreadsheet. Then I merged the two tables together in Excel in a really complicated and time-consuming way, which I probably could have done automatically. [I copied and pasted the location data in as an extra column and deleted the rows that didn’t match up… it took me a while.] I then added up all the figures for each county and calculated the percentage of ‘operational capacity’ taken up.
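Looking back, that merge and the county totals could have been scripted. Here is a minimal sketch in Python, assuming the population table has been read into a list of dicts with hypothetical keys `name`, `population` and `operational_capacity`, and the scraped Wikipedia table into a dict mapping prison name to county. The prison and county names in any example are made up.

```python
from collections import defaultdict


def county_occupancy(prisons, locations):
    """Join prisons to counties by name, then total them per county.

    Returns (totals, unmatched): totals maps county -> (population,
    operational capacity, % of operational capacity in use); unmatched lists
    prisons with no county, i.e. the rows that were deleted by hand in Excel.
    """
    # Normalise names on both sides so "Prison A " and "prison a" still match
    lookup = {name.strip().lower(): county for name, county in locations.items()}
    population = defaultdict(int)
    capacity = defaultdict(int)
    unmatched = []
    for row in prisons:
        county = lookup.get(row["name"].strip().lower())
        if county is None:
            unmatched.append(row["name"])  # no location found: check by hand
            continue
        population[county] += row["population"]
        capacity[county] += row["operational_capacity"]
    totals = {}
    for county in population:
        cap = capacity[county]
        # Percentage of operational capacity in use (None if capacity is zero)
        percent = 100.0 * population[county] / cap if cap else None
        totals[county] = (population[county], cap, percent)
    return totals, unmatched
```

Collecting the unmatched names, rather than silently dropping them, makes it easy to see which rows need a manual fix, which is usually a spelling difference between the two sources.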
But what does it all mean?
The column headings in the prison dataset aren’t immediately obvious. I had to do some searching to figure out what everything meant, and then decide which of the five or six columns to use, or rather, which were the most accurate for measuring the extent of overcrowding. For example, there’s no point in comparing ‘operational capacity’ against CNA (Certified Normal Accommodation), because that doesn’t take into account cells that can’t currently be used. So I guess my point here is: make sure you know which figures you need to use.
Finding the borders
I must have spent about three to four hours just trying to find any way of getting what I wanted – county border data in a format that could be geocoded by Google Fusion. I cannot believe that there is not a file out there containing the data I need. I refuse to believe it. But I did try everything I could think of, and I’m still not there. I tried downloading shapefiles, converting those to KMLs, I tried importing KMLs directly, I tried merging other tables with geocoded data in… ARGH. Nothing works. I have found the files I thought I could use but I…couldn’t. I don’t know why.
When I eventually figure this all out, I’m going to make sure I keep a copy for future use. No way I’m letting it go!
Having failed in my quest to find borders that work with the county names I have, I have completely given up on the idea of doing it by border. And in any case, it’s not really a true picture of the prison situation… Some areas have several prisons, some have one… plus, because all the prisons in a county are added together, the totals don’t make much sense.
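One thing worth ruling out when a boundary file won’t line up with a dataset is that the names simply don’t match exactly. I can’t say whether that was the problem here, but it is quick to check. A minimal sketch, assuming a KML 2.2 file whose Placemarks carry the county names in `<name>` elements (older files use a different namespace, and some keep names in `ExtendedData` instead); the function names are my own:

```python
import xml.etree.ElementTree as ET

# Assumption: the boundary file uses the KML 2.2 namespace
KML_NS = "{http://www.opengis.net/kml/2.2}"


def placemark_names(kml_text):
    """Return the <name> text of every Placemark in a KML document."""
    root = ET.fromstring(kml_text)
    names = []
    for placemark in root.iter(KML_NS + "Placemark"):
        name = placemark.find(KML_NS + "name")
        if name is not None and name.text:
            names.append(name.text.strip())
    return names


def unmatched_counties(data_counties, kml_text):
    """County names from the data that have no matching Placemark name."""
    # Case-insensitive comparison; spelling differences ("&" vs "and") still show up
    available = {n.lower() for n in placemark_names(kml_text)}
    return sorted(c for c in data_counties if c.strip().lower() not in available)
```

Any name this reports would need renaming in one file or the other before the two can be joined.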
But this is as far as I’ve got with my map:
Remember that this is *all prisons in counties*, NOT individual prisons. The map tells you where there’s overcrowding in prison/s. The markers are as follows:
- Green: 0-50% full
- Yellow: 50-100% full
- Red dots: 100-150% full
- Red markers: 150%+ full
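The legend amounts to a simple banding rule on each county’s occupancy percentage. It doesn’t say which band a value exactly on a boundary belongs to, so this sketch assumes each band includes its lower bound: exactly 150% gets a red marker (matching “by 50% or over” below) and exactly 100% gets a red dot. The names are illustrative.

```python
from bisect import bisect_right

# Lower bounds (in % of operational capacity) of the yellow, red-dot and
# red-marker bands; anything below 50 is green.
BAND_STARTS = [50, 100, 150]
MARKERS = ["green", "yellow", "red dot", "red marker"]


def marker_for(percent_full):
    """Return the legend marker for an occupancy percentage."""
    if percent_full < 0:
        raise ValueError("occupancy cannot be negative")
    # bisect_right counts how many band starts are <= percent_full
    return MARKERS[bisect_right(BAND_STARTS, percent_full)]
```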
So you can see that total capacity in Swansea & Bedfordshire has been exceeded by 50% or more. As it happens, both counties have only one prison each, so those two are genuinely quite bad. The numbers are fairly small (we’re talking around 400 prisoners), but they are still over capacity.
- Make sure you figure out the right way to visualise something before you embark on something that will take you hours to do
- Use something easy
- I need to figure out how to get borders to work
I’m not happy that the info is shown as *markers* by county (so they’re super vague) instead of coloured boundary lines, but I think the best thing for me to do is to take the data back to how it was, scrape the actual prison addresses from somewhere, and plot them exactly where they are. I think that will give a much clearer picture of what’s going on. When I’ve done that I’ll publish it and link to it properly so you can have a play around.
I wrote a really dreary post yesterday about the course and I feel like in doing so I kind of did it and myself a disservice. It wasn’t talking about the course in a bad light; more my struggles and concerns about certain modules. But I’ve decided I can look at these issues from a different perspective.
Today Paul Bradshaw talked to us about the BASIC principles of writing online so I am going to do my utmost to stick to them.
This post is about goals. No badgers, sorry.
I was reading this piece by Scott Adams, the cartoonist, about how to succeed. The key is failing. Over and over. He also says he doesn’t believe in ‘following your passion’ or in setting yourself goals.
I like goals. I’m constantly making them to try to challenge myself, then doing them, then trying something harder.
Goals help me to focus my energy and time. And importantly they aren’t static – I move my goalposts (like badgers, apparently) to adapt to what I’m learning and doing.
End of term goals…
- Get better and more confident at doing maths (online course?) – measure by ability to cope with data etc
- Set up my own method of learning Media Law
- Take part in whole of PA week and write up notes
- Set up work experience at two different places
- Find 3-5 tools I can use to get the most out of Twitter
- Experiment with Twitter, document experiments if interesting/useful
- Play around with datasets and visualisations – see what I can come up with
- Come up with 10 valid, doable ideas for the group blog
A couple of these could be quite challenging. I am so excited about 70-80% of the course but there’s that small fraction that I just find too difficult or intimidating right now, and I think having some positive pointers and goals will help me.
[Rhyming unintentional but I think it works.]
I’ve been given the ward Hoxton in the borough of Hackney, which you would think would be an excellent place to find things to investigate. No such luck. For the last week and a half I’ve been really struggling to get to grips with the area.
It’s not that nothing is going on; it’s more that there’s no contention, and everything really interesting is just out of my reach! There’s a mooted CPO (compulsory purchase order) in my area, for example, and as far as I can gather (more on that later), all the residents are perfectly happy to move. Go just one road north of my ward, and you’ll find hundreds of people who are livid at Hackney Council, who don’t want to move, who are unhappy with where they’ve moved to, etc. Perhaps I am being too ambitious here.
Sifting through crap
I’ve spent many, many, many hours on the Hackney Council website this week desperately trying to find leads, people I can contact, etc. The problem is that I half-suspect council websites were specifically designed to hide important information from the public. I feel like I’ve read every single document on there (though naturally, I haven’t, and I don’t think anyone would ever have the time – or inclination – to) and I feel like I’ve lived in Hackney forever, such is my increasing familiarity. [For the unaware: I’ve never lived in Hackney!]
What I did realise quite early on was that, as I was (attempting to) source things online, I wasn’t entirely sure where things were, whether they were in my ward or not, and consequently whether I could cover them. So many potentially amazing stories have passed me by because they just aren’t in my area, and it keeps completely throwing me. I decided it would be much more helpful to have a map with my ward boundary overlaid on it, which I could use as a reference point. And if I could plot key organisations onto it, then so much the better.
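Checking whether an address falls inside the ward is a point-in-polygon test. Here is a minimal sketch of the standard ray-casting method, assuming the boundary is available as a list of (longitude, latitude) vertices, for example copied out of a KML export, and that each organisation has already been geocoded. It treats coordinates as a flat plane, which should be fine at the scale of a single ward; `point_in_polygon` is an illustrative name.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point (lon, lat) inside polygon?

    polygon is a list of (lon, lat) vertices in order; the last vertex joins
    back to the first. Points lying exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point
            if lon < x_cross:
                inside = not inside
    return inside
```

For example, with a made-up square `[(0, 0), (1, 0), (1, 1), (0, 1)]` (not Hoxton’s real boundary), the point `(0.5, 0.5)` is inside and `(1.5, 0.5)` is not.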
I actually stumbled upon what I needed much later. I was half-looking at a list of community groups and organisations in Hackney, which could be exported in the form of a .xls spreadsheet. It suddenly occurred to me that I could experiment with using this data. What would make the most sense for me to do? Ah. That map I was thinking of.
Making a map
This is fairly geeky, but I’ve wanted to make my own map for a while now; I just never had anything solid to work on. I didn’t really have a specific project. But being given Hoxton was a brilliant excuse to fiddle around with Google’s Maps Engine. It’s actually far, far easier than I thought it would be. With a little Googling and a little trial and error, I managed to mark a boundary around Hoxton on the map and import my spreadsheet into it. Which actually worked, to my surprise. I still can’t quite believe how little time it took to make; the most time-consuming part of the whole process was finding the data, and I only found that by accident on my internet travels. Strange, huh?
Incidentally one other thing worth checking out is the Hackney shape file. James Ball mentioned shape files yesterday and I thought ‘huh, that sounds like exactly what I was trying to do, AND SO MUCH EASIER’. So I Googled Hackney shape files. Lo and behold, some lovely man by the name of Ændrew Rininsland had realised that this might be useful too, so he made a .kmz file with all the Hackney ward shape files in it. I can’t seem to find a use beyond temporarily overlaying Google Maps with ward shapes (ie I can’t search for anything else when that’s on), but perhaps somebody else might be able to use it.
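A .kmz file is a zip archive containing KML, so the ward shapes can in principle be pulled out and reused rather than only overlaid. A minimal sketch with Python’s standard library; I haven’t checked how that particular file is laid out, so it assumes the wards are Placemarks with `<name>` elements in the KML 2.2 namespace, and `kmz_placemark_names` is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the KML inside uses the KML 2.2 namespace
KML_NS = "{http://www.opengis.net/kml/2.2}"


def kmz_placemark_names(kmz_bytes):
    """List the Placemark names found in the main KML file of a KMZ archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_files = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_files:
            raise ValueError("no .kml file inside this KMZ")
        # The main file is usually doc.kml; otherwise take the first .kml entry
        main = "doc.kml" if "doc.kml" in kml_files else kml_files[0]
        root = ET.fromstring(archive.read(main))
    return [pm.findtext(KML_NS + "name", default="").strip()
            for pm in root.iter(KML_NS + "Placemark")]
```

From there, each Placemark’s coordinates could be read the same way and fed into a point-in-polygon check or another mapping tool.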
Dirty, dirty data
The initial map I’ve made (see above) is made from ‘raw’ data I haven’t bothered verifying or organising into something coherent that I can easily reference. The eagle-eyed among you will notice that I’ve drawn the boundary wrong – I had to flip back and forth from GME to the map of the ward at the top of this page and I couldn’t read the road names. That’s fixed on my map now!
My next tasks: take out the layer of raw data because it’s absolutely useless to me right now; clean the spreadsheet I have; verify addresses and organisations. Then, I want to import the clean data and find some way of simultaneously building up a system of links, feeds, events, etc from this.
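For the cleaning step, a minimal sketch, assuming the exported .xls has been saved as CSV and read with `csv.DictReader`, and using hypothetical `name` and `address` columns. The postcode pattern only checks that something is shaped like a UK postcode, so it sorts rows for manual verification rather than verifying anything itself.

```python
import re

# Loose UK postcode shape (e.g. "N1 6AA", "EC1V 9AB"); it does not check
# that the postcode actually exists.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE)


def clean_rows(rows):
    """Tidy rows (dicts) from the exported spreadsheet.

    Collapses stray whitespace, drops exact duplicates, and splits the rest
    into rows whose address contains a postcode-shaped string and rows that
    need checking by hand.
    """
    seen = set()
    ok, needs_checking = [], []
    for row in rows:
        tidy = {k.strip(): " ".join(str(v).split()) if v is not None else ""
                for k, v in row.items() if k is not None}
        key = tuple(sorted(tidy.items()))
        if key in seen:
            continue  # exact duplicate once whitespace is tidied
        seen.add(key)
        if POSTCODE.search(tidy.get("address", "")):
            ok.append(tidy)
        else:
            needs_checking.append(tidy)
    return ok, needs_checking
```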
What do you think so far? What other tools do you think I could make use of as I build up a rich digital picture of Hoxton?