
Behind the story, part three: creating a map with Fusion

In the first post I explained the purpose of this three-part tutorial. I was working on a couple of things for the Hackney Post, and I did this interactive map of entertainment venues in Hackney. I wanted to get the addresses, verify they were in Hackney, and then turn them into a map. You could, of course, do this all by hand, or you could semi-automate the process using scrapers and different tools.

The tutorial covers the following:

Part one: scraping with Outwit Hub
Part two: cleaning and verifying data
Part three: creating a map with Google Fusion

Using Google Fusion

Go to your Google Drive (if you don’t have one, you can get one very easily if you use Gmail – just go to Google Drive and it should explain everything there). Click on Create, and then click on Fusion Table. If you don’t have that option, just click “connect more apps” and you should be able to find it easily.

[Screenshot]

When prompted, choose the file you saved in the last tutorial (if it’s a CSV file, ensure that “comma” is checked for separator character) and click next. It should display a preview of the information you have in the table, so click next again, and then finish – unless you want to change some of these values, which may be useful later but we don’t need to tinker with them now. It should look a little something like this, with an extra map tab to the right:

[Screenshot]

The highlighted cells – the “Name” values – are what Google is suggesting it should use to find the location if you want to make a map. Of course, this is wrong, so we can change it by clicking on the arrow next to the Postcode column, then “Change…”, and then changing the type from Text to Location. All of the Postcode column should now be highlighted, and some of the Name values will be highlighted too. Change the Name column type from Location to Text, and then Google Fusion should start geocoding.

[Screenshot]

This is basically the process through which Google is figuring out what the location values mean – it’s thinking about how to process the information in the postcode column, because location values can be in many different forms. When that’s done, you want to click on the Map tab, and Google should have figured out that all of the pins are in Hackney, London.

Rename your map “Art galleries in Hackney” now, so we don’t forget to do it later. Then click “Change feature styles…” – this is where you can change the icon if you want to. You can use this to specify gradients depending on values, or different icons for different values. In the original map I made, you’ll see that theatres, art galleries, and museums are all different colours. Before I imported into Fusion, I added an extra column with values like large_green, large_blue and large_red, one for each type – and then I specified that column in the feature styles option.
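If you’re preparing the data beforehand, a column like that is quick to generate programmatically. Here’s a minimal pandas sketch, assuming a hypothetical venues.csv with a “Type” column recording whether each row is a theatre, gallery or museum – the mapping itself is illustrative:

    import pandas as pd

    # Fusion's built-in marker names, one per venue type (mapping is illustrative)
    ICONS = {
        "theatre": "large_red",
        "art gallery": "large_blue",
        "museum": "large_green",
    }

    df = pd.read_csv("venues.csv")  # hypothetical file from the earlier tutorials
    df["Icon"] = df["Type"].str.lower().map(ICONS).fillna("large_green")
    df.to_csv("venues_with_icons.csv", index=False)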

For now, we just want it to be a large green icon. So click that and then save. If you click on one of the green pins now, it should come up with a little window, like this. We want to change that.

[Screenshot]

Click on “Change info window…” and you can see exactly what just came up. If you click on Custom, you can customise it to say whatever you want. I’d like to bold the name and take out the postcode label. Move {Name} to where Name: is, between the <b> tags, and get rid of the Postcode: text. It should look like this:

[Screenshot]

Now, when you click on a pin, it should have the name in bold, with the postcode written underneath it. And we’re done. Zoom in so you can see all of the pins as closely as possible – this is how people will see it when they first click on it, so it’s important! Now, click the arrow next to the map tab, and Publish. You’ll need to change permissions to “anyone with a link” to be able to embed it and send links to people. Check that the link works, and you’re done!

Making more elaborate maps

In the original piece I did for the Hackney Post, I had the entire address, phone numbers where possible, and hyperlinks to each venue’s website, all contained in the info window on the map. I had retained this information from the various scrapes I did of the Independent Directory and other places, and I also scraped the first ten Google results for each venue, using Outwit Hub. This tutorial produced a really simple map that can be put together quite quickly once you have the data, but there is so much more you can do with Google Fusion.




Behind the story, part two: cleaning and verifying data

In the first post I explained the purpose of this three-part tutorial. I was working on a couple of things for the Hackney Post, and I did this interactive map of entertainment venues in Hackney. I wanted to get the addresses, verify they were in Hackney, and then turn them into a map. You could, of course, do this all by hand, or you could semi-automate the process using scrapers and different tools.

The tutorial covers the following:

Part one: scraping with Outwit Hub
Part two: cleaning and verifying data
Part three: creating a map with Google Fusion

Cleaning messy data

Open up the file you just saved all of your data to. The first thing you might notice is that Outwit has scraped some extra rows that don’t have addresses in them. We need to get rid of those, so click Data, then Filter – it should put arrows on both of the columns. Click on the arrow on the address column, and click ‘Ascending’. Go down to the bottom, and you’ll see there are venue names without addresses. Highlight those rows and delete them, then clear the filter.

We now have a clean dataset we can work with.

Note: any dataset you download will be much messier than this – and even with a great scraper sometimes you need to spend a while cleaning the data.

Separating text in columns

What we want to do now is separate out the address in each row, because all we need is the postcode in order to verify that it’s actually in Hackney. Looking at the address column, you’ll notice each address is in three chunks separated by commas.

Highlight the address column, and press CTRL-F (or CMD-F). Search for “, ” – that is, comma space. The space is necessary, as it means there won’t be any stray spaces at the beginning or end of the cells. Click Replace…, type in “@”, and then click Replace all. It’ll look like this:

[Screenshot]

Then, click on Data and then Text to columns. Make sure ‘delimited’ is highlighted. Click next, and then where it says delimiters, put @ in the ‘other’ box. This tells it to split by the @ we just inserted into the column. Then click finish.

Most of the postcodes should be in column D, but some of them are in E or even C. You now need to go through and ensure that all of the postcodes are in the same column. Delete the other columns with the street or city in them, so you have two columns again: one with the name of the venue, and one with the postcode. Save it, and we’re ready to move on.
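Incidentally, if you’d rather script this step than click through Excel, here’s a rough pandas sketch of the same find-and-replace and Text to Columns routine. The ‘Name’ and ‘Address’ column headers are hypothetical:

    import pandas as pd

    df = pd.read_csv("venues.csv")  # the file exported from Outwit

    # Split each address on ", " -- the postcode is the last chunk, which also
    # copes with addresses that split into two chunks rather than three
    df["Postcode"] = df["Address"].str.split(", ").str[-1].str.strip()

    df[["Name", "Postcode"]].to_csv("venues_postcodes.csv", index=False)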

More cleaning with Open Refine

Now you need to open Refine [download it here], and create a new project. Open the file you just saved – find it in the file browser and click next. If it’s a CSV file, make sure that ‘columns are separated by’ is set to commas (CSV). The preview should show two columns exactly as they were in the spreadsheet. Click next and then ‘create project’. Now the project should be ready to edit.

First, copy the column to another column, so we keep the original postcode data in case anything goes wrong.

Click the arrow next to ‘Address’, then ‘Edit column’, and click ‘Rename column’. Rename it as original. Then click the arrow again, ‘Edit column’, then ‘Add column based on this column’. It should be identical. Call it postcode and click OK.

Now click the arrow next to the new postcode column, then ‘Edit cells’, then ‘Transform…’:

Type in: replace(value," ","")

This will take the value “ ” (a space) and replace it with nothing, effectively removing the space. Replace is a good command to know.

Using Open Refine to scrape

Now, the aim of this part of the work is to validate the postcodes by using a website which looks up postcodes and brings back information about them. There are a few, but I quite like this particular one called Planning Finder – it’s really useful for getting information about postcodes and planning projects in the area. It automatically comes up with the nearest planning project to the postcode, so we know that the first value it returns is either the exact postcode or very near.

We will validate our postcodes in two ways. Firstly, if the postcode isn’t recognised at all, we’ll ignore it. Secondly, we’ll find out which area the postcode is in, and if it isn’t in Hackney, we know we don’t want it. Click the arrow on the postcode column again, ‘Edit column’, and then ‘Add column by fetching URLs’.

The expression is: ""+value – the quotation marks should contain the Planning Finder search URL, and +value appends the postcode from each cell.

You’re telling it to go to the website string, plus the value of whatever is in that cell – the postcodes – and to then download everything on that page into the cell. You will end up with the HTML of the entire page, so it’ll take a long time – time to grab a cup of tea!

When it’s done fetching everything, you need to transform the new column. Arrow > Edit cells > Transform. This is the expression you need: value.parseHtml().select("table.search-results")[0].select("td.info")[3].toString()

It’ll parse the HTML, select the first table with the class “search-results”, and take the fourth cell with the class “info” ([3], because counting starts at 0). This cell contains the borough information – you should see that in the preview. The toString() at the end converts the result into a string – if you don’t do this, it will simply fail.
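If you’d rather do the fetching and parsing in a script instead of Refine, here’s a rough Python sketch of the same two steps, using the requests and BeautifulSoup libraries. The Planning Finder search URL isn’t quoted in this post, so BASE_URL below is a placeholder you’d need to fill in yourself:

    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://example.com/search?postcode="  # placeholder, not the real URL

    def borough_for(postcode):
        """Fetch the results page for a postcode and pull out the borough cell."""
        html = requests.get(BASE_URL + postcode, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        table = soup.select_one("table.search-results")
        if table is None:
            return None  # postcode not recognised at all
        cells = table.select("td.info")
        # the fourth td.info cell holds the borough, as in the Refine expression
        return cells[3].get_text(strip=True) if len(cells) > 3 else None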

You should now have something that looks like this – you can see immediately that some of them aren’t in Hackney.

[Screenshot]

So, let’s export that and work on it in Excel. Export it as a CSV or Excel document (top right) and open it up.

Do the same as we did before with find and replace – highlight the whole of the PF column, find <td class="info"> and replace it with nothing, then find </td> and replace it with nothing. Delete the postcode column – that’s the one we stripped the spaces out of, and we still have the original. Then you want to get rid of all of the non-Hackney postcodes.

So click on the top row, go to Data, then Filter. Click on the arrow next to the PF column, and untick the two Hackney values – you should now have all of the non-Hackney rows displayed. Select them all (click on the top row and drag down – CTRL/CMD-A won’t work here) and then delete the rows. Clear the filter, and you should have 61 rows (60 venues, plus the title row) with exactly what you want in them. Delete the PF column, as we don’t need it anymore, save it, and then you’re done!
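The same clean-up can also be scripted. A pandas sketch, assuming the exported columns are named ‘original’, ‘postcode’ and ‘PF’ as above, and a hypothetical export filename:

    import pandas as pd

    df = pd.read_csv("refine_export.csv")  # hypothetical export filename

    # Strip the leftover HTML around the borough name
    df["PF"] = (df["PF"].str.replace('<td class="info">', "", regex=False)
                        .str.replace("</td>", "", regex=False)
                        .str.strip())

    # Keep only the Hackney rows, then drop the helper columns
    df = df[df["PF"].str.contains("Hackney", na=False)]
    df = df.drop(columns=["postcode", "PF"])
    df.to_csv("hackney_venues.csv", index=False)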

In the next (and last) tutorial, we’ll look at making a map with Google Fusion.



Behind the story, part one: scraping with Outwit

Recently I was working on a couple of things for the Hackney Post, and I did this interactive map of entertainment venues in Hackney. I wanted to get the addresses, verify they were in Hackney, and then turn them into a map. You could, of course, do this all by hand, or you could semi-automate the process using scrapers and different tools.

The tutorial covers the following:

Part one: scraping with Outwit Hub
Part two: cleaning and verifying data
Part three: creating a map with Google Fusion

Scraping a single page

I used several different sources, but for the sake of brevity, the page we’re going to scrape from is

Open up Outwit Hub [download it or add the Firefox extension here], paste the URL into the address bar (like a normal browser) and make sure you’re on the “Page” tab at the very top. Find the first instance of the data you want to get – in this case, the venue name and the address. The first one is “Fortescue Avenue”.

We’re ready to scrape. Click on the ‘scrapers’ tab under ‘automators’ and click ‘New’. Call your new scraper IndyDirectory.

Then press CTRL-F (or CMD-F if you’re on a Mac) and start typing Fortescue Avenue, because that’s what we need to find. It’ll scroll down to the first instance of it.

You’ll see there are actually three places where Fortescue Avenue is written. One is in the HTML telling browsers it’s a link (inside the <a href tag), one is the text of the link (between the opening and closing <a> tags), and one is part of the address. We’ll get the address in a minute. The most important thing is to find a pattern before and after the text you need – one which repeats for all of the venue names, but which won’t pick up extra information you don’t want.

You can see that the text before the first instance is probably the one we want. It starts title=" and ends - art galleries in Hackney. This is perfect. Now put those into your ‘marker before’ and ‘marker after’ boxes, and name the column Name. Each row in the scraper is a new column in your eventual spreadsheet. See below – that’s what you should now have in the bottom window.

[Screenshot]

Now onto the address. We only need the postcode, but we can sort that out later in Excel. This bit is even easier. Each address (scroll down to double-check) has a <p class="adr"> before it, and a </p> after it. Put those values in the ‘marker before/after’ boxes, and you’ve finished the scraper.

Now, check it works. Click “Execute” at the top of the bottom window.

It should have come up with 20 rows of neat venue names and addresses, like this:

[Screenshot]
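As an aside, Outwit’s marker-before/marker-after approach maps almost one-to-one onto regular expressions, if you ever want to reproduce a scraper like this in code. Here’s a minimal Python sketch using the same two markers (the exact page HTML may differ slightly from what’s shown here):

    import re

    # Name sits between title=" and " - art galleries in Hackney";
    # address sits between <p class="adr"> and </p>
    NAME_RE = re.compile(r'title="(.*?) - art galleries in Hackney')
    ADDR_RE = re.compile(r'<p class="adr">(.*?)</p>', re.DOTALL)

    def scrape_page(html):
        """Pair up the venue names and addresses found on one directory page."""
        names = NAME_RE.findall(html)
        addresses = [re.sub(r"\s+", " ", a).strip() for a in ADDR_RE.findall(html)]
        return list(zip(names, addresses))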

Now, this isn’t the end of the directory – there are over 100 items in the list on the Independent directory, spread over a few pages. You could go into each page and scrape it as we just did the first one, but why do that when you can automate this process too?

Scraping from multiple pages

If you go back to your normal browser and click on a different page at the bottom of the directory list, you’ll notice in the address bar that the difference between the pages is a number appended to the original URL, and it goes up to 7. Click on page 2 and copy the URL in the address bar.

Open Excel and open a new workbook. Paste the URL for page 2 into cell A2. Copy it into cell A1 and change the 2 at the end to a 1, to make the URL for the first page. Then highlight both cells and drag down until Excel automatically fills cell A7 with the URL ending in 7. It should look like this:

[Screenshot]

Note: make sure that the column is wider than the text in it, otherwise the next part won’t work!

Next, save it as a web page. It depends which version of Excel you have, but it is either File > Save as Web Page, or File > Save as… and then you can choose web page from the drop-down. Call it something like “INDYURLs”.

Go to Outwit again, click File > Open… and choose the file you just saved. It will come up with an error, but that’s OK. To access the URLs, click on Tables under Data on the right. It should look like this:

[Screenshot]

Select all of the rows (click on one and then CTRL/CMD-A), then right-click, go to Auto-explore, then Fast Scrape, then pick IndyDirectory. Within a few seconds it should have finished scraping, and it will tell you that you have 139 scraped rows (more than we need, but never mind – we’ll sort that out in the next tutorial).
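Scripted, the whole multi-page run is a short loop: build the seven page URLs and feed each page through the scrape_page() helper sketched above. The directory’s base URL isn’t quoted in this post, so BASE is a placeholder:

    import requests

    BASE = "https://example.com/directory/page/"  # placeholder, not the real URL

    rows = []
    for page in range(1, 8):  # pages 1 to 7
        html = requests.get(BASE + str(page), timeout=30).text
        rows.extend(scrape_page(html))

    print(len(rows), "rows scraped")  # this run produced 139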

Select all, then right-click and choose export. I usually export as either CSV or Excel – CSV is the safest one to go for, as it’s more basic and doesn’t have any formatting in it.

There’s your first scraper done! In the next tutorial, we’ll clean up the data and verify the postcodes.



The soldier suicides: Good story, done badly

The Metro had a story on its site last month about soldier suicides that I think is a really good and important story, but one which could have been presented and written much better. It’s here, if you want to have a look.

Firstly, we’re bombarded with figures but not really given enough space to actually make sense of them. It starts:

Every 65 minutes, a US military veteran takes his own life, the equivalent of 22 former soldiers committing suicide every day.

This would be much better if it was just written in a straight way, omitting the 65 minutes. While the “every 65 minutes” sounds more shocking, it’s a weird amount of time to get your head around. If it was 60 minutes (a unit of time we are all familiar with in our daily lives), it would have a lot more impact, but obviously you can’t just make up the numbers to suit your piece.

…more active-duty US soldiers killed themselves (177) in 2012 than died in war zones (176).

Across all services, 349 military personnel committed suicide last year, compared to the 311 American troops killed in combat.

Yet more numbers that are difficult to make sense of easily. It may have been better to write it up without the numbers but show the figures in a chart, like this:

[Chart: Soldier suicides]

This example chart I just whipped up isn’t that great, because there are only two pieces of information in it, but the original data could have been interrogated to compare suicides with those killed in combat over a number of years. Has the number of suicides risen? Yes – the piece already explains that, but there is no way for the reader to explore that data themselves. The data has been cherry-picked, and as someone interested in figures and numbers, I’m left wondering what’s been left out.

There are some other interesting figures, too, relating to the number of suicides between 2001 and 2011 after tours of Afghanistan and Iraq. This, again, is just written out – it would be much better as a visualisation, easing the numbers into the piece rather than confronting the reader with so many figures that it’s hard to get a good understanding of what exactly is going on. Here’s one way they could have done it:

[Chart: Soldier suicides 2]

Talking of infographics and charts, the infographic included in the article was made for the paper. It is longer than the screen (well, MY screen – on my MacBook Pro) and effectively blocks the article in a way which isn’t very user-friendly. The two most useful points on this graphic are the ‘4,552 suicides by men in Britain’ (bottom right corner), and the donut chart at the top right, which explains that 74% of military cases assessed in 2011/12 had a mental disorder. That’s something worth charting in a more pleasing way, I think. Leading with a raw number (5,404 – the total number of military cases) doesn’t make a whole lot of sense here, whereas ‘nearly three quarters’, or 74%, is more immediately compelling.

The three soldier heads in the infographic are a bit strange – they don’t universally indicate ‘three times as likely’ (rather, I assumed they were something to do with three people, e.g. three soldiers committing suicide every x hours), and the clock also doesn’t make much sense in this context, because (as I explained above) it’s a strange number that doesn’t translate well into a simple diagram using units we are already conversant with.

Overall I think the piece could have done with more efficient and lively infographics which pick out key figures in a way which can be easily identified on the first read, rather than forcing the reader to go back through and spend time trying to fully get to grips with what’s going on. I also think that the numbers could have been sprinkled throughout the piece in a more subtle way – again, to save the reader the work of having to sit there and figure out which numbers mean what.


Excess Winter Deaths in England, visualised

It’s been all over the news today, so I whipped up a quick graphic about Excess Winter Deaths. Click on the picture to see the visualisation and get more information on what each bar represents. Source: Office for National Statistics.

[Graphic: Excess Winter Deaths in England]


Google searches as a prediction tool

I’m still reading through Nate Silver’s book The Signal and the Noise, which is about using data to predict certain outcomes. He’s best known for accurately predicting the outcome of the US presidential election, but his book discusses baseball, the weather, and earthquake prediction, amongst other things.

I’m up to the point where he discusses the economy and how horribly difficult it is for economists to accurately predict how it will behave – from those who are simply overly optimistic, to those who are just wrong because they see the economy as a set of variables, rather than a story. Or those who need to predict a certain way because it suits their political agenda.

He never claims it’s easy, but he mentions that trends in Google searches could probably ‘predict’ economic failures before anybody else really notices. I think one reason for this is that financial problems or worries tend to be kept private. People hear rumours at work that there will be a round of redundancies – and immediately they’re on Google looking for another job, not necessarily reaching out to other people directly.

What do Google searches show?

[Chart: Google Trends for ‘find a job’]

It’s perhaps hard to see when the picture is small, but a short glance at the trend for ‘find a job’ is quite interesting. See the short spike upwards every so often? This seems to indicate that in January, many more people in the UK are looking for a new job. It seems to be a common New Year’s Resolution! However, in the years that we know in hindsight were problematic, there doesn’t seem to be much of a difference compared to previous years (in fact, 2004-2006 seems to be particularly rough, with far more interest in ‘find a job’ than in recent years).

[Chart: Google Trends for ‘unemployed’ and ‘unemployment’]

Here, the red line indicates searches for ‘unemployment’, and the blue for ‘unemployed’. You can see a dramatic increase in the number of searches in late 2008. In fact, the exact month that interest in ‘unemployment’ starts to increase is September 2008, when Lehman Brothers was declared bankrupt, suggesting that it sent shockwaves throughout the UK as well as the US.



Now, this chart is quite interesting. It shows the trend for JSA – Job Seeker’s Allowance. Obviously it’s hard to fully understand the motivations behind searching this but I note two things. Firstly, there’s a sharp incline at the very beginning of 2011. I believe that at the end of 2010 and beginning of 2011, the government started initiating changes in the way that benefits work, leading more people to look at JSA online.

Secondly, I think that as a result of this, there is likely a general increase in interest in JSA as a whole. I wouldn’t say this is particularly indicative of much other than genuine interest in JSA itself, or the changes to JSA, rather than people searching specifically to ‘get’ JSA. Of course, if people are out of work, then they are more likely to be interested in how to get JSA, but I would be wary of directly and explicitly linking the two.

[Chart: Google Trends for ‘apply for JSA’]

From October to November 2010, the searches for ‘Apply for JSA’ start coming in – two years after the Lehman collapse and the start of the financial crisis, but also a few months into a new coalition government. A cursory search online suggests that there were a lot of policy suggestions and ideas about changing benefits around October and November 2010 – yet, again, that doesn’t fully explain the graph above. Renewed interest in applying for JSA as a result of job losses logically follows a financial crisis, but it doesn’t account for the complete absence of searches for ‘Apply for JSA’ before then. There would have been people made redundant between 2005 and 2010 – or at least, there is no reason I can think of to believe otherwise.

Google as a ‘state of the economy’ indicator

These are extremely brief forays into what historical search data can reveal about how policies and economic situations impact people directly, and influence the things they search for. Could it be used to predict or see in ‘real time’ what on earth is going on with the economy, before ‘official’ indicators and experts do? I think it is possible, if you are finely tuned to how the graph should look ‘normally’, and you can recognise the warning signs and key differences. But it needs to be used very wisely indeed, with as much background/context to the statistics as possible, with an open mind as to what indicators (search terms) you should look for, and with an awareness that not everything can be fully explained in statistics.

Because as we learned earlier, ‘the economy’ is not simply a mass of variables and inputs to be analysed, but a whole range of intricate and complex stories to be examined.



2012 mayoral & assembly election turnout by borough

While I was looking for some datasets a few days ago, I stumbled upon a goldmine of data for London, which also included boundaries in shape files. This time, I wanted to map some very simple data. (Last time I was trying to do something so complicated that I couldn’t finish it).

It took me a little bit of tinkering (where, for example, names like ‘Kensington & Chelsea’ weren’t matched to ‘Kensington and Chelsea’) and some thinking to figure out what to do with the colours. The lowest election turnout in 2012 was in Barking and Dagenham, with 27.7% coming out to vote, whereas the highest was still only 48.5%, in the City of London.

Naturally, this means there isn’t all that much difference in some of the boroughs – so instead of colouring by bucket (as is sometimes useful), I decided to pick the colours based on a gradient going from red, through orange, through yellow, to green.
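For the curious, that gradient logic is simple to express in code. Here’s a sketch that linearly interpolates between the lowest (27.7%) and highest (48.5%) turnouts – the exact colour stops are my assumption, not the palette I actually used:

    def turnout_colour(turnout, lo=27.7, hi=48.5):
        """Map a turnout percentage onto a red -> yellow -> green hex colour."""
        t = max(0.0, min(1.0, (turnout - lo) / (hi - lo)))
        if t < 0.5:   # red (255,0,0) towards yellow (255,255,0)
            r, g = 255, int(510 * t)
        else:         # yellow towards green (0,255,0)
            r, g = int(510 * (1 - t)), 255
        return "#{:02x}{:02x}00".format(r, g)

    # e.g. turnout_colour(27.7) -> "#ff0000", turnout_colour(48.5) -> "#00ff00"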

It’s not all that interesting, really, but here it is: Mayoral & Assembly elections 2012. Questions, pointers, etc gratefully received @SophieWarnes.



Playing with Google Fusion – prison overcrowding

I saw a past student’s blogpost on Friday about using Google Fusion to create choropleth maps, and we didn’t do a data session on Friday, so I thought I would give it a go and try to learn something by myself over the weekend.

It started out well. Having decided to throw caution to the wind and abandon tutorials I’d seen online that used US boundaries and US data (all far easier to find, better-documented, etc), I found a dataset I was interested in – England & Wales prison population numbers in the last month, ie September 2013. And I figured out what I wanted to do. To my total lack of surprise, it turns out a fair few prisons are currently overcrowded.

Wrong format, stupid

Now, helpfully (not), it was in Word format, and it took me just a few minutes to learn something that I have genuinely been irritated about for some time and never knew was possible: how do you change a Word document into an Excel document?

Here you go:

  1. Open Word document
  2. File -> Save as… HTML
  3. Open Excel
  4. File -> Open the file you just saved

Ridiculously simple.
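If you’d rather script it, the same Word-to-HTML trick works with pandas, which can lift tables straight out of the saved HTML file. A sketch, with hypothetical filenames:

    import pandas as pd

    tables = pd.read_html("prison_population.html")  # one DataFrame per table
    tables[0].to_excel("prison_population.xlsx", index=False)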

Gathering data

So I had the data in the right format, but I wanted to show it visually as ‘by county’ not by prison (to be honest, this was a terrible piece of decision-making because it serves zero purpose, but I wanted to see if I could do it. Now, after spending hours on it, I’ve decided to go back to where I was at the start – more on that in a minute.)

I scraped the table of where the prisons are located in England and Wales from Wikipedia, using a Google Spreadsheet and the importHTML function. Then I merged the two tables together in Excel in a really complicated and time-consuming way, which I probably could have done automatically. [I copied and pasted the location data in as an extra column and deleted the ones that didn’t match up… it took me a while.] I then added up all the figures for each county, and calculated the percentage of ‘operational capacity’ taken up.
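Done programmatically, that merge-and-total step might look something like the sketch below. The column names (‘Prison’, ‘County’, ‘Population’, ‘Operational capacity’) are my guesses, not the dataset’s real headers:

    import pandas as pd

    prisons = pd.read_excel("prison_population.xlsx")
    locations = pd.read_csv("prison_counties.csv")  # the scraped Wikipedia table

    merged = prisons.merge(locations, on="Prison", how="inner")
    by_county = merged.groupby("County")[["Population", "Operational capacity"]].sum()
    by_county["% of capacity"] = (100 * by_county["Population"]
                                  / by_county["Operational capacity"])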

But what does it all mean?

The column headings in the prison dataset aren’t immediately obvious. I had to do some searching to figure out what everything meant, and out of five or six columns, decide which ones I was going to use – or rather, which columns are the most accurate for finding out the extent of overcrowding in prisons. There’s no point in comparing ‘operational capacity’ against CNA (Certified Normal Accommodation), for example, because that doesn’t take into account cells that can’t currently be used. So I guess my point here is: make sure you know which figures you need to use.

Finding the borders

I must have spent about three to four hours just trying to find any way of getting what I wanted – county border data in a format that could be geocoded by Google Fusion. I cannot believe that there is not a file out there containing the data I need. I refuse to believe it. But I did try everything I could think of, and I’m still not there. I tried downloading shapefiles, converting those to KMLs, I tried importing KMLs directly, I tried merging other tables with geocoded data in… ARGH. Nothing works. I have found the files I thought I could use but I…couldn’t. I don’t know why.

When I eventually figure this all out, I’m going to make sure I keep a copy for future use. No way I’m letting it go!


Having failed in my quest to find borders that work with the county names I have, I have fully given up on the idea of doing it by border. And in any case, it’s not really a true picture of the prison situation… some areas have several prisons, some have one… plus, as they are all added together, it doesn’t make sense.

But this is as far as I’ve got with my map:

[Map: Prison population by county]

Remember that this is *all prisons in counties*, NOT individual prisons. The map tells you where there’s overcrowding in a county’s prison or prisons. The markers are as follows (there’s a sketch of the bucketing after the list):

  • Green: 0-50% full
  • Yellow: 50%-100%
  • Red dots: 100-150%
  • Red markers: 150%+
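The bucketing behind those markers is just a few thresholds. A sketch, using Fusion-style icon names (the exact identifiers are my assumption):

    def marker_for(pct_full):
        """Pick a map marker from the percentage of operational capacity in use."""
        if pct_full < 50:
            return "small_green"   # 0-50% full
        if pct_full < 100:
            return "small_yellow"  # 50-100%
        if pct_full < 150:
            return "measle_red"    # red dot, 100-150%
        return "large_red"         # red marker, 150%+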

So you can see that the total capacity for Swansea and Bedfordshire has been exceeded by 50% or more. As it happens, both counties only have one prison, so those two are genuinely quite bad. The numbers are fairly small (we’re talking around 400 prisoners), but they are still over capacity.

Lessons learned…

  • Make sure you figure out the right way to visualise something before you embark on something that will take you hours to do
  • Use something easy
  • I need to figure out how to get borders to work

What next?

I’m not happy that the info is *markers* by county (so they’re super vague) instead of boundary lines with colour, but I think the best thing for me to do is to take the data back to how it was, scrape the actual prison addresses from somewhere, and plot them exactly where they are. I think that will give a much clearer picture of what’s going on. When I’ve done that I’ll publish and link it properly so you can have a play around.



Sifting through crap, making a map

[Rhyming unintentional but I think it works.]

[Map: Hoxton Ward]

The lovely ward of Hoxton

Patch problems

I’ve been given the ward of Hoxton in the borough of Hackney, which you would think would be an excellent place to find things to investigate. No such luck. For the last week and a half I’ve been really struggling to get to grips with the area.

It’s not that nothing is going on; it’s more that there’s no contention, and everything really interesting is just out of my reach! There’s a mooted CPO (compulsory purchase order) in my area, for example, and as far as I can gather (more on that later), all the residents are perfectly happy with moving. Go just a road north of my ward, and you’ll find hundreds of people who are livid at Hackney Council, who don’t want to move, who are unhappy with where they’ve moved to, and so on. Perhaps I am being too ambitious here.

Sifting through crap

I’ve spent many, many, many hours on the Hackney Council website this week, desperately trying to find leads, people I can contact, and so on. The problem is that I half-suspect council websites were specifically designed to hide important information from the public. I feel like I’ve read every single document on there (though naturally, I haven’t, and I don’t think anyone would ever have the time – or inclination – to), and I feel like I’ve lived in Hackney forever, such is my increasing familiarity. [For the unaware: I’ve never lived in Hackney!]

What I did realise quite early on was that, as I was (attempting to) source things online, I wasn’t entirely sure where things were, whether they were in my ward or not, and consequently whether I could cover them. So many potential amazing stories have passed me by because they just aren’t in my area, and it keeps completely throwing me. I decided that it would be much more helpful to just have a map where my boundary was overlaid onto it, which I could use as a reference point. And if I could plot key organisations into it, then so much the better.

I actually stumbled upon what I needed much later. I was half-looking at a list of community groups and organisations in Hackney, which could be exported in the form of a .xls spreadsheet. It suddenly occurred to me that I could experiment with using this data. What would make the most sense for me to do? Ah. That map I was thinking of.

Making a map

This is fairly geeky, but I’ve wanted to make my own map for a while now; I just never really had something solid to work on – a specific project. Being given Hoxton was a brilliant excuse to fiddle around with Google’s Maps Engine. It’s actually far, far easier than I thought it would be. With a little Googling and a little trial and error, I managed to mark a boundary around Hoxton on the map and import my spreadsheet into it. Which actually worked, to my surprise. I still can’t quite believe how little time it took to make; the most time-consuming part of the whole process was finding the data, which I’d stumbled across accidentally on my internet travels. Strange, huh?

Incidentally one other thing worth checking out is the Hackney shape file. James Ball mentioned shape files yesterday and I thought ‘huh, that sounds like exactly what I was trying to do, AND SO MUCH EASIER’. So I Googled Hackney shape files. Lo and behold, some lovely man by the name of Ændrew Rininsland had realised that this might be useful too, so he made a .kmz file with all the Hackney ward shape files in it. I can’t seem to find a use beyond temporarily overlaying Google Maps with ward shapes (ie I can’t search for anything else when that’s on), but perhaps somebody else might be able to use it.

[Map: Hoxton community groups]

My slightly faulty Hoxton map

Dirty, dirty data

The initial map I’ve made (see above) is made from ‘raw’ data I haven’t yet verified or organised into something coherent that I can easily reference. The eagle-eyed among you will notice that I’ve drawn the boundary wrong – I had to flip back and forth between GME and the map of the ward at the top of this page, and I couldn’t read the road names. That’s fixed on my map now!

My next tasks: take out the layer of raw data because it’s absolutely useless to me right now; clean the spreadsheet I have; verify addresses and organisations. Then, I want to import the clean data and find some way of simultaneously building up a system of links, feeds, events, etc from this.

What do you think so far? What other tools do you think I could make use of as I build up a rich digital picture of Hoxton?
