Posts Tagged google fusion

Behind the story, part three: creating a map with Fusion

In the first post I explained the purpose of this three-part tutorial. I was working on a couple of things for the Hackney Post, and I made this interactive map of entertainment venues in Hackney. I wanted to get the addresses, verify they were in Hackney, and then turn them into a map. You could, of course, do this all by hand, or you could semi-automate the process by using scrapers and different tools.

The tutorial covers the following:

Part one: scraping with Outwit Hub
Part two: cleaning and verifying data
Part three: creating a map with Google Fusion

Using Google Fusion

Go to your Google Drive (if you don’t have one, you can get one very easily if you use Gmail – just go to drive.google.com and it should explain everything there). Click on create, and then click on Fusion Table. If you don’t have that option, just click “connect more apps” and you should be able to find it easily.

Screen Shot 2014-04-01 at 09.46.52

When prompted, choose the file you saved in the last tutorial (if it’s a CSV file, ensure that “comma” is checked for separator character) and click next. It should display a preview of the information you have in the table, so click next again, and then finish – unless you want to change some of these values, which may be useful later but we don’t need to tinker with them now. It should look a little something like this, with an extra map tab to the right:

Screen Shot 2014-04-01 at 09.52.46

The highlighted cells – the “Name” values – are what Google is suggesting it should use to find the location, if you want to make a map. Of course, this is wrong, so we can change that by clicking on the arrow next to the Postcode column, then Change…, and then changing the type from Text to Location. All of the Postcode column should now be highlighted, and some of the Name values will be highlighted too. Change the Name column type from Location to Text, and then Google Fusion should start geocoding.

Screen Shot 2014-04-01 at 09.57.01

This is basically the process through which Google is figuring out what the location values mean – it’s thinking about how to process the information in the postcode column, because location values can be in many different forms. When that’s done, you want to click on the Map tab, and Google should have figured out that all of the pins are in Hackney, London.

Rename your map “Art galleries in Hackney”, so we don’t forget to do it later. Then click “Change feature styles…” – this is where you can change the icon if you want to. You can use this to specify gradients depending on values, or different icons for different values. In the original map I made, you’ll see that theatres, art galleries, and museums are all different colours. Before I imported into Fusion, I added an extra column with values like large_green, large_blue and large_red, one for each type – and then I specified that column in the feature styles option.
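If you'd rather not type the icon values in by hand, the extra column can be added with a few lines of Python before importing. This is only a rough sketch: the "Type" and "Icon" column names, the sample rows, and which colour goes with which venue type are all my own assumptions for illustration, not what the original spreadsheet used.

```python
import csv
import io

# Assumed mapping from venue type to icon value. The icon values are the ones
# mentioned in the post; which type gets which colour is a guess.
ICON_BY_TYPE = {
    "theatre": "large_red",
    "art gallery": "large_green",
    "museum": "large_blue",
}


def add_icon_column(rows, type_column="Type", icon_column="Icon", default="large_green"):
    """Return copies of the rows with an extra icon column based on venue type."""
    styled_rows = []
    for row in rows:
        new_row = dict(row)
        key = row.get(type_column, "").strip().lower()
        # Unknown or missing types fall back to the default icon.
        new_row[icon_column] = ICON_BY_TYPE.get(key, default)
        styled_rows.append(new_row)
    return styled_rows


# Made-up sample data standing in for the cleaned CSV from part two.
sample_csv = (
    "Name,Postcode,Type\n"
    "Example Theatre,E8 1AA,Theatre\n"
    "Example Museum,E2 8AA,Museum\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
styled = add_icon_column(rows)
```

Write the styled rows back out with csv.DictWriter, import that file instead, and point the feature styles option at the new column.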

For now, we just want it to be a large green icon. So click that and then save. If you click on one of the green pins now, it should come up with a little window, like this. We want to change that.

Screen Shot 2014-04-01 at 10.03.33

Click on “Change info window…” and you can see exactly what just came up. If you click on Custom, you can customise it to say whatever you want. I’d like to bold the name and take out the postcode label. Move {Name} into where Name: is, so that it sits between the first pair of <b> tags, and get rid of the <b> tags around Postcode: along with the Postcode: text itself, leaving just {Postcode}. It should look like this:

Screen Shot 2014-04-01 at 10.06.42

Now, when you click on a pin, it should have the name in bold, and the postcode written underneath it. We’re nearly done. Zoom in so you can see all of the pins as closely as possible (this is how people will see it when they initially click on it, so this is important!). Now, click the arrow next to the map tab, and Publish. You’ll need to change permissions to “anyone with a link” to be able to embed it and send links to people. Check that the link works, and you’re done!

Making more elaborate maps

In the original piece I did for the Hackney Post, I had the entire address, phone numbers where possible, and hyperlinks to the venue website, all contained in the info window on the map. This information I had retained from the various scrapes I did of the Independent Directory and other places, and I also scraped the first ten results for each venue from Google, using Outwit Hub. This tutorial made a really simple map that can be made quite quickly once you have the data, but there is so much more you can do with Google Fusion.


2012 mayoral & assembly election turnout by borough

While I was looking for some datasets a few days ago, I stumbled upon a goldmine of data for London, which also included boundaries in shapefiles. This time, I wanted to map some very simple data. (Last time I was trying to do something so complicated that I couldn’t finish it.)

It took me a little bit of tinkering (where, for example, names like Kensington & Chelsea weren’t matched to Kensington and Chelsea) and some thinking to figure out what to do with the colours. The lowest election turnout in 2012 was Barking and Dagenham with 27.7% coming out to vote, whereas the highest was only 48.5% in the City of London.
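The name mismatches are the sort of thing you can catch before merging the tables. Here's a minimal sketch of how that check might look in Python; the function names are made up for illustration, and it only handles the "&" versus "and" problem plus stray spaces and capitals, not every way two borough lists can disagree.

```python
import re


def normalise_borough(name):
    """Normalise a borough name so '&' and 'and' variants compare equal."""
    name = name.strip().lower().replace("&", " and ")
    # Collapse any runs of whitespace left over from the replacement.
    return re.sub(r"\s+", " ", name)


def unmatched(data_names, boundary_names):
    """List the names in the data that have no match in the boundary file."""
    known = {normalise_borough(n) for n in boundary_names}
    return [n for n in data_names if normalise_borough(n) not in known]
```

Running unmatched() on the turnout names against the boundary names gives you a short list of the ones that still need fixing by hand.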

Naturally, this means there isn’t all that much difference in some of the boroughs – so instead of colouring by bucket (as is sometimes useful), I decided to pick the colours based on a gradient going from red, through orange, through yellow, to green.
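To give an idea of what a gradient like that means in practice, here's a rough Python sketch that turns a turnout figure into a colour somewhere between red and green. The RGB values for the four colour stops are my own picks, and the low and high ends are just the 27.7% and 48.5% figures above; Fusion does its own thing when you set a gradient in the styling options, so this is an illustration of the idea rather than what it does internally.

```python
# Assumed colour stops: red, orange, yellow, green.
STOPS = [(255, 0, 0), (255, 165, 0), (255, 255, 0), (0, 128, 0)]


def turnout_colour(turnout, low=27.7, high=48.5):
    """Map a turnout percentage to a hex colour on a red-to-green gradient."""
    # Scale the turnout to the range 0..1, clamping anything outside it.
    t = (turnout - low) / (high - low)
    t = max(0.0, min(1.0, t))
    # Work out which pair of stops the value falls between.
    position = t * (len(STOPS) - 1)
    index = min(int(position), len(STOPS) - 2)
    fraction = position - index
    start, end = STOPS[index], STOPS[index + 1]
    # Blend each colour channel linearly between the two stops.
    rgb = [round(a + (b - a) * fraction) for a, b in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

The lowest turnout comes out pure red and the highest comes out green, with everything else shading through orange and yellow in between.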

It’s not all that interesting, really, but here it is: Mayoral & Assembly elections 2012. Questions, pointers, etc gratefully received @SophieWarnes.


Playing with Google Fusion – prison overcrowding

I saw a past student’s blogpost on Friday about using GF to create choropleth maps, and we didn’t do a data session on Friday, so I thought I would give it a go and try and learn something by myself over the weekend.

It started out well. Having decided to throw caution to the wind and abandon tutorials I’d seen online that used US boundaries and US data (all far easier to find, better-documented, etc), I found a dataset I was interested in – England & Wales prison population numbers in the last month, ie September 2013. And I figured out what I wanted to do. To my total lack of surprise, it turns out a fair few prisons are currently overcrowded.

Wrong format, stupid

Now, helpfully (not), it was in Word format, and it took me just a few minutes to learn how to do something I’ve genuinely been irritated about for some time and never knew was possible. How do you change a Word document into an Excel document?

Here you go:

  1. Open Word document
  2. File -> Save as… HTML
  3. Open Excel
  4. File -> Open the file you just saved

Ridiculously simple.

Gathering data

So I had the data in the right format, but I wanted to show it visually as ‘by county’ not by prison (to be honest, this was a terrible piece of decision-making because it serves zero purpose, but I wanted to see if I could do it. Now, after spending hours on it, I’ve decided to go back to where I was at the start – more on that in a minute.)

I scraped the table about where the prisons are located in England and Wales from Wikipedia, using a Google Spreadsheet and using the importHTML function. Then I merged the two tables together in Excel in a really complicated and time-consuming way, which I probably could have done automatically. [Copied & pasted in location data as extra column, deleted ones that didn’t match up…it took me a while]. I then added up all the figures for each county, then calculated the % of ‘operational capacity’ taken up.
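The adding-up step is the part I probably could have done automatically. Something like the Python sketch below would do it, assuming the merged table has one row per prison with a county, a population figure and a capacity figure. The field names and the sample numbers are invented for illustration, and which capacity column you divide by is up to you.

```python
from collections import defaultdict


def capacity_by_county(prisons):
    """Sum population and capacity per county and return the % of capacity used."""
    totals = defaultdict(lambda: {"population": 0, "capacity": 0})
    for prison in prisons:
        county = prison["county"]
        totals[county]["population"] += prison["population"]
        totals[county]["capacity"] += prison["capacity"]
    return {
        county: round(100 * t["population"] / t["capacity"], 1)
        for county, t in totals.items()
    }


# Invented sample rows, not real prison figures.
sample = [
    {"prison": "Prison A", "county": "County X", "population": 300, "capacity": 400},
    {"prison": "Prison B", "county": "County X", "population": 500, "capacity": 400},
    {"prison": "Prison C", "county": "County Y", "population": 90, "capacity": 120},
]
result = capacity_by_county(sample)
```

Note that the percentage is worked out from the county totals, not by averaging each prison's percentage, which is the same thing I did by hand in Excel.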

But what does it all mean?

The column headings in the prison dataset aren’t immediately obvious. I had to do some searching to figure out what everything meant and, out of five or six columns, decide which ones I was going to use… or rather, which ones are the most accurate to use to find out the extent of overcrowding in prisons. There’s no point, for example, in using ‘operational capacity’ against CNA (Certified Normal Accommodation), because that doesn’t take into account cells that can’t currently be used. So I guess my point here is: make sure you know what figures you need to use.

Finding the borders

I must have spent about three to four hours just trying to find any way of getting what I wanted – county border data in a format that could be geocoded by Google Fusion. I cannot believe that there is not a file out there containing the data I need. I refuse to believe it. But I did try everything I could think of, and I’m still not there. I tried downloading shapefiles, converting those to KMLs, I tried importing KMLs directly, I tried merging other tables with geocoded data in… ARGH. Nothing works. I have found the files I thought I could use but I…couldn’t. I don’t know why.

When I eventually figure this all out, I’m going to make sure I keep a copy for future use. No way I’m letting it go!

Mapping

Having failed in my quest to find borders that work with the county names I have, I have fully given up on the idea of doing it by border. And in any case, it’s not really a true picture of the prison situation… Some areas have several prisons, some have one…plus, as they are all added together it doesn’t make sense.

But this is as far as I’ve got with my map:

Prison Population by County Map

Remember that this is *all prisons in counties*, NOT individual prisons. The map tells you where there’s overcrowding in prison/s. The markers are as follows:

  • Green: 0-50% full
  • Yellow: 50-100%
  • Red dots: 100-150%
  • Red markers: 150%+

So you can see that the total capacity for both Swansea and Bedfordshire has been exceeded by 50% or more. As it happens, both counties only have one prison, so those two are genuinely quite bad. The numbers are fairly small (we’re talking around 400 prisoners), but they are still over capacity.
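For anyone wanting to reproduce the marker key, the bucketing can be written as a small Python function that fills in a marker column before import. This is a sketch with assumptions: the key doesn't say which bucket a value sitting exactly on a boundary belongs to, so I've put it in the higher one, and the returned labels are plain descriptions, not Fusion's own icon names.

```python
def marker_for(percent_full):
    """Pick a marker label from the percentage of capacity in use."""
    # Boundary values (50, 100, 150) are assumed to fall in the higher bucket.
    if percent_full >= 150:
        return "red marker"
    if percent_full >= 100:
        return "red dot"
    if percent_full >= 50:
        return "yellow"
    return "green"
```

So a county at 150% or more, like the two mentioned above, gets a red marker, and anything under half full gets green.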

Lessons learned

  • Make sure you figure out the right way to visualise something before you embark on something that will take you hours to do
  • Use something easy
  • I need to figure out how to get borders to work

What next?

I’m not happy that the info is *markers* by county (so they’re super vague) instead of boundary-lines with colour, but I think the best thing for me to do is to take the data back to how it was, scrape the actual prison addresses from somewhere, and plot them exactly where they are. I think that will give a much clearer picture of what’s going on. When I’ve done that I’ll publish and link it properly so you can have a play around.
