I've been working with DfT Lab on a four-week project looking at innovative methods of getting traffic data.
Normally this is done with sensors in the road, roadside cameras, or even getting someone to manually count at the roadside.
Researchers in the road freight stats team wanted a way to count vehicles in places that didn't have sensors, or were difficult to get to. Satellite imagery seemed like a potential solution.
The European Space Agency had a similar idea back in 2008, and spent 3 years and 150,000 Euro on the problem. They concluded that it wasn't quite viable or cheap enough yet, and ended the project in 2011. Naturally, we thought we could do better, particularly with the huge strides in machine learning and satellite image quality since then.
Britain’s Next Top Model
The 3 developers assigned to this project were all apprentices and had never touched machine learning before, let alone built a neural network. But it turned out to be a surprisingly accessible topic, and we learned the fundamentals in just a few days.
We started by using a machine learning framework called YOLO (You Only Look Once). The original is built in C (gulp!), but we found a great implementation in the wonderful Keras machine learning API. YOLO is a very fast convolutional neural network that can even analyse and identify items in real time in videos.
We trained the network on a small sample set of around 200 images. But despite the basics of machine learning being easy, there were obstacles along the way. Trucks from space look like small rectangles. Unfortunately, so do cars, vans, caravans, buildings, trampolines, trees…
Using such a small sample set made it very difficult for YOLO to learn to differentiate those objects. Ideally we would have thousands or tens of thousands of images containing trucks to train the neural net on.
Despite the impression we had from Google Earth’s glorious imagery, satellite photos are quite hard to get hold of. A satellite will only get a few usable images of the same bit of land each year. This is partly to do with the way satellites orbit the earth, but also because clouds get in the way. As you might guess, this is a bigger problem for the UK than most.
Near the end of the project, our neural network still wasn't identifying trucks in pictures accurately enough. We came to the conclusion that our small data set, and the variety in it, was just too challenging for our network to learn from. So, in one last ditch effort, we fired in a much larger dataset to train it on all vehicles, rather than just trucks. It worked a lot better…
Now that we could fairly reliably identify vehicles in our images (roughly 70% accuracy on our dataset), we turned our attention to counting them. What could we do with just some snapshots of vehicles on a road at a given time and date?
Turning 5 minutes into a year
As it turns out, quite a lot.
There’s been a lot of thought given over the years to estimating annual traffic from small amounts of data (see Larsen et al (2008), McCord et al (2002), Fu et al (2016)). The basic theory is that a count of about 15 minutes of traffic is enough to predict, with some accuracy, what’s called the Annual Average Daily Traffic (AADT). On a road where vehicles travel at 60mph, 15 minutes of traffic is spread over about 15 miles of road, so you need one image covering 15 miles, or 3 pictures of 5 miles of road taken at different times. Other teams using this technique have got to within about 10% of the correct count, with under-road sensor counts as the comparison.
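The 60mph arithmetic can be sketched in a few lines of Python. This is only an illustration, not the method from the papers cited or from our code: the function names are made up, and `hour_share` (the fraction of a day's traffic that falls in the hour the image was taken) is an assumed input you would need to get from existing count data.

```python
def equivalent_count_minutes(road_miles, speed_mph):
    """Minutes of roadside counting that one image of `road_miles` stands in for."""
    return road_miles * 60.0 / speed_mph


def hourly_flow(vehicles_in_image, road_miles, speed_mph):
    """Vehicles per hour passing a point: density (per mile) times speed."""
    density = vehicles_in_image / road_miles
    return density * speed_mph


def estimate_daily_traffic(snapshots, speed_mph):
    """Average the daily totals implied by each snapshot.

    snapshots: list of (vehicles_in_image, road_miles, hour_share) tuples.
    hour_share is a hypothetical expansion factor, e.g. 0.06 if 6% of a
    day's traffic normally passes during the hour the image was taken.
    """
    daily = [
        hourly_flow(vehicles, miles, speed_mph) / hour_share
        for vehicles, miles, hour_share in snapshots
    ]
    return sum(daily) / len(daily)


if __name__ == "__main__":
    # One 15 mile image at 60mph covers the same ground as a 15 minute count.
    print(equivalent_count_minutes(15, 60))
    # So do three 5 mile images taken at different times.
    print(3 * equivalent_count_minutes(5, 60))
```

A real AADT estimate would also correct for day of week and season, which this sketch leaves out.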
Of course, if your data covers just a small area, you could count the vehicles reliably by hand. Over many miles of road, that stops being practical. That’s where our neural net comes in.
We feed the network video or images of a massive area of land, and it quickly identifies all the vehicles in the video. You could count every vehicle in London in a few minutes (though not very accurately).
The problem with this method, quick as it is, is that it’s easy to double count vehicles. Much smarter people have given this quite a lot of thought. We ran out of time, so we implemented quite a naive system that compares the predictions between consecutive frames: basically AABB (axis-aligned bounding box) collision detection, for any game developers reading. Still, it worked pretty well!
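For anyone who isn't a game developer, here is a minimal sketch of the idea in Python. It is not the code from our repository: it assumes each prediction is a box given as (x_min, y_min, x_max, y_max) in pixels, and it assumes a box that overlaps any box from the previous frame is the same vehicle. The function names are made up for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """AABB test: two boxes collide if they overlap on both the x and y axes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def count_new_vehicles(previous: List[Box], current: List[Box]) -> int:
    """Count boxes in the current frame that overlap nothing in the previous one."""
    return sum(
        1
        for box in current
        if not any(boxes_overlap(box, old) for old in previous)
    )


def count_unique_vehicles(frames: List[List[Box]]) -> int:
    """Walk through the frames in order, only counting vehicles not seen a frame ago."""
    total = 0
    previous: List[Box] = []
    for boxes in frames:
        total += count_new_vehicles(previous, boxes)
        previous = boxes
    return total


if __name__ == "__main__":
    frames = [
        [(0, 0, 10, 5)],                     # one truck appears
        [(2, 0, 12, 5), (50, 50, 60, 55)],   # it moves a little, a second vehicle appears
        [(4, 0, 14, 5), (50, 50, 60, 55)],   # nothing new in this frame
    ]
    print(count_unique_vehicles(frames))
```

The weakness is easy to see: a vehicle that moves further than its own length between frames gets counted again, and two vehicles that pass close to each other can be merged into one.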
So...is it a goer?
Kind of. Here’s what we think you need to do to make it work:
- Get some people with more machine learning know-how to help improve the detection model - maybe through collaboration with some academics
- Give yourself time to buy good satellite imagery. It isn’t that easy to do in 4 weeks. Don’t try to do it on the cheap: reach out to suppliers and explain your needs. We had to use poor quality imagery and it made training and testing quite tricky.
We’re sharing our findings with the road freight stats team for them to decide if it’s worth pursuing. We think they could advance it, perhaps with academics or private sector providers.
Our code is here for you to play with: https://github.com/departmentfortransport/dftlab-yolo-vehiclecounting
If you want to hear more from us you can join our mailing list here or get in touch with us at firstname.lastname@example.org