activity distribution

Have you ever thought about how the way people are spread out in a city can tell us a lot about that city's landscape? If enough location data is available, the geographic distribution of the appearances (activities) describes the urban area quite well. Let's take a look at four cities: Toronto, London, Helsinki, and Dallas-Fort Worth. Here, we can see mobile positioning data (MPD) in pink and land-use data from OpenStreetMap (OSM) in green. The MPD is shown as 250-by-250-meter cells. If someone used their phone in a cell, it shows up as pink. The map data (OSM) shows residential and industrial areas, among others, but not parks, lakes, and so on. People like to spend time in parks, so it's no surprise that the two layers differ a bit, but the overall shape is pretty similar.
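To make the cell idea concrete, here is a minimal sketch of how observations could be binned into 250-by-250-meter cells. It assumes the points are already in a metric projection (coordinates in meters); the function names and the sample points are made up for illustration and are not the code behind the figures.

```python
from collections import Counter

CELL_SIZE_M = 250  # cell edge length used in the figures


def to_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (in meters) to an integer (col, row) cell index."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def activity_per_cell(points, cell_size=CELL_SIZE_M):
    """Count the observations that fall into each grid cell."""
    return Counter(to_cell(x, y, cell_size) for x, y in points)


# Hypothetical observations, in meters, in some metric projection.
points = [(10.0, 20.0), (240.0, 249.0), (260.0, 20.0), (1000.0, 510.0)]
counts = activity_per_cell(points)

# A cell would be drawn pink if it holds at least one observation.
active_cells = set(counts)
```

The counts can feed a heatmap, while the set of active cells is all that is needed for the pink layer.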

finding the city location

Let's say we know where people used their mobile phones in a city, but we don't know which city it is. As we've seen in the previous examples, human activity is closely linked to urban areas. There's a cool technique in digital image processing called template matching. It helps us find the city silhouette (a picture of the city made from mobile positioning data) on a map.

The illustration is by Laserlicht and Benjamin Watson, CC BY 4.0, taken from Wikipedia.

template matching example

Let's take a look at an example of how to find a city on the map using template matching with mobility data. The activity can be visualized as a heatmap (marker a), where the low activity areas are dark blue and the high activity areas are yellow, with a gradual transition in between. To make the template matching step a bit easier, let's use a "binary" image with only two colors (marker c). We can find a threshold that separates the low and high activity levels (check out the histogram, marker b). This gives us a "binary" template, similar to the activity example we saw before. But here's where it gets a little different from the explanation figure below: the two colors should match the map characteristics. So let's use dark blue for the water and yellow for the land, which simplifies the figures color-wise. It's important to keep in mind that the template and the map should be generated in the same projection. Template matching works on images, and the result is the relative location (marker d) of the template image within the large image. This relative location can be used to determine the geographical coordinates using the map data and the applied projection.
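The steps above can be sketched in a few lines of plain Python. This is a toy illustration under several assumptions, not the implementation behind the figures: the threshold is simply given rather than read from a histogram, the matching score is the number of agreeing pixels, the map is north-up with its origin at the top-left corner, and all names and numbers are hypothetical. In practice an image processing library routine (for example OpenCV's `cv2.matchTemplate`) would do the sliding-window search.

```python
def binarize(heatmap, threshold):
    """Turn activity counts into a two-color image: 1 = high activity, 0 = low."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def match_template(image, template):
    """Slide the template over the image and return (score, row, col) of the
    first offset where the largest number of pixels agree."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = (-1, 0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                image[r + i][c + j] == template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            if score > best[0]:
                best = (score, r, c)
    return best


def offset_to_coords(row, col, origin_x, origin_y, pixel_size):
    """Projected coordinates of the template's top-left corner, assuming a
    north-up map whose (origin_x, origin_y) is its top-left corner."""
    return (origin_x + col * pixel_size, origin_y - row * pixel_size)


# Hypothetical binary map: 1 = land, 0 = water.
land_map = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

# Hypothetical activity counts per cell, and an assumed threshold of 3.
heatmap = [
    [9, 7, 0],
    [8, 12, 5],
    [6, 1, 4],
]
template = binarize(heatmap, threshold=3)

score, row, col = match_template(land_map, template)
# Made-up map origin (meters) and a 250 m pixel size.
x, y = offset_to_coords(row, col, origin_x=500000, origin_y=4840000, pixel_size=250)
```

Both images have to use the same pixel size and projection for the offset to mean anything, which is why the template and the map are generated the same way.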

privacy considerations

You might think that hiding the geographic location of mobility data is a good way to protect the users' privacy when mobility data is published. But publishing location data always comes with some privacy risks. This is especially true when additional information is available for an individual, such as their location at a specific time, which could potentially lead to their identification. Take, for example, the New York Times article by Stuart A. Thompson and Charlie Warzel about the president of the United States.

photo by Paweł Zdziarski, CC BY-SA 3.0, from Wikimedia

template matching results

So, if concealing the location coordinates only means transforming them to an unknown plane, that won't solve the underlying issue. This is because the geographic distribution of human mobility characterizes the urban landscape so well. It will still be possible to locate the observation area on the map; the only challenge is figuring out the applied transformation.

The next few figures show where the city was positioned by the template matching algorithm using mobility data as a template in the cases of Toronto, London, Helsinki, and Dallas-Fort Worth.

how can the privacy of individuals be improved?

There's a bit of a balancing act here: we want to share mobility data for research, but we also need to protect people's privacy. Concealing the geographic location doesn't help at all. If we completely remove location information, the data becomes basically unusable for urban planning. If we combine locations into larger areas, like 500 by 500 meters or even larger, it becomes harder to pinpoint exact locations. But at city scale, this aggregation has to be significant to make a difference. Research by de Montjoye and his colleagues showed that a user stands out from the crowd if the top four locations (e.g., home and work locations and two frequent locations from the commuting route) are distinguishable. So, the spatial aggregation has to be strong enough to hide the differences between the four most frequently visited locations, which will inherently decrease the usefulness of the data.
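Here is a small sketch of that trade-off. It is a simplified illustration, not the method used by de Montjoye and his colleagues: each user's "fingerprint" is just the set of their most visited grid cells, and a user counts as unique when nobody else shares that set. The users and coordinates are invented, and the coordinates are assumed to be in meters.

```python
from collections import Counter


def top_cells(points, cell_size, n=4):
    """Return the n most visited grid cells of one user as a frozenset."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return frozenset(cell for cell, _ in counts.most_common(n))


def unique_users(trajectories, cell_size, n=4):
    """Return the users whose set of top-n cells is shared with nobody else."""
    fingerprints = {u: top_cells(p, cell_size, n) for u, p in trajectories.items()}
    freq = Counter(fingerprints.values())
    return {u for u, fp in fingerprints.items() if freq[fp] == 1}


# Hypothetical users with a home and a work location each (meters).
trajectories = {
    "alice": [(100, 100), (2100, 300)],
    "bob": [(400, 150), (2300, 100)],
    "carol": [(100, 900), (2100, 300)],
}

unique_250 = unique_users(trajectories, cell_size=250)
unique_500 = unique_users(trajectories, cell_size=500)
unique_1000 = unique_users(trajectories, cell_size=1000)
```

In this toy data everyone is unique at 250 m, only one user still stands out at 500 m, and nobody does at 1000 m, but by then all three users look identical and most of the spatial detail is gone.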

background figure: modified version of Jonah Aragon's, CC BY-SA 4.0, taken from Wikimedia.

adding noise

Adding some noise to the data could be a great way to improve individual privacy while keeping the higher-level mobility trends intact. Here are some possible ways to do this: (i) randomly displace a number of an individual's observed locations within a given radius, or (ii) randomly insert fictional locations into the trajectories or randomly remove locations. The tricky part is figuring out the right level of randomness for the noise, but that's a topic for another day.
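As a rough sketch of option (i) and the removal part of option (ii), the snippet below displaces points uniformly within a disc and optionally drops some of them. The parameter names (`share`, `drop_prob`) and the uniform-disc choice are assumptions made for this example; inserting fictional locations is left out to keep it short.

```python
import math
import random


def displace(point, radius_m, rng):
    """Move a point to a uniformly random position within radius_m of it."""
    x, y = point
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root spreads the points evenly over the disc area.
    dist = radius_m * math.sqrt(rng.random())
    return (x + dist * math.cos(angle), y + dist * math.sin(angle))


def add_noise(trajectory, radius_m, share=1.0, drop_prob=0.0, rng=None):
    """Displace roughly `share` of the points and drop each point with
    probability `drop_prob`."""
    rng = rng or random.Random()
    noisy = []
    for point in trajectory:
        if rng.random() < drop_prob:
            continue  # randomly remove this location
        if rng.random() < share:
            point = displace(point, radius_m, rng)
        noisy.append(point)
    return noisy


# Hypothetical trajectory in meters; a fixed seed keeps the run repeatable.
trajectory = [(0.0, 0.0), (1000.0, 500.0), (3000.0, 2500.0)]
noisy = add_noise(trajectory, radius_m=1000.0, rng=random.Random(42))
```

With the defaults every point is moved and none is dropped, so the noisy trajectory has the same length as the original one.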

The figures show what happens when random noise is added to every location point (in the Toronto data) with a 1 km deviation. Blue squares represent cells where there was no activity before the noise addition, red cells lost their activity due to the activity relocation, and black cells had activity both before and after the noise was added.
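The three cell colors boil down to simple set operations on the active cells before and after the noise. The sketch below uses invented points (in meters) and hypothetical function names; it only illustrates the classification, not how the figures were actually produced.

```python
def active_cells(points, cell_size=250):
    """Set of (col, row) grid cells that contain at least one point."""
    return {(int(x // cell_size), int(y // cell_size)) for x, y in points}


def compare_cells(before_points, after_points, cell_size=250):
    """Classify cells by how their activity changed after adding noise."""
    before = active_cells(before_points, cell_size)
    after = active_cells(after_points, cell_size)
    return {
        "gained": after - before,  # blue: no activity before the noise
        "lost": before - after,    # red: activity relocated elsewhere
        "kept": before & after,    # black: activity before and after
    }


# Hypothetical observations before and after the noise was applied.
before_points = [(100, 100), (300, 100), (900, 900)]
after_points = [(120, 80), (600, 100), (900, 950)]

result = compare_cells(before_points, after_points)
```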

In the world of computer vision, the noise is like a blurring effect that removes the details, but keeps the general shape of the city, as long as the "blur" is applied within reasonable limits (without the intent of completely destroying the data). The template matching isn't affected much by some noise, and the city location can still be revealed, but individuals might be less traceable.