2. Literature Review

Understanding human movements and recognizing behavior patterns that occur during daily life in urban areas requires a systematic analysis of human mobility. The evaluation of human travel is based on observations at the individual and group levels. In recent decades, several novel datasets, based on vehicular GPS, cellular network records, or social media information, have become available, providing a more accurate and sophisticated characterization of people’s movements. This chapter provides a brief introduction to this field. Figure 2.1 shows the distribution of the referenced papers from the last two decades.

Figure 2.1.: Reference distribution of this work, from the last two decades.

Mobile Network Data

The mobile phone network, during its operation, constantly communicates with the cell phones. This communication can be divided into two categories: (i) the passive, cell-switching communication that keeps the cell phones ready to use the mobile phone network at any time, and (ii) the active, billed usage of the mobile phone network, including phone calls, text messages or mobile internet usage. The Call Detail Records (CDR) collect the latter, containing information about the subscriber, the time of the activity, and the place (via the cell), where the activity occurred.

Strictly speaking, a Call Detail Record (CDR) contains only call information, and the expression eXtended Detail Record (XDR) is used to denote other types of billed communication (e.g., mobile internet traffic). As CDRs are generated only when a user makes or receives a call, their temporal resolution is low. In that sense, XDRs have a higher resolution since they are a mixture of human- and device-triggered communication [16]. An application can request data transfer without human interaction, for example, when downloading e-mails. Note that, within this work, the expression CDR is also used to denote XDR, as the obtained data does not contain any information regarding the activity type (see Chapter Data Sources for details).

The so-called passive communication (also referred to as Control Plane Record (CPR)) is network-triggered at cell-switching, or, for example, when the network status of the cell phone is monitored [16]. Both the active and the passive communication have the same geographical resolution, as both are associated with the same base stations or cells, which are supposed to be more or less constant. However, the temporal granularity of the passive communication can be significantly finer.

Mobility Indicators

There are some indicators that are widely used in the literature to characterize human mobility, like Radius of Gyration, Entropy, or the distance between Home and the Work locations (Section Home and Work Locations). These indicators are determined for every subscriber.

Radius of Gyration

The Radius of Gyration [17] defines a circle where an individual can usually be found. It was originally defined as (1), where $L$ is the set of locations visited by the individual, $r_{cm}$ is the center of mass of these locations, and $N$ is the total number of visits or the total time spent at these locations. The $r_i$ is the coordinate of location $i$, and $n_i$ is the number of visits or the time spent at location $i$ [5]. $$\tag{1} r_g = \sqrt{\frac{1}{N} \sum_{i \in L}{n_i (r_i - r_{cm})^2}}$$
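
Equation (1) can be illustrated with a minimal sketch. It assumes planar (projected) coordinates and a center of mass weighted by $n_i$; the function name and the input layout are hypothetical and are not taken from the cited works.

```python
from math import sqrt


def radius_of_gyration(visits):
    """Radius of gyration as in Equation (1).

    visits: list of ((x, y), n) pairs, where (x, y) are planar coordinates
    (assumption: a projected system, e.g. metres) and n is the number of
    visits or the time spent at that location.
    """
    total = sum(n for _, n in visits)  # N in Equation (1)
    # Center of mass, weighted by n_i (assumption of this sketch).
    cx = sum(x * n for (x, _), n in visits) / total
    cy = sum(y * n for (_, y), n in visits) / total
    spread = sum(n * ((x - cx) ** 2 + (y - cy) ** 2) for (x, y), n in visits)
    return sqrt(spread / total)


# Two equally visited locations, 2 units apart: the radius is 1 unit.
print(radius_of_gyration([((0, 0), 1), ((2, 0), 1)]))  # 1.0
```

With geographic (latitude/longitude) coordinates, the squared Euclidean distance would have to be replaced by a great-circle distance.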

K-Radius of Gyration

The K-radius of Gyration is calculated using only the $k$ most frequent locations of the individual [5], defined by (2), where $r_{cm}^{(k)}$ is the center of mass of these $k$ locations and $N_k$ is the total number of visits or time spent at them. Pappalardo et al. used this approach to classify individuals by their mobility customs. Two classes, named “Returners” and “Explorers”, have been defined by the value of the $k$-radius of gyration. Returners are those who spend most of their time between the $k$ most frequent locations ($r_g^{(k)} > r_g / 2$), in contrast to the explorers, whose activity area cannot be described with the $k$ most frequent locations ($r_g^{(k)} < r_g / 2$) [10]. $$\tag{2} r_g^{(k)} = \sqrt{\frac{1}{N_k} \sum_{i = 1}^{k}{n_i (r_i - r_{cm}^{(k)})^2}}$$
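
The returner/explorer split can be sketched as follows. This is only an illustration under the same assumptions as Equation (1) (planar coordinates, center of mass weighted by $n_i$); the function names are hypothetical, and ties among equally frequent locations are broken by input order.

```python
from math import sqrt


def radius_of_gyration(visits):
    """Equation (1); visits is a list of ((x, y), n) pairs."""
    total = sum(n for _, n in visits)
    cx = sum(x * n for (x, _), n in visits) / total
    cy = sum(y * n for (_, y), n in visits) / total
    spread = sum(n * ((x - cx) ** 2 + (y - cy) ** 2) for (x, y), n in visits)
    return sqrt(spread / total)


def k_radius_of_gyration(visits, k):
    """Equation (2): radius of gyration of the k most frequent locations."""
    top_k = sorted(visits, key=lambda item: item[1], reverse=True)[:k]
    return radius_of_gyration(top_k)


def is_k_returner(visits, k=2):
    """True if the k most frequent locations dominate the total radius."""
    return k_radius_of_gyration(visits, k) > radius_of_gyration(visits) / 2


# Two dominant locations (e.g. home and work) plus one rare location.
returner = [((0, 0), 10), ((10, 0), 10), ((0, 1), 1)]
# Two nearby frequent locations, but distant places are visited regularly.
explorer = [((0, 0), 5), ((1, 0), 4), ((100, 0), 3), ((0, 100), 3)]
print(is_k_returner(returner), is_k_returner(explorer))  # True False
```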

Number of Activity Locations

The number of activity locations (3) is simply the number of the visited cells during an observation period, defined by Xu et al. in [10].

$$\tag{3} A = \left|(s_1, s_2, \ldots, s_n)\right|$$

Mobility Entropy

The mobility entropy (or mobility diversity) of the visited locations characterizes the diversity of the individual’s movements, defined as Equation (4), where $L$ is the set of locations visited by the individual, $l$ represents a single location, $p(l)$ is the probability of an individual being active at a location $l$ and $N$ is the total number of activities for an individual [18, 19]. $$\tag{4} MD = - \frac{\sum_{l \in L}{p(l) \log p(l)}}{\log N}$$
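
As an illustration of Equation (4), the following sketch estimates $p(l)$ as the share of an individual's activity records at location $l$. The function name, the input format, and the convention of returning zero when fewer than two records exist (where $\log N$ would be zero or undefined) are assumptions of this sketch.

```python
from collections import Counter
from math import log


def mobility_entropy(activity_locations):
    """Mobility entropy (Equation (4)) of one individual.

    activity_locations: one location (cell) identifier per activity record.
    """
    n = len(activity_locations)  # N: total number of activities
    if n < 2:
        return 0.0  # assumption: log N is zero or undefined, report no diversity
    counts = Counter(activity_locations)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(n)


# Half of the activity in each of two cells, four records in total.
print(round(mobility_entropy(["a", "a", "b", "b"]), 3))  # 0.5
```

The value is 0 when all activity occurs in a single cell and 1 when every record falls into a different cell.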

The generalization of the Mobility Entropy is Travel Diversity [10]. Instead of the diversity of the locations, it determines the diversity of the travels between consecutive locations. It can be calculated for $k$-length transitions, where $k=1$ gives back the location entropy. The transitions can be considered with or without direction, depending on whether the difference between $L_1 \rightarrow L_2$ and $L_2 \rightarrow L_1$ is important or not.
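
One possible reading of this generalization is sketched below: the entropy is computed over the $k$-length subsequences of the chronologically ordered locations. The normalization by the logarithm of the number of subsequences mirrors Equation (4) and is an assumption of this sketch, as are the function and parameter names; the exact definition in [10] may differ.

```python
from collections import Counter
from math import log


def travel_diversity(sequence, k=2, directed=True):
    """Entropy of k-length location subsequences (k=1: location entropy).

    sequence: chronologically ordered location identifiers of one individual.
    directed: if False, L1 -> L2 and L2 -> L1 count as the same transition.
    """
    grams = [tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]
    if not directed:
        # Use one canonical orientation for a subsequence and its reverse.
        grams = [min(g, g[::-1]) for g in grams]
    n = len(grams)
    if n < 2:
        return 0.0  # assumption: too few transitions to measure diversity
    counts = Counter(grams)
    return -sum((c / n) * log(c / n) for c in counts.values()) / log(n)


trips = ["H", "W", "H", "W", "H"]  # alternating between home and work
print(round(travel_diversity(trips, k=2, directed=True), 3))  # 0.5
```

In this example the directed variant distinguishes H → W from W → H, whereas the undirected variant sees a single repeated transition and yields zero diversity.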

Social Sensing

CDR processing is often applied for large social event detection, such as football matches [20, 21, 22, 23, 24], concerts [25], sociopolitical events [23, 26] or mass protests [27]. When thousands of people are in the same place at the same time, they generate a significant “anomaly” in the data, whereas small groups usually do not stand out from the “noise”. This is especially true when the passive, transparent communication between the mobile phone device and the cell is not included in the data, but only the active communication (voice calls, text messages, and data transfer) is recorded.

Wirz et al. estimated the crowd density during the Lord Mayor’s Show 2011 in London [28]. In [20, 21, 22] and [26], the authors examined the location of stadiums, where the football matches took place. Traag et al. [20] and Hiir et al. [26] also found that the mobile phone activity of the attendees decreased significantly. Xavier et al. compared the reported number of attendees of these events with the detected ones. Furletti et al. also analyzed sociopolitical events, football matches, and concerts in Rome [23]. Barnett et al. analyzed the attendees of Kumbh Mela, a 3-month-long Hindu festival [29]. Tourism is also studied via mobile network data; Xu et al. followed traveler groups within South Korean cities [30] and international travelers all around the country [31].

Qian et al. found that tourists tend to rest close to their destinations or in the city center, using Weibo (Chinese microblog app) and Flickr (international photo-sharing service) [32].

COVID-19

The data used in this work predates the COVID-19 pandemic and solely focuses on the “normal” life of Budapest.

Epidemiology used to be mentioned as a potential application of human mobility studies, with some applications like [33], but the COVID-19 pandemic prioritized its applications in Digital Epidemiology, as mobile phone network data can reflect the mobility changes caused by the imposed restrictions. The term “Digital Epidemiology” can be used when working with data that was not generated with the primary purpose of epidemiological studies [34], and it had considerable applications even before the COVID-19 pandemic [35]. Mobile network data is also utilized to analyze human mobility during the COVID-19 pandemic and the effectiveness of the restrictions.

Willberg et al. identified a significant decrease in the population presence in the largest cities of Finland after the lockdown compared to a usual week [36]. Romanillos et al. reported similar results from the Madrid metropolitan area [37]. W. D. Lee et al. examined the effect of the socioeconomic status (SES) on the mobility changes during the lockdown and found that the mobility of the wealthier subscribers decreased more significantly in England [38]. This is in good agreement with [39], where Yechezkel et al. found that poorer regions of Israel showed lower and slower compliance with the mobility restrictions. Khataee et al. compared the effect of social distancing in several countries, using mobility data from Apple phones [40]. Bushman et al. [41], Gao et al. [42], Hu et al. [43] and Tokey [44] also analyzed the effects of stay-at-home distancing on the COVID-19 increase rate in the US. Gao et al. found a negative correlation between stay-at-home distancing and the COVID-19 increase rate [42]. Lucchini et al. studied the mobility changes during the pandemic in four states of the USA [45]. Bushman et al. analyzed the compliance with social distancing in the US using mobile phone data [41]. Yabe et al. found that one week into the state of emergency, human mobility behavior decreased by around 50%, resulting in a 70% reduction of social contacts in Tokyo [46]. This also confirms that mobile network data analysis is an efficient tool to monitor the effect of restrictions, just as Google Mobility data [47].

Still, these analyses might not be common enough. Oliver et al. asked: “Why is the use of mobile phone data not widespread, or a standard, in tackling epidemics?” [48]. This, however, is not within the scope of this study.

Selecting the Subscribers

Csáji et al. took into account subscribers who had at least ten activities during the observation period (15 months) [49]. Xu et al. chose to use those subscribers who had at least one activity record on at least half of the days during the observation period [10]. Pappalardo et al. discarded the subscribers who had only one location and kept the individuals who have at least half as many calls as there are hours in the data set. Furthermore, the abnormally active (more than 300 calls per day) SIM cards are excluded [5]. I selected the SIM cards that have activity on at least 20 days (out of 30) and whose daily mean activity number is at least 40 on workdays and at least 20 on weekends, but not more than 1000. The upper limit is especially important to remove SIM cards that possibly operate in mobile broadband modems, for example. More details are given in Section Selecting Active SIMs. Filtering by activity is not necessarily sufficient to keep only individuals in the data set. Type Allocation Codes (TAC), on the other hand, can determine the type of the device and the exact model of a cell phone (Section Device Types).
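
The activity thresholds used in this work can be expressed as a simple filter. The sketch below is an illustration only and not the actual implementation (see Section Selecting Active SIMs): it assumes that the daily means are taken over the days with at least one record, that the upper limit applies to both the workday and the weekend mean, and that public holidays are ignored. The function name and the input layout are hypothetical.

```python
from datetime import date


def is_active_sim(daily_counts, min_days=20, min_workday_mean=40,
                  min_weekend_mean=20, max_mean=1000):
    """Decide whether a SIM card passes the activity filter.

    daily_counts: dict mapping datetime.date -> number of activity records.
    """
    active = {d: c for d, c in daily_counts.items() if c > 0}
    if len(active) < min_days:
        return False
    # Monday..Friday are weekday() 0..4; Saturday and Sunday are 5 and 6.
    workdays = [c for d, c in active.items() if d.weekday() < 5]
    weekends = [c for d, c in active.items() if d.weekday() >= 5]
    if not workdays or not weekends:
        return False  # assumption: both means must be computable
    workday_mean = sum(workdays) / len(workdays)
    weekend_mean = sum(weekends) / len(weekends)
    return (min_workday_mean <= workday_mean <= max_mean
            and min_weekend_mean <= weekend_mean <= max_mean)


# A SIM card with 50 records on each day of a 30-day month passes the filter.
month = [date(2017, 4, day) for day in range(1, 31)]
print(is_active_sim({d: 50 for d in month}))  # True
```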

Commuting

Identifying the home and work locations of a subscriber is a common and crucial part of the CDR processing, as a good portion of people live their lives in an area that is determined by only their home and workplace [5, 50]. Since these locations fundamentally determine people’s mobility customs, the commuting trends can be analyzed between these locations. Commuting is studied within a city [50, 51], or between cities [52, 53, 54], and is also examined by social network data, such as Twitter [55, 56].

Csáji et al. determined the subscribers’ most common locations, and based on weekly calling patterns, identified the home and work locations [49]. Home locations showed a strong correlation with population statistics. Diao et al. applied a regression model to travel survey data to predict the activity type (e.g., home, work, or social) of the mobile phone location data by considering the temporal distributions of different activities [57]. Xu et al. determined the home locations and then applied a modified standard distance to measure the spread of each subscriber’s activity space [58]. Pappalardo et al. used the Radius of Gyration (Section Mobility Indicators) to separate the subscribers based on their mobility customs and defined two classes: returners and explorers [5]. While in the case of returners, the radius of gyration is dominated by their movement between a few preferred locations, the explorers have a tendency to travel between a larger number of different locations. To demonstrate this dichotomy, they defined the k-radius of gyration, which refers to the gyration radius of the $k$ most frequent locations. The gyration radius of a two-returner is determined by the two most frequented locations, which are usually the home and work locations [5], so this method can also be used as a home detection algorithm. Pappalardo et al. [16] compared the estimated home locations of sixty-five subscribers with the known geographical coordinates of their residence location, using different types of mobile network data: CDR, eXtended Detail Record (XDR) and Control Plane Record (CPR). They found that XDRs should be preferred when performing home location detection.

Vanhoof et al. compared five different home detection algorithms (HDA), selecting the home cell by (i) the most activity, (ii) the largest number of distinct days with phone activities, (iii) the most activities within a time interval (between 19:00 and 7:00), (iv) the most activities within a spatial perimeter, and (v) the combination of the temporal and spatial constraints [59].
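
The first three criteria can be sketched in a few lines. This is an illustration under simplifying assumptions and not the implementation of [59]: the records are (cell identifier, timestamp) pairs of a single subscriber, the list is non-empty, ties are broken by order of first appearance, and the function names are hypothetical. The spatial criteria (iv) and (v) are omitted because they require the cell geometry.

```python
from collections import Counter, defaultdict
from datetime import datetime


def home_by_activity(records):
    """(i) The cell with the most activity records."""
    return Counter(cell for cell, _ in records).most_common(1)[0][0]


def home_by_distinct_days(records):
    """(ii) The cell that is active on the largest number of distinct days."""
    days = defaultdict(set)
    for cell, timestamp in records:
        days[cell].add(timestamp.date())
    return max(days, key=lambda cell: len(days[cell]))


def home_by_night_activity(records, start_hour=19, end_hour=7):
    """(iii) The cell with the most activity between 19:00 and 7:00."""
    night = [(cell, ts) for cell, ts in records
             if ts.hour >= start_hour or ts.hour < end_hour]
    return home_by_activity(night) if night else None


records = [
    ("A", datetime(2017, 4, 3, 10)), ("A", datetime(2017, 4, 3, 11)),
    ("A", datetime(2017, 4, 3, 12)), ("A", datetime(2017, 4, 3, 13)),
    ("B", datetime(2017, 4, 3, 21)), ("B", datetime(2017, 4, 4, 6)),
    ("B", datetime(2017, 4, 5, 22)),
]
print(home_by_activity(records))        # A
print(home_by_distinct_days(records))   # B
print(home_by_night_activity(records))  # B
```

The example shows how the criteria can disagree: cell A has more records overall, but only during working hours of a single day, so the day-based and night-based criteria both select cell B.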

Jiang et al. identified daily activity patterns (motifs, Figure 2.2) that can extend the home-work location based daily routine [50]; the home locations were validated with census and household travel survey results.

Figure 2.2.: Daily motifs, based on [50].

Yin et al., in contrast, separated the different types of activity (home, work, leisure, school) with chains of activity [60], providing a different approach for a similar purpose. Mamei et al. computed origin-destination flows with road network mapping and also validated the home location estimation with census data [53]. Zagatti et al. studied commuting in Haiti [52].

Dannemann et al. [61] partitioned the city of Santiago (Chile) into several communities and identified the socioeconomic composition of these communities based on the home-work trajectories.

Connectivity

Traveling within a city or across cities is not necessarily bound to the home or work locations. Understanding the travel customs concerning recreational activities or tourism, for example, is also an important aspect of urban mobility.

Trasarti et al. used sequential pattern mining for identifying activity patterns in the CDR data and identified interconnections at the urban and national level [62]. Lee et al. analyzed commuting across ten Korean cities and determined the attractiveness of the cities based on which city attracted more commuters [63]. Fiadino et al. identified clusters where the visitors of Barcelona concentrate their activities and connections between the districts of Barcelona [51]. Using mobile network and POI data, Qian et al. determined that tourists tend to rest near sightseeing destinations and choose transportation hubs in the city center; furthermore, connections between popular destinations were identified [32].

Ghahramani et al. investigated hotspots [64] and interaction flows between the parts of Macau and found that people tend to communicate within their close-proximity communities [65]. Fan et al. mapped the trajectories extracted from CDRs to the road network and then estimated the crowd flux [66]. Ni et al. extracted origin-destination information and confirmed positive connections between the population, key facilities (e.g., shopping malls or hospitals), transportation accessibility, and travel flows [67].

Gender Differences

Gauvin et al. revealed a gender gap in mobility: they found that women visit fewer locations than men and spend their time less equally among those locations [68]. Goel et al. evaluated gender segregation by the social network interactions of the mobile network data and found that Estonian speakers are more likely to interact with other Estonian speakers of the same gender [69]. Al‐Zuabi et al. predicted the subscribers’ age and gender based on mobile phone network activities [70]. Besides these examples, other studies also attend to gender differences, like [71, 72].

Socioeconomic Status

The demographic metrics and Socioeconomic Status (SES) seem to have a significant relationship to individual travel behavior. Early studies aimed to investigate the correlation between human travel characteristics and SES [73, 74]. Utilizing mobility indicators, calculated from mobile network data, to infer SES is a current direction of mobility analysis, as the study of city structures (for example, Aung et al. classified land use types based on mobile network activity [75], Furno et al. fused GPS traces and mobile phone data [8]) led to the analyses of the socioeconomic structure of the population.

Cottineau and Vanhoof [19] developed a model to explore the relationship between mobile phone data and traditional socioeconomic information from the national census in French cities. Mobile phone indicators were estimated from six months of Call Detail Records, while census and administrative data were used to characterize the socioeconomic organization of French cities. The findings show that some mobile phone indicators relate significantly to different socioeconomic organizations of cities. Pokhriyal et al. [76] used a computational framework to accurately predict the Global Multidimensional Poverty Index (MPI) in Senegal, based on environmental data and CDR. The methodology provides an accurate prediction of important dimensions of poverty: health, education, and standard of living. The estimations have been validated using deprivations calculated from the census.

Some investigations suggest that mobile phone data can be used to predict individual SES [11] or regional socioeconomic characteristics [77]. Xu et al. [10] used an analytical framework on large-scale mobile phone and urban socioeconomic datasets to evaluate mobility patterns and SES. Six mobility indicators, housing prices, and income in Singapore and Boston have been used to analyze the socioeconomic classes. It was found that phone users who are generally wealthier tend to travel shorter distances in Singapore but longer distances in Boston. The research brought interesting findings but also showed that the relationship between mobility and socioeconomic status is worth investigating in other cities and countries as well.

Xu et al. [58] investigated people’s daily activities in Shenzhen, China, and identified so-called “north–south” differences in human activity, findings that are in good agreement with the socioeconomic divide in the city. Zhao et al. proposed a semi-supervised hypergraph-based factor graph model to predict individual SES, using data from Shanghai [78].

Castillo et al. calculated the Human Development Index from locally available data for Ecuador to describe the socioeconomic status and compared it with their mobile phone based approach [79]. Barbosa et al. found significant differences in the average travel distance between the low- and high-income groups in Brazil [54].

The different social classes live in different parts of a city, but CDRs have also been used to analyze gender and minority segregation [69]. Cottineau et al. [19] explored the relationship between mobile phone data and traditional socioeconomic information from the national census in French cities. Barbosa et al. also found significant differences in the average travel distance between the low- and high-income groups in Brazil [54].

Ucar et al. revealed a socioeconomic gap in mobile service consumption [80]. Vilella et al. found that education and age play a role in news media consumption patterns in Chile, using a dataset that provides information about the visited websites [81].

While Blumenstock et al. used the call history as a factor of socioeconomic status [11], Sultan et al. [12] applied mobile phone prices as a socioeconomic indicator and identified areas where more expensive phones appear more often. However, only manually collected market prices were used. Beiró et al. examined the visitors of 16 malls in Santiago de Chile and found that people tend to choose a profile of malls more in line with their own socioeconomic status and the distance from their home [82].

Lenormand et al. utilized Entropy as a measure of attractiveness and socioeconomic complexity and found a positive and exponential relationship between income level and entropy, based on mobile network data from the Rio de Janeiro Metropolitan Area [83]. Based on mobile network data, De Nadai et al. found that socioeconomic conditions, mobility, and physical characteristics of the neighborhood explain the emergence of crime [84].

Leo et al. investigated socioeconomic correlations between mobile phone communication records and anonymous bank transaction history over eight months. The latter contains daily debit/credit card purchases, monthly loan measures, billing postal code, age, and gender [85]. They demonstrated that people of the same socioeconomic class are better connected and live closer to each other within their own class.

Sleep-wake cycle

The studies cited before mainly focus on the spatial distance between the home and work locations, as it is hard to estimate the travel time using sporadic CDR data [86], though the data has a seasonal nature due to the human biorhythm. Moreover, human mobility is highly regular [17, 87], and the individual activity has a bursty characteristic [88]. Jo et al. found that by removing the circadian and weekly seasonality, the bursty nature of the human activity remains [89].

In the digital era, the human sleep-wake cycle (SWC) is also studied using info-communication systems, such as smartphones [90, 91], websites [92, 93, 81], social media [94] and call detail records (CDR) [95, 89, 71, 96, 97, 98, 99]. Cuttone et al. used screen-on events of smartphones to study the daily sleep periods [90], and Aledavood et al. examined the social network of different chronotypes, using the same data set [91]. Monsivais et al. identified yearly and seasonal patterns in calling activity and resting periods [97, 98]. Lotero et al. found a connection between temporal patterns and the socioeconomic status of the subscribers, namely, the wealthier wake up later [96]. Diao et al. found a difference in the daily activity between different districts of Boston [57].

Using Call Detail Records, Roy et al. found that chronotype is largely dependent on age, and younger subscribers are more likely to be evening people, but also found differences between women and men [72].

Visualization

Practically all the previously referenced publications use Voronoi tessellation for representing the cells of the mobile network. I also chose this method, but this does not mean that the Voronoi polygons generated around the base stations or the cell centroids give a perfect representation of the cell coverage. In the case of base stations that usually serve multiple antennas in different directions, the Voronoi polygons may be closer to the real coverage area, apart from the fact that some antennas may cover a much larger area than the others of the given base station. When the cell centroids are known (as in the case of the “April 2017” dataset of this work, Section Vodafone April 2017), the Voronoi polygons are an even less ideal approximation. On the other hand, close cells are merged during the data processing (e.g., in [51], or as I did). Especially after the merge, the Voronoi tessellation seemed the best available option.
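
By definition, a point belongs to the Voronoi polygon of the site (base station or cell centroid) that is closest to it, so a location can be assigned to a cell without constructing the polygons explicitly. A minimal sketch, assuming planar coordinates; the function and variable names are illustrative and are not taken from this work:

```python
from math import hypot


def nearest_site(point, sites):
    """Return the identifier of the site whose Voronoi polygon contains point.

    point: (x, y) in planar coordinates (assumption: a projected system).
    sites: dict mapping site identifier -> (x, y) of the base station or
           cell centroid.
    """
    px, py = point
    return min(sites, key=lambda s: hypot(sites[s][0] - px, sites[s][1] - py))


sites = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
print(nearest_site((3.0, 1.0), sites))   # a
print(nearest_site((8.0, -2.0), sites))  # b
```

This linear scan is sufficient for illustration; for a large number of sites, a spatial index would be the usual choice.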

Csáji et al. [49] and Ogulenko et al. [100] presented a probabilistic positioning model as an alternative. Ricciato et al. also wrote about the issues of the Voronoi tessellation and compared different tessellation options, namely Naif Voronoi, Bi-layer Voronoi, and Proximus Voronoi [101]. Xu et al. applied Thiessen polygons to approximate the service areas of the base stations and converted the visitation frequency to hexagons [30].

When working with administrative boundaries or selecting an area to analyze, the cells can be associated by the centroid/base station coordinates or the Voronoi polygons (see Section Selecting an Area). Lenormand et al. used an interesting approach approximating the administrative boundaries with the Voronoi polygons of the mobile network cells [83, Figure A2].

Social Media

There are studies that try to utilize data from social media as an alternative to, or along with [102], mobile network data. The aforementioned directions are also present in the social media based works. This section provides non-exhaustive examples. Galeazzi et al. geo-located Facebook data of 13 million users from France, Italy, and the UK and found that where Value Added per capita and Population Density are high, resilience to mobility disruptions is higher [103]. Shepherd et al. analyzed the mobility trends in the United Kingdom (both domestic and international) during the pandemic, based on Facebook data [104]. They found differences in the mobility between central London compared to the rest of the UK, but Scotland, Wales, and Northern Ireland showed significant deviations from England.

Scholz et al. analyzed tourist flows in Austria [105]. Hawelka et al. evaluated global mobility patterns with geotagged tweets [106]. Jurdak et al. examined mobility via geotagged tweets posted in Australia [107], while Bokányi et al. analyzed commuting via Twitter in the US [56]. Using location data generated by the Foursquare app, Li et al. proposed a solution to forecast socioeconomic status based on the visited Points of Interest (POI) [108].
