6. Commuting

Hungary is a typical capital-oriented country. Budapest is the political, economic, logistical, and cultural center of the country, where almost 18% of the population lives [118]. In 2017, the population of Budapest was 1749734, and the population of Pest county was 1247372. According to KSH, 837532 people lived in the agglomeration of Budapest [142]. The river Danube divides the city into the Buda (Boo-da) and the Pest (Pesh-t) sides. The former is supposedly the more fashionable area, and its property values are remarkably higher (see Figure 3.14 and Figure 8.1a).

Due to its central role, Budapest attracts a workforce from a relatively large area. This process, called commuting, creates a connection between the capital and the surrounding settlements. According to Kiss and Matyusz [13], commuting is the relation between two locations. The inhabitants of a source location travel to work at another location; this is called “out-commuting”. The target location receives a workforce; this is called “in-commuting”. The source location loses in several ways: the out-commuter (i) does not use the local resources, (ii) does not create value locally, (iii) has a looser relation to the source location, as that relation is partly relocated to the target location, and (iv) brings back income, but partly spends it elsewhere. On the other hand, in the target location, (i) human resources are gained, (ii) more value can be created in place, (iii) the local relations and society become stronger, and (iv) the local consumption increases.

Pálóczi analyzed the country-wide commuting in Hungary using the methods of complex network analysis, based on the data of the 2011 census [143]. He considered settlements as nodes and commuters as directed edges between the nodes, then applied the disparity ($Y_{2}$) parameter [144], which was developed to measure the heterogeneity of weighted relations. If $Y_{2}$ is close to one, a single destination dominates the out-commuting of a settlement.
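To make the disparity measure concrete, the following minimal sketch computes $Y_{2}$ for every source settlement. It assumes the form commonly used in the complex network literature, $Y_{2}(i) = \sum_j (w_{ij}/s_i)^2$, where $w_{ij}$ is the number of commuters from settlement $i$ to settlement $j$ and $s_i = \sum_j w_{ij}$; whether [143] and [144] use exactly this form is an assumption. The function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def disparity(edges):
    """Return {source: Y2} for directed, weighted edges (source, target, weight).

    Assumed definition: Y2(i) = sum_j (w_ij / s_i)^2, with s_i = sum_j w_ij.
    Self-links and non-positive weights are ignored.
    """
    out_weights = defaultdict(list)
    for source, target, weight in edges:
        if source != target and weight > 0:
            out_weights[source].append(weight)

    result = {}
    for source, weights in out_weights.items():
        strength = sum(weights)
        result[source] = sum((w / strength) ** 2 for w in weights)
    return result


# Hypothetical toy network: settlement "A" sends most commuters to one
# destination, while settlement "B" is evenly split among three destinations.
edges = [
    ("A", "Budapest", 900), ("A", "B", 50), ("A", "C", 50),
    ("B", "Budapest", 100), ("B", "A", 100), ("B", "C", 100),
]
y2 = disparity(edges)
```

In this toy example, the value for "A" is much closer to one than the value for "B", which equals the reciprocal of the number of destinations when the commuters are evenly split.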

Figure 6.1.: Disparity of out-commuting, based on census 2011. Partial replot of [143, Figure 3].

Figure 6.1 displays a replotted part of [143, Figure 3] to illustrate Pálóczi’s results for the area discussed in this chapter. As Figure 6.1 shows, settlements next to Budapest are darker, meaning that the connections of these settlements are one-sided. Pálóczi demonstrated that the out-commuting dependency is the greatest not around Budapest but around Győr and Székesfehérvár [143], and that it is also high around the centers of the employment regions (e.g., chief towns of the counties).

Kiss and Matyusz state¹ that, although commuting is an important and common phenomenon, it is measured only occasionally and inadequately [13]. Commuting is predominantly analyzed using census data, but a census is performed only once a decade and thus cannot follow sudden but permanent changes. They also stress that commuting should be examined continuously and that a methodology for doing so should be established.

As surveying the population is a slow, tedious, and expensive task, an obvious step is to automate the process with the available info-communication technologies (ICT). In this chapter, CDR processing is proposed to examine commuting, mainly to Budapest, and the findings are validated against the results of studies that analyzed commuting using census data. In many cases, the findings are presented in a form as close to those results as possible to aid the comparison.

Home and Work Locations

To analyze commuting, the subscribers’ home and work locations have to be determined. As the CDRs are anonymized, they do not contain information like a residential address. Moreover, a workplace is not even required for a subscription, so the operators cannot have that information. The process of home and work location estimation is described in Section Home and Work Locations. After these locations have been determined, they should be validated. Although the applied approach is practically equivalent to what can be found in the literature, the validity of the results is hard to confirm. Pappalardo managed to validate the home location in the case of sixty-five subscribers [16], but this is not possible in this case. Therefore, the settlement-based and, in the case of Budapest, district-based population data from the KSH [142] are used as the reference.

Figure 6.2 shows a comparison of the population between the KSH and the CDR data. When comparing the two maps, a few differences emerge: some parts of Budapest are not dark enough on the CDR map. Districts 16 and 17 seem less populated than in the KSH data. Nevertheless, the most divergent districts are on the Buda side: Districts 2 and 3. The difference may stem from the inaccuracy of the home detection in that area or simply from the different mobile operator preferences of the inhabitants of the Buda Hills.

Figure 6.2.: Comparing the ground truth [142] (a) and the estimated population based on mobile network data (b).

Apart from this, the findings based on CDRs correlate with the statistical data; Pearson’s R is 0.9213 ($R^2$ is 0.8488), counting every SIM card. Figure 6.3a shows this as a regression plot. As described in Section Device Types, the obtained Call Detail Records contain information (TAC) about the devices that use the mobile network. Based on this information, more than 300000 SIM cards have been identified that operate in devices other than cellphones (e.g., 3G modems), indicating that they do not represent people. Figure 6.3b illustrates the correlation without these SIM cards. Although the population values decrease, the correlation does not seem significantly affected: Pearson’s R is 0.9125.
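The correlation reported above can be reproduced in principle with a few lines of code. The sketch below is a plain-Python implementation of Pearson’s R; the population lists are hypothetical placeholders, not the KSH or CDR values, and the function name is an assumption introduced for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("two sequences of equal length (>= 2) are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-area values: census population vs. detected home locations.
census_population = [25000, 60000, 110000, 130000]
detected_subscribers = [5200, 16500, 24000, 35000]

r = pearson_r(census_population, detected_subscribers)
r_squared = r ** 2  # coefficient of determination of the simple linear fit
```

For a simple linear regression, the coefficient of determination is the square of Pearson’s R, which is consistent with the values above: 0.9213 squared rounds to 0.8488.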

Figure 6.3.: Correlation between the population of the agglomeration and the districts of Budapest based on KSH (Központi Statisztikai Hivatal, Hungarian Central Statistical Office) and mobile network data. In the left figure (a), all the SIM (Subscriber Identity Module) cards were used; in the right figure (b), SIM cards that certainly operate in non-phone devices are excluded.

Numerically, the CDR data often shows significant mismatches, but the values are not easy to compare objectively. The available mobile network data originates from only one operator, which had about 25% market share in the observation period [109]. This market share refers to subscriptions, not to the number of unique people. Furthermore, this ratio is a nationwide value. As a spatially more detailed market share is not available, it has to be assumed that Vodafone Hungary had the same market share in every subregion to make this comparison. Although this is unlikely, one-fourth of the population values can be used as a rule of thumb.
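The rule of thumb can be written down as simple arithmetic. The sketch below assumes a uniform 25% market share in every area, which, as noted above, is unlikely to hold exactly; the function names and the detected value are hypothetical.

```python
MARKET_SHARE = 0.25  # nationwide share of the operator, assumed uniform


def expected_subscribers(population, market_share=MARKET_SHARE):
    """Number of subscribers expected in an area under a uniform market share."""
    return population * market_share


def detection_ratio(detected, population, market_share=MARKET_SHARE):
    """Detected home locations relative to the expectation (1.0 = as expected)."""
    return detected / expected_subscribers(population, market_share)


# Population of Budapest in 2017 according to the text above.
budapest_population = 1749734
budapest_expected = expected_subscribers(budapest_population)

# Hypothetical number of detected home locations, for illustration only.
ratio = detection_ratio(400000, budapest_population)
```

A ratio below one would indicate fewer detected home locations than the uniform market share suggests, and a ratio above one would indicate more.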

Work Locations

Along with the home locations, the workplaces are the most important elements of mobility and commuting analysis. The COVID-19 pandemic has changed this to some extent: as part of the social distancing measures introduced to slow down the spread of the disease, working from home has come to the fore. Presumably, the prevalence of working from home will remain higher than before the pandemic, as both employers and employees got used to it, but many occupations will still require a work location, so the importance of this topic will remain the same. However, as the data sources used in this work predate the pandemic, this question can only be answered in another work.

The workplace is defined as the most frequent place where a subscriber appears during work hours; see Section Home and Work Locations for the details. Querying the work locations of the inhabitants of an area (for example, a settlement) can be the initial step of the commuting analysis. Figure 6.5 shows the typical workplaces of three selected settlements and one district of Budapest, using Gaussian kernel density plots, in two different versions: with (left column) and without (right column) those subscribers who work in their home settlement. When the local workers are included, the darkest area is the selected area itself, as many people work in the vicinity of their homes.
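The definition above can be sketched as a small function. The work-hour window (weekdays, 9:00 to 17:00) and the record layout (subscriber, timestamp, cell) are assumptions made for illustration; the exact parameters used in this work are given in Section Home and Work Locations.

```python
from collections import Counter, defaultdict
from datetime import datetime

WORK_START_HOUR = 9   # assumed start of work hours
WORK_END_HOUR = 17    # assumed end of work hours (exclusive)


def estimate_work_locations(records):
    """Return {subscriber: most frequent cell during assumed work hours}.

    `records` is an iterable of (subscriber_id, timestamp, cell_id) tuples.
    Subscribers without any activity in the window are omitted.
    """
    counts = defaultdict(Counter)
    for subscriber, timestamp, cell in records:
        is_weekday = timestamp.weekday() < 5  # Monday = 0 ... Friday = 4
        if is_weekday and WORK_START_HOUR <= timestamp.hour < WORK_END_HOUR:
            counts[subscriber][cell] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Hypothetical activity records of a single subscriber.
records = [
    ("s1", datetime(2017, 4, 3, 10, 0), "cell_A"),  # Monday, work hours
    ("s1", datetime(2017, 4, 3, 14, 0), "cell_A"),  # Monday, work hours
    ("s1", datetime(2017, 4, 3, 21, 0), "cell_B"),  # Monday evening
    ("s1", datetime(2017, 4, 4, 11, 0), "cell_C"),  # Tuesday, work hours
    ("s1", datetime(2017, 4, 8, 12, 0), "cell_B"),  # Saturday
]
work_locations = estimate_work_locations(records)
```

The home location can be estimated analogously by counting activity outside the work-hour window.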

Connectivity

As the origin and the destination of the commuting are determined, it is possible to build a network, for example, by considering the districts of Budapest as nodes that are connected by the commuters. Figure 6.4 shows the connections between the districts of Budapest. The Buda districts are placed to the left, whereas the Pest districts are to the right, and the colors of the nodes match the district groups in Figure 3.15a. The edges represent commuters between districts, with self-links removed, and the weight of an edge denotes the number of commuters. The weight is expressed by colors, using darker colors for the stronger edges. The weakest links ($w < 250$) are omitted to improve visibility.
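The construction of the edge list can be sketched as follows. The input layout (one home district and work district pair per subscriber) and the function name are assumptions; the threshold of 250 follows the text above.

```python
from collections import Counter


def commuting_edges(home_work_pairs, min_weight=250):
    """Return {(home_district, work_district): number_of_commuters}.

    Self-links (home district equals work district) are removed, and edges
    with a weight below `min_weight` are omitted.
    """
    weights = Counter(
        (home, work) for home, work in home_work_pairs if home != work
    )
    return {edge: w for edge, w in weights.items() if w >= min_weight}


# Hypothetical home and work districts of individual subscribers.
pairs = [(11, 5)] * 300 + [(11, 11)] * 500 + [(2, 5)] * 100
edges = commuting_edges(pairs)
```

In this toy input, only the link from District 11 to District 5 remains: the 500 local workers form a self-link, and the link from District 2 is below the threshold.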

Extending this topic to the level of the agglomeration, or the country, could be another research direction: for example, to analyze the in-commuting and out-commuting. Pálóczi’s work [143] could serve as a census-based reference.

Figure 6.4.: Connectivity between Budapest districts, using the home and work locations. A link points from the home district to the work district; those who work in the district where they live are excluded. The weak links are omitted to improve visibility. The district nodes are colored by the district groups.
Panels: (a) Ecser; (b) Ecser, without local workers; (c) District 5; (d) District 5, without local workers; (e) Budaörs; (f) Budaörs, without local workers; (g) Dunakeszi; (h) Dunakeszi, without local workers.
Figure 6.5.: Using kernel density plots (with Gaussian kernel) to display the typical working locations for three selected settlements and a district of Budapest, with (a, c, e, g) and without (b, d, f, h) local workers.

Validation by Census

In order to verify the reliability and accuracy of the method proposed for the home and work location estimation, a comparative study is performed on the mobile network data and the information processed from the census. In Hungary, a census is conducted every ten years, with a micro-census covering a 10% sample at the midpoint of the decade. The last census was performed in 2011, while the last micro-census was in 2016². Based on these surveys, commuting to Budapest (and generally in Hungary) is analyzed in studies like [13, 145, 143, 146]. These studies are used as the reference for comparing the results.

Figure 6.6 shows the comparison between the CDR and the census-based [146, Figure 1] traveling ratios of the commuters by the districts of Budapest and the home location category. People who work in Budapest are represented, and the home location can be (i) the same district where one works, (ii) another district of Budapest, (iii) the agglomeration, and (iv) other settlements outside the agglomeration.
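The four home location categories and the resulting ratios can be sketched as follows. The data layout is an assumption made for illustration: Budapest districts are represented by integers, other settlements by their names, and the set of agglomeration settlements is passed in explicitly. All names in the example are hypothetical inputs.

```python
from collections import Counter, defaultdict

CATEGORIES = (
    "same district",
    "other district",
    "agglomeration",
    "outside agglomeration",
)


def home_category(home, work_district, agglomeration):
    """Classify a home location relative to a work district in Budapest."""
    if home == work_district:
        return "same district"
    if isinstance(home, int):  # assumed: integers denote Budapest districts
        return "other district"
    if home in agglomeration:
        return "agglomeration"
    return "outside agglomeration"


def commuting_ratios(commuters, agglomeration):
    """Return {work_district: {category: percentage}} for (home, work) pairs."""
    counts = defaultdict(Counter)
    for home, work_district in commuters:
        category = home_category(home, work_district, agglomeration)
        counts[work_district][category] += 1
    ratios = {}
    for district, counter in counts.items():
        total = sum(counter.values())
        ratios[district] = {c: 100.0 * counter[c] / total for c in CATEGORIES}
    return ratios


# Hypothetical commuters working in District 5.
agglomeration = {"Budaörs", "Dunakeszi"}
commuters = [(5, 5), (11, 5), ("Budaörs", 5), ("Győr", 5)]
ratios = commuting_ratios(commuters, agglomeration)
```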

Figure 6.6.: Comparison between the census-based (a) [146, Figure 1] and the Call Detail Record based (b) commuting ratios to the districts of Budapest, from the same district, other parts of Budapest, the agglomeration, or outside the agglomeration.

The proportions of the commuters agree well with the census, mostly within Budapest. The most significant difference can be seen in the “outside agglomeration” category. This deviation, however, originates from the coverage of the data source, as the mobile network data used in this study covers mainly the area of Budapest and its close vicinity. It also contains phone activities from the surrounding county, but the amount of available data decreases with the distance from Budapest.

The fraction of workers who have their homes in the same district is very close to the census data in the outer districts (15–23) but generally overestimated in the core districts (1, 5–9) and the inner districts (2–4, 10–14). The workers from other district groups show the best match to the census data (where the CDR should have the best quality), while the agglomeration is somewhat overestimated in many districts.

The Agglomeration

A more detailed analysis of commuting from the agglomeration is given in [145]: the agglomeration is divided into six sectors, and commuting is examined by origin (home sector, occasionally by town) and destination (district group of Budapest).

Figure 6.7 shows the commuters’ distribution in the districts of Budapest, from the six sectors of the agglomeration, based on the CDR evaluation. In their representation, Figures 6.7a–6.7f are analogous to [145, Figures 2–9] and show to which districts the inhabitants commute from the given sector of the agglomeration.

Panels: (a) Northern Sector; (b) Eastern Sector; (c) Southeastern Sector; (d) Southern Sector; (e) Western Sector; (f) Northwestern Sector.
Figure 6.7.: Commuting from the six sectors of the agglomeration, based on Call Detail Record evaluation.

Lakatos and Kapitány analyzed the commuting tendencies from some settlements to the districts of Budapest across the censuses of 1990, 2001, and 2011 [145]. The same analysis has been made using CDR processing, and six of the 13 settlements thoroughly analyzed in [145] are presented in this study. The results are summarized in Figure 6.8, compared with the censuses. It contains a settlement from every sector of the agglomeration, so it also serves as a more focused analysis of Figure 6.7. The location of the settlements, in relation to Budapest, is also displayed on small maps to give context to the findings. For example, from towns west of the capital, the most common commuting targets are the Buda side and the inner districts. Moreover, in many cases, the mobile network based findings, which are six years more recent than the last census, indicate a clear continuation of the previous tendency.

In the case of Budaörs (Figure 6.8a), although North Pest, outer Eastern Pest, and South Pest are not significant commuting destinations, the census data show an increasing tendency, which is confirmed by the mobile network data. The CDR based results for South Buda and Inner Pest also fit the trend, although there the tendency is the opposite. The most considerable discrepancies are found for the North Buda and the inner Eastern Pest district groups. The Pearson correlation coefficient, regarding all the six district groups, between the 2011 census and the mobile network data is 0.8976.

Dunakeszi (Figure 6.8b) is east of the River Danube and north of Budapest, which implies the dominance of North Pest as the commuting destination, although its importance, like that of Inner Pest, has been decreasing over the last few decades. While South Buda, South Pest, and the outer Eastern Pest have an increasing tendency, the inner Eastern Pest and North Buda do not show such clear tendencies. The correlation coefficient (Pearson’s R) between the census (2011) and the CDR based results is 0.9416.

Vecsés is in the Southeastern sector of the agglomeration, from where the majority of the commuters work in the inner and outer Eastern Pest, Inner Pest, and South Pest regions. North Pest and Buda were not notable destinations for the commuters, but the results show increasing trends (Figure 6.8c). The correlation coefficient, in the case of Vecsés, is 0.924.

Dunaharaszti is in the Southern sector of the agglomeration and east of the Danube. Consequently, the main destination of the commuters was South Pest. Besides that, Inner Pest received considerable in-commuters, but its importance seems to be decreasing. The rest of the district groups had roughly the same trends (Figure 6.8d). Dunaharaszti has the strongest correlation out of the examined settlements: Pearson’s R is 0.971.

Érd has the largest population (65857 in 2017 [142]) in the agglomeration and also in the Southern sector. The detected commuting ratios fit into the trends of the last three censuses, although Eastern and South Pest seem overestimated, and North Buda underestimated by the mobile network data based approach (Figure 6.8e). The correlation with the ground-truth is 0.8488 (Pearson’s R).

In the case of Szentendre (Figure 6.8f), the mobile network based results might show the most significant discrepancy. Still, the correlation coefficient (Pearson’s R) is 0.9127. As Szentendre is located in the Northwestern sector, west of the Danube, the most obvious destination for commuting is North Buda. According to the census data, it had the most in-commuters, even with a slightly increasing tendency. However, the CDR based results underestimate it, whereas Eastern and South Pest seem overestimated. The result for Inner Pest lags behind the latest census data, but that fits into the trend.

These detailed results demonstrate the applicability of the CDR processing for commuting analysis. It would be interesting to compare these results with the next census. That would reveal how precisely these findings fit into the trend of the changing commuting customs of the population of the agglomeration.

Panels: (a) Budaörs; (b) Dunakeszi; (c) Vecsés; (d) Dunaharaszti; (e) Érd; (f) Szentendre.
Figure 6.8.: Commuting to the seven district groups of Budapest from selected settlements of the agglomeration, comparing census (1990, 2001 and 2011) and mobile network data. Next to the legends, the location of the settlement in question is displayed on a map.

Demography

As the available mobile network data contains information about the age and the gender of the subscribers (for 66.17% and 70.76% of the subscriptions, respectively), the commuting trends can be studied by age groups.

Koltai and Varró provide reference data for this analysis [146, Table 1]. Figure 6.9a shows the distribution of the commuters by age categories and the sector as the home location. Only those commuters were examined who work in Budapest.

It is not clear from the paper what the upper limit of the “60+” age category is. The people who usually go to work are assumed to be younger than 65 years (the current retirement age in Hungary), although people can work beyond 65. In the CDR based figure (Figure 6.9b), “60+” means over 60 and under 100. However, there are not many subscribers over 70: only 2.48% of the SIM cards are owned by people older than 70 years. Furthermore, it has to be noted that the age information belongs to the owner of the subscription, not necessarily to the actual user of the phone.

Comparing the data obtained from the micro-census and the mobile network, good agreement (Pearson’s R is 0.8977) has been found in the trends and measures of the distribution of the commuters by age categories. The most significant differences between the census and the CDR based data are within the “60+” and the “50–59” categories. People in their fifties seem underrepresented in the CDR data, while the “60+” category is overrepresented, which might be caused by the different interpretation of the upper limit. On the other hand, the values are very similar in the other categories. Based on the similarity of the results (Figure 6.9), mobile network data appears to be a reliable source for commuting analysis, even regarding the demographic features.

Figure 6.9.: Distribution of the commuters by age categories and the sectors of the agglomeration (%). Comparison between micro-census data [146, Table 1] (a), and mobile network data (b).

Describing Mobility

Up to this section, mobility has been described only by the home and work locations, with the commuting between these two locations in focus. However, mobility is a much broader concept: it should cover all of the subscribers’ activities, not only the work-related ones. Some indicators widely used in the literature (Section Mobility Indicators) characterize mobility.

These indicators, namely Activity Location Number, Radius of Gyration, and Entropy, have been calculated (Section Calculating Indicators) for every subscriber in the data sets. It is important to note that the home-work (beeline) distance can also be a numeric indicator of commuting, but as the Radius of Gyration takes into consideration every location where a subscriber appeared, it can give more insight into the mobility.
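For orientation, the three indicators can be sketched with their commonly used definitions. These are assumptions made for illustration and may differ from the exact formulas in Section Mobility Indicators: the Radius of Gyration is taken as the root mean square distance of the visited points from their centroid (in planar coordinates, here in km), and the Entropy as the Shannon entropy of the visit frequencies, normalized by the logarithm of the number of distinct locations. All function names are hypothetical.

```python
import math
from collections import Counter


def activity_location_number(visits):
    """Number of distinct locations where a subscriber appeared."""
    return len(set(visits))


def radius_of_gyration(points):
    """Root mean square distance of planar (x, y) points from their centroid."""
    n = len(points)
    center_x = sum(x for x, _ in points) / n
    center_y = sum(y for _, y in points) / n
    squared = sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in points)
    return math.sqrt(squared / n)


def normalized_entropy(visits):
    """Shannon entropy of visit frequencies, scaled to the [0, 1] interval."""
    counts = Counter(visits)
    if len(counts) < 2:
        return 0.0  # a single location carries no uncertainty
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# Hypothetical subscriber: visited cells and their planar coordinates in km.
visits = ["home", "home", "home", "work", "work", "shop"]
coordinates = {"home": (0.0, 0.0), "work": (6.0, 0.0), "shop": (0.0, 3.0)}
points = [coordinates[v] for v in visits]

locations = activity_location_number(visits)
gyration_km = radius_of_gyration(points)
entropy = normalized_entropy(visits)
```

With this normalization, an Entropy close to zero means that almost all activity is concentrated at one location, and a value close to one means that the activity is spread evenly over the visited locations.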

The distribution of the distance between the home and work locations shows exponential characteristics (Figure 6.12c). The majority of the investigated population have chosen workplaces closer than 5 km, and a relatively small number of people need to travel more than 10 km to work.
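The home-work beeline distance can be computed from geographic coordinates, for example, with the haversine formula. Using this formula with a mean Earth radius of 6371 km is an assumption for illustration; the text does not state how the distance was calculated, and the coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical home and work coordinates (degrees) of one subscriber.
home = (47.46, 18.96)
work = (47.50, 19.05)
home_work_km = haversine_km(home[0], home[1], work[0], work[1])
```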

The Radius of Gyration (Figure 6.12a) shows a normal distribution; however, it is significantly skewed to the right, which seems similar to the observations made in Boston [10]. The majority of the inhabitants have a relatively small Radius of Gyration; on the other hand, some subscribers need to travel significant distances [87].

The Entropy also shows a normal distribution. The majority of Budapest’s population visits plenty of locations (Entropy between 0.2 and 0.6), and besides a solid fraction of people who remain in their home’s neighborhood (Entropy less than 0.2), there are only a few who visit many places in the city. This is in agreement with the findings of other studies [87, 5, 10].

Figure 6.10.: Budapest districts and the settlements of the agglomeration colored by the mean Radius of Gyration (km) for workdays. The tendency is that the farther someone lives from the city center, the more one travels. The white border denotes the administrative border of Budapest, and the Danube is displayed in green. Settlements without sufficient data are represented by gray.

Figures 6.10 and 6.11 show the average Radius of Gyration of the individuals whose home location is estimated to be in the given area. The former shows the Budapest districts and the settlements of the agglomeration, and the latter focuses on Budapest at a cell level, using Voronoi polygons as representation. The broad tendencies are the same: the farther one lives from the center, the more one travels, but the cell level map reveals some local city centers that make the image more nuanced. The impact of the local city centers on mobility trends could be a separate topic of investigation.

As the activity time series shows (Figure 3.3a), the mobile network data has a daily periodicity. Besides, there are also fewer activities on holidays. The quantity is one thing, but the quality is another. People do not just use the mobile network less on holidays; their behavior is also quite different. For one, most of them do not need to go to work, so the usual commuting-related mobility patterns do not apply. This appears in the daily aggregated mobility indicators (Figure 6.13). People are active at fewer locations (Figure 6.13c), possibly because they stay at home to rest, which is also supported by the Entropy results (Figure 6.13b).

The activity of the people is significantly higher during working days than on the weekends. In addition, higher values of Radius of Gyration, Entropy, and the number of activity locations have been recorded on Fridays, which implies that the last working day of the week has a sort of privileged role (Figure 6.13). On the weekends, more activities and more vibrant travels can be observed on Saturdays, and many of the dwellers rest on Sundays.

Limitations

The evaluation and the validation have been performed based on the results of other studies that analyzed commuting based on census data. With direct access to the statistical data from KSH and other sources, more and finer aspects of the validation could be performed.

Conclusion

In this chapter, the evaluation of the subscribers’ home and work locations is presented, and the results are compared to the ground truth. Though the detected population numerically differs from the actual population, the distribution across the settlements shows a strong correlation. The numerical difference can be explained by the fact that the CDRs were obtained from only one mobile network operator.

Based on the home and workplace detection, it was demonstrated that mobile network data could be an effective solution for commuting analysis. The findings are presented in a form as close to the results of other studies that examined commuting in the agglomeration of Budapest as possible to aid the comparison.

It was examined to which districts people commute from the sectors of the agglomeration. In the case of some selected settlements, the destination districts of the commuters are also presented in contrast to the last three censuses. It was found that the mobile network based results fit into the three-decade tendencies. The commuters were also analyzed by age groups, and the results also agree with the census-based studies.

These results confirm that mobile network data is capable of commuting analysis. Using activity records from all the operators of a country, a more precise and representative analysis could be performed. Given the fact that mobile networks are available in the most populated areas, mobile network data should be standardized for statistical and sociological usage while respecting privacy and personal data.

Figure 6.11.: Cell Voronoi polygons colored by Radius of Gyration (km); dark green polygons represent cells without enough data.
Panels: (a) Radius of Gyration; (b) Entropy; (c) Number of Activity Locations.
Figure 6.13.: The differences between workdays (light brown) and holidays (green) are clear. Orange columns represent the holidays: April 14, 2017, was Good Friday, and April 17, 2017, was Easter Monday, both of which are holidays in Hungary.

  1. In respect of Hungary, at least.

  2. The next census should have been performed in 2021, but it was postponed due to the COVID-19 pandemic.
