2 The Utility and Danger of Big Data

Urban Informatics is, in a way, a byproduct of the “deluge” or “proliferation” of data. As society has progressed technologically, we have become able to capture and store data on an unprecedented scale. This has led to massive stores of data that are used primarily for record keeping and are updated in near real time. Every tweet or Facebook post is recorded in a remote database so that it can be accessed at a later time. Dan O’Brien notes that this characteristic of big data is of the utmost consequence for “the advancement of science and policy in the digital age.”1 Because these data capture everyday behavior, we consider them “naturally occurring.” Naturally occurring data are so useful because they are essentially a track record of individuals’ behavior over time. This is as close as we can get to measuring behavior in real time.

In the urban context, the importance of big data becomes ever more apparent. The city government of Boston, for instance, has been keeping detailed records of property assessments and tax debts since the late 18th century, including the demographic characteristics of debtors and even their locations at the ward level. These administrative records were kept in ink on paper until just a few decades ago. Through digitization efforts, these data are now accessible to historians, urban scholars, local government, and the general public. Having such data accessible provides a way to quantitatively inspect the development of the city through its geography, its policies, its demography, and much more.

Naturally occurring data provide a number of benefits. The first is that these data are, in theory, comprehensive and contain information about all residents. Through administrative data we should, for example, be able to determine the number of employed, taxpaying citizens, as well as the underemployed who receive government benefits. Additionally, since these data are already being collected, the associated costs are minimal. In contrast to empirically collected data, administrative data are not just a representation of a single moment in time, but are continually changing and updating. And because administrative data are collected at the municipal level, we are inherently dealing with geospatial data, that is, data associated with a location.

While there are many benefits to administrative big data, there are dangers, too. The first is that even though big data are comprehensive in theory, we cannot always take them as objective observations of the natural world. We must be cognizant that the biases humans hold are also represented in the data. We cannot and should not separate theory from data. As Dan O’Brien puts it:

“. . . the very point of science is to explain why things work the way they do. . . . If we limit our inquiries only to correlation and eschew explanation, we are no longer conducting science.”2 — Daniel T. O’Brien

While using big data presents some dangers, we ought not to discard them entirely. To cope with these dangers, we must remain cognizant of them.