
Can AI mirror human perception? Evaluating urban street images with ChatGPT

The relationship between urban environments and how their inhabitants perceive them is a significant subject in social science research. Perceptions of disorder and a lack of safety have been linked to a range of social problems, including higher crime rates, limited educational opportunities, poorer health, and reduced mobility. These effects ripple outward to other outcomes, such as property values.

Urban planners and social scientists have long sought to identify which elements of the urban landscape shape public perception. An influential study from the MIT Media Lab (Dubey et al., 2016) advanced this work by collecting a large dataset of pairwise comparisons of street images, in which participants chose, for example, which of two images looked safer. The dataset offers insight into the visual cues that shape perceptions of safety.
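To make the idea of a pairwise-comparison dataset concrete, here is a minimal sketch that turns "which image looks safer?" votes into a per-image win rate. It is a deliberately simple stand-in, not the scoring or ranking method Dubey et al. used, and the image IDs, vote format, and function name are hypothetical.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Return each image's share of the pairwise comparisons it won.

    `comparisons` is an iterable of (winner_id, loser_id) tuples, where the
    winner is the image a participant judged to look safer (hypothetical format).
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Win rate = wins / number of comparisons the image took part in.
    return {img: wins[img] / appearances[img] for img in appearances}


if __name__ == "__main__":
    # Hypothetical votes, not data from the study's dataset.
    votes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("A", "B")]
    for img, rate in sorted(win_rates(votes).items(), key=lambda kv: -kv[1]):
        print(f"{img}: {rate:.2f}")
```

A raw win rate ignores how strong each opponent was; pairwise-ranking models such as Bradley–Terry account for that and are a more robust choice on a large dataset.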

Recent artificial intelligence models, such as large language models (LLMs) like ChatGPT, open new ways to analyze these urban dynamics. Because they can describe and judge images at scale, LLMs may offer a scalable method for evaluating street imagery. We used ChatGPT to rate the perceived safety of various streets on a Likert scale ranging from "Very bad" to "Very good." Our goal was to test whether an AI model could reproduce human judgment in this setting.
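A rating workflow like this needs a fixed label set and a reliable way to turn free-text replies into scores. The sketch below assumes a five-point scale (the post uses "Very bad", "Bad", "Neutral", "Good", and "Very good") and a reply that starts with "Rating: <label>"; the prompt wording, the reply format, and the names `PROMPT` and `parse_rating` are hypothetical, not the prompt used in this study.

```python
import re
from typing import Optional

# Assumed five-point scale, scored 1 ("Very bad") to 5 ("Very good").
LIKERT = ["Very bad", "Bad", "Neutral", "Good", "Very good"]
SCORES = {label.lower(): score for score, label in enumerate(LIKERT, start=1)}

# Hypothetical prompt wording; the study's exact prompt is not given.
PROMPT = (
    "How safe does this street look? Answer with one of: "
    + ", ".join(LIKERT)
    + ". Begin your reply with 'Rating: <label>', then explain your reasoning."
)

# Matches "Rating:" followed by one of the scale labels, case-insensitively.
RATING_RE = re.compile(
    r"Rating:\s*(" + "|".join(map(re.escape, LIKERT)) + r")\b", re.IGNORECASE
)


def parse_rating(reply: str) -> Optional[int]:
    """Return the 1-5 score from a 'Rating: <label>' reply, or None if absent."""
    match = RATING_RE.search(reply)
    return SCORES[match.group(1).lower()] if match else None
```

Sending `PROMPT` together with an image to ChatGPT is left out because it depends on the client and model in use. `parse_rating` returns `None` when a reply ignores the requested format, so such replies can be retried or set aside rather than guessed at.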

We began with images from the MIT study. The streets perceived as least safe commonly showed signs of disrepair, such as unfinished construction, and were often marred by graffiti. The ratings varied somewhat, which likely reflects the subjective nature of safety perception, but the LLM gave a detailed rationale for each rating, describing the visual cues behind its judgment much as a human analyst would.

Figure 1. Image from the paper "Deep Learning the City." Source: Dubey et al., 2016

Figure 2. Street view image with low perceived safety

Further tests used street views from New York City and revealed clear contrasts. A street next to an exposed subway line (Figure 3), a structure that may impede pedestrian flow and reduce the area's vitality, was rated between “Neutral” and “Bad.” In contrast, a street near Washington Square Park with careful maintenance and attractive landscaping (Figure 4) was rated “Good.”

Figure 3. Randomly selected street view 1

Figure 4. Randomly selected street view 2
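A result such as "between Neutral and Bad" can arise when the same image is rated more than once. As a hedged sketch, assuming each image is queried several times and each reply is mapped to a 1 to 5 score, the runs can be summarized by their label range and median. The function name `summarize_runs` and the example scores are hypothetical.

```python
from statistics import median

# Assumed five-point scale, scored 1 ("Very bad") to 5 ("Very good").
LIKERT = ["Very bad", "Bad", "Neutral", "Good", "Very good"]


def summarize_runs(scores):
    """Summarize repeated 1-5 ratings of one image as a label range plus median."""
    if not scores or any(s not in range(1, 6) for s in scores):
        raise ValueError("scores must be a non-empty list of integers from 1 to 5")
    low, high = min(scores), max(scores)
    if low == high:
        label = LIKERT[low - 1]
    else:
        label = f"between {LIKERT[low - 1]} and {LIKERT[high - 1]}"
    return label, median(scores)


# Hypothetical scores from five repeated queries about the same image.
print(summarize_runs([3, 2, 3, 2, 3]))  # → ('between Bad and Neutral', 3)
```

Reporting the range alongside the median keeps run-to-run variability visible instead of hiding it in a single average.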

However, this street-image-based analysis also revealed methodological limits. The rating can be sensitive to the camera angle and the scope of the image, as the difference between Figures 4 and 5 shows. Both depict the same area near Washington Square Park but received different safety ratings, apparently because scaffolding is visible in one of them.

Figure 5. Left side of randomly selected street view 2, showing scaffolding
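The sensitivity to angle and scope described above could be tracked if each location were rated from several viewpoints, a setup the post did not test. This minimal sketch assumes it: it groups hypothetical (location, heading, score) ratings by location, reports the mean and the spread between the highest and lowest score, and flags locations whose spread exceeds a threshold. The function name, location IDs, headings, and the one-step default threshold are all arbitrary assumptions.

```python
from collections import defaultdict
from statistics import mean


def flag_view_sensitive(view_scores, max_spread=1):
    """Group 1-5 ratings by location and flag locations whose views disagree.

    `view_scores` holds (location_id, heading, score) tuples; the location IDs
    and headings are hypothetical labels for different views of the same place.
    """
    by_location = defaultdict(list)
    for location, _heading, score in view_scores:
        by_location[location].append(score)

    report = {}
    for location, scores in by_location.items():
        spread = max(scores) - min(scores)
        report[location] = {
            "mean": mean(scores),
            "spread": spread,
            # True when views of the same place differ by more than max_spread steps.
            "view_sensitive": spread > max_spread,
        }
    return report
```

Flagged locations are candidates for a closer look at what differs between views, such as scaffolding visible from one heading but not another.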


These findings suggest that LLMs have potential for this kind of urban analysis. Future work will build on this foundation by evaluating a broader selection of street views across two or three distinct neighborhoods to test how well the approach scales.



Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep learning the city: Quantifying urban perception at a global scale. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I (pp. 196–212). Springer International Publishing.

