Eskişehir Technical University Journal of Science and Technology A- Applied Sciences and Engineering, vol.26, no.4, pp.399-415, 2025 (TRDizin)
Urban perception is a multidimensional phenomenon that reflects individuals’ evaluations of the urban environment and plays a critical role in planning and design processes aimed at improving quality of life. This study aims to predict six themes of urban perception (beautiful, boring, depressing, lively, safe, wealthy) from street view images using regression-based deep learning methods. Three deep learning architectures, ResNet18, VGG19, and EfficientNet-B1, were employed. The models were built on the Place Pulse 2.0 dataset, whose approximately 110,000 labeled street images were preprocessed by resizing, cropping, tensor conversion, and normalization. The data were split into 80% for training and 20% for validation. Performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), the coefficient of determination (R²), and validation loss curves. The findings indicate that the EfficientNet-B1 model achieved the lowest error values, particularly for the “safe” and “lively” themes, while the ResNet18 model offered more balanced and stable performance in terms of validation loss. The VGG19 model generally yielded higher error rates and exhibited a clear tendency toward overfitting. Theme-specific visual complexity was observed to directly affect model performance. In conclusion, deep learning architectures prove effective in modeling urban perception from visual data, but both the choice of architecture and the inherent nature of the theme play decisive roles in model performance. This study highlights the importance of architecture- and theme-sensitive model design in AI-supported analysis of urban perception.
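The evaluation setup summarized above (an 80/20 split and MAE, RMSE, and R² per theme) can be made concrete with a short sketch. The Python below is an illustrative assumption, not the authors' implementation: the helper names `regression_metrics` and `train_val_split`, the fixed shuffle seed, and the score values are all hypothetical. It computes the three metrics from paired true and predicted perception scores for a single theme.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired lists of true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all true scores are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


def train_val_split(items, train_frac=0.8, seed=0):
    """Shuffle a copy of items with a fixed (hypothetical) seed and split it."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical scores for one theme (e.g. "safe"); not data from the study.
y_true = [4.0, 6.0, 5.0, 7.0]
y_pred = [5.0, 6.0, 4.0, 7.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")  # MAE=0.500 RMSE=0.707 R2=0.600
```

Because RMSE squares each error before averaging, it penalizes a few large misses more than MAE does, while R² expresses the residual error relative to the variance of the true scores, so it can be compared across themes whose score ranges differ.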