Spatial Cognition and Block Building



Spatial cognition refers to a set of skills that we use every day to think about objects moving in the world, the relationships between objects in space, and the way we navigate through the world. Spatial cognition early in development predicts later spatial and mathematical skills, particularly those related to the science, technology, engineering, and mathematics (STEM) fields. We use block building, an accessible skill for young children, as a window into how this complex spatial skill develops, how it is linked to academic learning more generally, and how it can be nurtured, moving children from “novice” to “expert” builders.


Study Goals

This effort is a cross-departmental collaboration between CIRL (computer science), the Language and Cognition Lab (cognitive science), and the School of Education. By studying block building, we hope to:

  • Understand the cognitive processes underlying spatial cognition
  • Design interventions to improve children’s spatial reasoning skills
  • Develop computer vision systems capable of parsing assembly processes automatically


Representing Block Play

Illustration of a block assembly process, represented as a graph. As blocks are incorporated into the assembly, edges are added to the graph.

To develop hypotheses and test them quantitatively, we need a formal representation that captures the precise details of block assemblies. In (Cortesa et al., 2017) and (Jones et al., 2019), we describe a graph-based data structure which represents block assemblies and models how they are altered by constructive or deconstructive actions. This representation is the end goal of the project’s computer vision facet, and the starting point for scientific analyses.
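To make the idea concrete, here is a minimal sketch of a graph-based assembly representation. It is an illustration under our own assumptions, not the data structure from the cited papers: the class name `Assembly`, the methods `connect` and `disconnect`, and the choice to drop blocks that are no longer attached to anything are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    """Undirected graph: nodes are block ids, edges are physical connections."""
    blocks: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # each edge is a frozenset of two block ids

    def connect(self, a, b):
        """Constructive action: attach block a to block b."""
        if a == b:
            raise ValueError("a block cannot connect to itself")
        self.blocks.update((a, b))
        self.edges.add(frozenset((a, b)))

    def disconnect(self, a, b):
        """Deconstructive action: separate a from b.

        Assumption: a block with no remaining connections leaves the assembly.
        """
        self.edges.discard(frozenset((a, b)))
        attached = set().union(*self.edges)
        self.blocks &= attached

    def state(self):
        """Hashable snapshot of the assembly, usable as a node in a state graph."""
        return frozenset(self.edges)


# Hypothetical block ids for illustration.
assembly = Assembly()
assembly.connect("red", "blue")
assembly.connect("blue", "green")
assembly.disconnect("blue", "green")
print(sorted(assembly.blocks))
```

Storing each edge as a `frozenset` keeps connections unordered and makes the whole state hashable, so two participants who reach the same configuration by different routes produce equal states. A fuller representation would likely also record how two blocks are connected (for example, their relative position), which this sketch omits.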


Parsing Assembly Processes

Our data collection platform includes an RGBD sensor and eight IMU-instrumented DUPLO blocks.

During behavioral experiments, we record data from multiple modalities: RGB-depth video and inertial (acceleration and angular velocity) measurements. In (Jones et al., 2019) we cast the problem of estimating block assemblies from these recordings as an instance of spatial assembly parsing, which has broader applications in collaborative robotics, industrial monitoring, and information retrieval. This task presents research opportunities in several areas of computer vision, machine learning, and signal processing:

  • Time-series inference from video and inertial measurement sequences
  • Occlusion-robust object recognition and tracking
  • Joint action recognition and state estimation


Cognitive Foundations in Spatial Construction

We visualize the states created by participants as a graph, where each edge represents a transition from one state to another. Thicker edges represent more frequent transitions. The goal state is highlighted in green, and an error state is highlighted in red.
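A state graph like this can be built by counting how often each transition occurs across participants. The sketch below shows one way to do that. The state labels and example sequences are hypothetical; in practice each state would be an assembly graph rather than a string.

```python
from collections import Counter


def transition_counts(sequences):
    """Count how often each (state, next_state) transition occurs across participants."""
    counts = Counter()
    for states in sequences:
        for current, following in zip(states, states[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical state sequences, one per participant.
sequences = [
    ["empty", "A", "AB", "goal"],
    ["empty", "A", "AB", "goal"],
    ["empty", "B", "AB", "goal"],
    ["empty", "A", "error"],
]
counts = transition_counts(sequences)
heaviest = max(counts.values())
for (src, dst), n in counts.most_common():
    # The relative weight could be mapped to line thickness when drawing the graph.
    print(f"{src} -> {dst}: {n} (relative weight {n / heaviest:.2f})")
```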

In (Cortesa et al., 2017; 2018) we examine how adults and children perform this task, focusing on the common construction paths that most people follow. We find that both children and adults follow the same highly constrained construction paths, despite the very large number of possible correct construction sequences. Specifically, most people construct their models in horizontal layers, building from the bottom up. However, young children make many errors along their paths, and both adults and children vary in their specific block placement actions within the overall bottom-to-top scheme. This finding has spurred new research questions:

  • Will participants build differently if the target model resembles a recognizable object with functional units or parts? What if the target model doesn’t naturally lend itself to being constructed in horizontal layers?
  • Does how a child builds with blocks (such as the smoothness or fluidity of their motions) relate to what they are able to build?
  • How might a naive observer differentiate between more- or less-skilled builders?


Barbara Landau

Amy Shelton

Greg Hager

Sanjeev Khudanpur

Cathryn Cortesa

Anand Malpani

Jonathan Jones

Mingyu Yang
Saena Sadiq
Ana Fahey

Project Alumni:

Kwang Bin Lee
Hannah Manley
Harry Burke
Michael Lepori
Kelly Zhang
Wendy Wen
Anney Tuo
Lauren Maytin



Jones, J. D., Hager, G. D., & Khudanpur, S. (2019). Toward Computer Vision Systems That Understand Real-World Assembly Processes. IEEE Winter Conference on Applications of Computer Vision (WACV). (To appear)

Cortesa, C. S., Jones, J. D., Hager, G. D., Khudanpur, S., Landau, B., & Shelton, A. L. (2018). Constraints and Development in Children’s Block Construction. CogSci 2018 Proceedings, 246-251.

Cortesa, C. S., Jones, J. D., Hager, G. D., Khudanpur, S., Shelton, A. L., & Landau, B. (2017). Characterizing spatial construction processes: Toward computational tools to understand cognition. CogSci 2017 Proceedings, 246-251.





