A Closer Look at CaST (Q&A Format)

Yutong Xia | Jun 5, 2025

In our NeurIPS 2023 paper “Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment”, we introduce CaST, a causal framework designed to tackle two key challenges in spatio-temporal graph forecasting: temporal distribution shifts and dynamic spatial causation. We are truly grateful for the encouraging attention and feedback we’ve received from the community.

To help readers better understand the motivations and technical choices behind CaST, this post presents a Q&A-style deep dive into some of the most commonly asked questions.


Q1: What core challenges in Spatio-Temporal Graph (STG) forecasting does CaST address?

A: Traditional Spatio-Temporal Graph Neural Networks (STGNNs) often face two major hurdles when making predictions:

  1. Temporal Out-of-Distribution (OoD) Issues: Model performance can degrade significantly when the temporal patterns in the test data (e.g., due to holidays, sudden events, or gradual drift in the data) differ from those in the training data.
  2. Modeling Dynamic Spatial Causation: In the real world, the spatial influences (causal relationships) between nodes are not static; they change over time. Capturing these dynamics is challenging for many existing models.

Our CaST framework is designed to tackle these two core challenges using causal inference principles.


Q2: What’s the core idea behind CaST? How does it use ‘causality’ to solve these problems?

A: The central idea of CaST is to first build a Structural Causal Model (SCM) to understand the data generation process of STGs. Then, we leverage back-door adjustment and front-door adjustment from causal inference to handle temporal and spatial confounding factors, respectively.

  • For Temporal OoD: We designed an “environment disentanglement block” that uses back-door adjustment to isolate the effect of temporal environmental factors (like temperature or pressure) on the target variable, thereby reducing interference from environmental shifts.
  • For Dynamic Spatial Causation: We utilize front-door adjustment combined with “edge-level convolution” (specifically using the Hodge Laplacian operator) to model the “ripple effect” of causation between spatial nodes, allowing us to better capture dynamic spatial dependencies.
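For readers less familiar with the two adjustment formulas, the sketch below works through them on tiny, made-up binary distributions in plain Python. It is a textbook illustration, not CaST's implementation: in CaST both adjustments are realized with learned neural modules, and every variable name and probability here is a hypothetical choice. Back-door: with E an observed confounder of the input X and the target Y, P(Y | do(X=x)) = Σ_e P(Y | X=x, E=e) P(E=e). Front-door: with a hidden confounder U of X and Y, and a mediator M that carries all of X's effect on Y, P(Y | do(X=x)) = Σ_m P(m | x) Σ_x' P(Y | x', m) P(x'), which needs only the observed X, M, and Y.

```python
from itertools import product

def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v in {0, 1}."""
    return p if v == 1 else 1.0 - p

# ---- Back-door toy model (hypothetical): E -> X, E -> Y, X -> Y; E observed ----
P_E = 0.3                                   # P(E=1), a binary "environment"
P_X_given_E = {0: 0.2, 1: 0.8}              # P(X=1 | E=e)
P_Y_given_XE = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y=1 | X=x, E=e)
                (1, 0): 0.3, (1, 1): 0.9}

def joint_backdoor(e, x, y):
    return bern(P_E, e) * bern(P_X_given_E[e], x) * bern(P_Y_given_XE[(x, e)], y)

def naive_conditional(x):
    """P(Y=1 | X=x): mixes the effect of X with the influence of the confounder E."""
    num = sum(joint_backdoor(e, x, 1) for e in (0, 1))
    den = sum(joint_backdoor(e, x, y) for e in (0, 1) for y in (0, 1))
    return num / den

def backdoor_adjusted(x):
    """P(Y=1 | do(X=x)) = sum_e P(Y=1 | X=x, E=e) P(E=e)."""
    return sum(P_Y_given_XE[(x, e)] * bern(P_E, e) for e in (0, 1))

# ---- Front-door toy model (hypothetical): U -> X, U -> Y, X -> M -> Y; U hidden ----
P_U = 0.5
P_X_given_U = {0: 0.2, 1: 0.7}
P_M_given_X = {0: 0.1, 1: 0.8}
P_Y_given_MU = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}

def joint_frontdoor(u, x, m, y):
    return (bern(P_U, u) * bern(P_X_given_U[u], x)
            * bern(P_M_given_X[x], m) * bern(P_Y_given_MU[(m, u)], y))

def observed(x=None, m=None, y=None):
    """Marginal probability of the observed (X, M, Y) values; U is summed out."""
    total = 0.0
    for u, xx, mm, yy in product((0, 1), repeat=4):
        if (x is None or xx == x) and (m is None or mm == m) and (y is None or yy == y):
            total += joint_frontdoor(u, xx, mm, yy)
    return total

def frontdoor_adjusted(x):
    """P(Y=1 | do(X=x)) = sum_m P(m|x) sum_x' P(Y=1 | x', m) P(x'), observed data only."""
    result = 0.0
    for m in (0, 1):
        p_m_given_x = observed(x=x, m=m) / observed(x=x)
        inner = sum(observed(x=xp, m=m, y=1) / observed(x=xp, m=m) * observed(x=xp)
                    for xp in (0, 1))
        result += p_m_given_x * inner
    return result

def true_interventional(x):
    """Ground truth from the full toy model: cut U -> X and force X = x."""
    return sum(bern(P_U, u) * bern(P_M_given_X[x], m) * P_Y_given_MU[(m, u)]
               for u in (0, 1) for m in (0, 1))

for x in (0, 1):
    print(f"x={x}: naive={naive_conditional(x):.3f}, back-door={backdoor_adjusted(x):.3f}")
    print(f"x={x}: front-door={frontdoor_adjusted(x):.3f}, truth={true_interventional(x):.3f}")
# x=0: naive=0.139, back-door=0.220
# x=0: front-door=0.430, truth=0.430
# x=1: naive=0.679, back-door=0.480
# x=1: front-door=0.640, truth=0.640
```

In the first toy model, the naive conditional overstates the effect of X because it also picks up the influence of E. In the second, the front-door estimate is computed without ever seeing U, yet it matches the ground truth obtained by intervening in the full toy model.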


Q3: Regarding the Hodge Laplacian (HL) operator in CaST, what’s its specific role, and why choose it over other GNN components?

A: The HL operator is primarily used in our HL Deconfounder module to approximate spatial causal influence through a structured representation of edge-level signal propagation. While we do not claim to fully recover the ground-truth dynamics of causal relationships, we use the HL operator to better approximate how causal effects may propagate across space, modeling the ripple effect of causation between nodes.

We chose the HL operator for its advantages in edge-centric processing. Unlike most GNN architectures, which operate on node features, the HL operator intrinsically operates at the edge level. In many physical or networked systems (e.g., traffic, pollutant dispersion), spatial influence is more naturally represented as directional flows between nodes. By modeling edge signals directly, the HL operator lets us align the model with this flow-based notion of causation.
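To make "operating at the edge level" concrete, here is a small NumPy sketch of the Hodge 1-Laplacian, assuming its standard definition L1 = B1^T B1 + B2 B2^T, where B1 is the node-to-edge incidence matrix and B2 is the edge-to-triangle incidence matrix. The toy graph, its filled triangle, and the polynomial filter `edge_conv` are illustrative assumptions; the HL Deconfounder's actual layers (normalization, learnable parameters, how edges and triangles are chosen) live in the CaST codebase and are not reproduced here. Node readings are turned into edge flows with B1^T, which is one simple way to obtain a signal that lives on edges.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, 4 edges oriented from lower to higher
# node index, and one filled triangle (0, 1, 2) with sorted vertices.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
triangles = [(0, 1, 2)]
n_nodes = 4

def node_edge_incidence(n_nodes, edges):
    """B1 (nodes x edges): -1 at each edge's tail node, +1 at its head node."""
    B1 = np.zeros((n_nodes, len(edges)))
    for j, (tail, head) in enumerate(edges):
        B1[tail, j] = -1.0
        B1[head, j] = 1.0
    return B1

def edge_triangle_incidence(edges, triangles):
    """B2 (edges x triangles): boundary of [a, b, c] is [a, b] + [b, c] - [a, c]."""
    index = {e: j for j, e in enumerate(edges)}
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (a, b, c) in enumerate(triangles):
        B2[index[(a, b)], t] = 1.0
        B2[index[(b, c)], t] = 1.0
        B2[index[(a, c)], t] = -1.0
    return B2

B1 = node_edge_incidence(n_nodes, edges)
B2 = edge_triangle_incidence(edges, triangles)
# Hodge 1-Laplacian: the "lower" term couples edges that share a node,
# the "upper" term couples edges on the boundary of a common triangle.
L1 = B1.T @ B1 + B2 @ B2.T

def edge_conv(x_edge, L1, thetas):
    """Polynomial edge filter: sum_k thetas[k] * L1^k @ x_edge."""
    out = np.zeros_like(x_edge)
    power = x_edge.copy()
    for theta in thetas:
        out += theta * power
        power = L1 @ power
    return out

# Node readings -> edge flows: B1^T x gives (head - tail) on every edge.
x_node = np.array([1.0, 2.0, 4.0, 7.0])
x_edge = B1.T @ x_node                              # [1., 2., 3., 3.]
y_edge = edge_conv(x_edge, L1, thetas=[0.5, -0.1])  # approx. [0.2, 0.7, 0.9, 1.4]
```

Because L1 couples each edge with the edges it shares a node or a triangle with, every extra power of L1 in the filter spreads an edge signal one hop further, a rough analogue of the ripple effect described in the answer above.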


Q4: Regarding the assumption that ‘temporal environment’ and ‘spatial context’ are decoupled: is it always valid in reality, given that factors like weather can affect both?

A: We acknowledge that certain factors (like weather) can indeed possess both spatial and temporal attributes simultaneously. In our framework:

1. Different Focus: We are concerned with the impact these factors have on specific objects (e.g., traffic flow, air quality readings) at particular locations and times.

2. Generality of Variables: The “Temporal Environment (E)” and “Spatial Context (C)” defined in our SCM are not meant to be concrete sets of explicitly enumerated factors. Instead, they act as latent, generalized variables intended to encapsulate broad temporal or spatial effects.

3. Practicality and Simplification: Treating temporal and spatial effects at least partly separately is common practice in many mainstream spatio-temporal models (e.g., Graph WaveNet, STGCN). It reduces computational cost and model complexity, and it has proven effective in practice.
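As a rough illustration of the separable design mentioned in point 3, the NumPy sketch below applies a temporal convolution to each node independently and then a graph convolution at each time step independently. It is a generic, simplified factorization under our own assumptions (plain linear kernels, a ReLU, GCN-style normalized adjacency), not the exact layers of CaST, Graph WaveNet, or STGCN; STGCN, for example, sandwiches a graph convolution between two gated temporal convolutions. The parameter count at the end compares it with a hypothetical fully connected map over all space-time positions, to show where the savings come from.

```python
import numpy as np

def normalized_adjacency(A):
    """GCN-style symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def temporal_conv(X, W_t):
    """Valid 1-D convolution along time, applied to each node independently.
    X: (T, N, F_in), W_t: (K, F_in, F_out) -> (T - K + 1, N, F_out)."""
    K = W_t.shape[0]
    T_out = X.shape[0] - K + 1
    return sum(np.einsum("tnf,fg->tng", X[k:k + T_out], W_t[k]) for k in range(K))

def spatial_conv(H, A_norm, W_s):
    """Graph convolution applied at each time step: A_norm @ H[t] @ W_s."""
    return np.einsum("nm,tmf,fg->tng", A_norm, H, W_s)

def st_block(X, A, W_t, W_s):
    """Temporal mixing first, then spatial mixing, with a ReLU in between."""
    H = np.maximum(temporal_conv(X, W_t), 0.0)
    return spatial_conv(H, normalized_adjacency(A), W_s)

# Hypothetical sizes: 12 time steps, 5 nodes on a ring, 2 input features.
T, N, F_in, F_hid, F_out, K = 12, 5, 2, 8, 4, 3
rng = np.random.default_rng(0)
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
X = rng.normal(size=(T, N, F_in))
W_t = 0.1 * rng.normal(size=(K, F_in, F_hid))
W_s = 0.1 * rng.normal(size=(F_hid, F_out))

Y = st_block(X, A, W_t, W_s)
print(Y.shape)  # (10, 5, 4)

# Parameters: factorized block vs. one dense map from every input
# space-time position to every output position.
factorized_params = W_t.size + W_s.size                    # 3*2*8 + 8*4 = 80
dense_params = (T * N * F_in) * ((T - K + 1) * N * F_out)  # 120 * 200 = 24000
print(factorized_params, dense_params)
```

The temporal kernel is shared across nodes and the graph weights are shared across time steps, so the parameter count stays independent of the number of nodes and time steps, whereas the dense map grows with both.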

This decoupling assumption is an approximation, but one that has proven effective in both prior literature and our empirical results.


Q5: How is ‘causal strength’ defined in CaST, and how is it learned?

A:

  • Definition: ‘Causal strength’ refers to the magnitude of the causal effect of a cause on its outcome, that is, how much the outcome changes when the cause is altered. A stronger causal strength means that changing the cause produces a larger change in the outcome.
  • Learning: In CaST, causal strength (for example, the strength of spatial causality) is learned implicitly from data during training, e.g., through the edge convolution operations in the HL Deconfounder. The model’s weights are optimized via backpropagation, enabling it to identify and quantify the strength of these edge-based influences. The entire CaST model is data-driven.
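A deliberately simplified stand-in for how a strength can be learned implicitly: below, a target node is driven by three incoming edges with hypothetical true strengths, and per-edge weights are fit by gradient descent on the mean squared error (the same update backpropagation would compute for this linear model). Because the source readings are generated independently (no confounding), the learned weights approach the causal strengths, i.e., the change in the target per unit intervention on each source. In CaST the corresponding quantities sit inside nonlinear edge convolutions and are not read off as single coefficients, so treat this only as intuition.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ground truth: strength of the edges 0->2, 1->2 and 3->2.
true_strength = np.array([0.8, 0.0, -0.5])
n_samples = 2000
X_src = rng.normal(size=(n_samples, 3))   # independent readings at nodes 0, 1, 3
y = X_src @ true_strength + 0.1 * rng.normal(size=n_samples)  # reading at node 2

w = np.zeros(3)   # learnable per-edge weights, initialized to "no influence"
lr = 0.1
for _ in range(500):
    residual = X_src @ w - y
    grad = 2.0 * X_src.T @ residual / n_samples   # gradient of the MSE w.r.t. w
    w -= lr * grad

print(np.round(w, 2))   # close to true_strength
```

The edge 1->2 has no real influence in this toy model, and its learned weight stays near zero, which is the data-driven analogue of "no causal strength".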

We hope this Q&A has clarified the motivation, assumptions, and technical designs behind CaST. If you have further questions or thoughts, feel free to reach out :)

🔗 Code: You can explore the CaST codebase for implementation details, datasets, and example use cases.

📚 More papers on Causality × Spatio-Temporal Data: If you’re interested in further exploring the intersection of causal inference and spatio-temporal data, check out this curated collection: Causality meets ST Data – Awesome Papers. Contributions are welcome!