When considering an image, beyond seeing it as a container of objects, it is natural for a human being to give it meaning or to infer an explanation of some event of interest captured in it; but how can such an inference be reached through artificial intelligence? Causal inference can be applied in many areas of science and technology where accurate decisions are crucial, such as economics, epidemiology, image processing, and autonomous driving. Currently, there are widely studied methods that, through correlation, recognize and classify objects using datasets such as ImageNet (Deng et al., 2009), which is large and informative enough to ensure high accuracy in such tasks (Zeiler and Fergus, 2014). However, in the last decade, as pointed out by Saeed and Omlin (2021), explainable AI (XAI) has been proposed in response to a need raised by important contributions in artificial intelligence, which have led to increasingly complex algorithms and less transparent models, and to advance the adoption of AI in critical domains. Hence, to obtain the explanation we seek for an event captured in an image, we must consider causal relationships, which can be inferred either through expert knowledge (Martin, 2018) or by intervening on such datasets through experimentation, as indicated in He and Geng (2008); in purely probabilistic language, having no way to distinguish between setting the value of a variable and observing it prevents modelling cause-and-effect relationships (Perry, 2003).
Thus, taking modelling as an essential step towards causal inference, a body of recent work has applied causal reasoning to images and sequential data; we review these contributions in Section 1.1.
This paper consists of four sections. In Section 1 we state the motivation of the study, present antecedents that have made important contributions to causal inference applied to images, and define our contribution as a starting point for addressing a problem area already identified by several authors. In Section 2, we present the method used to generate the data, define the causal model, validate it with the NOTEARS algorithm, and then query the model by means of interventions. In Section 3, we analyse the results obtained in the applied causal discovery and causal inference processes. Finally, in Section 4, we conclude that the graphical representation of a causal model makes the problem simpler to understand, although validation with NOTEARS required us to impose constraints based on expert knowledge. Likewise, we recognize the importance of the structure of a dataset intended for causal inference, in contrast to the structure of a machine learning dataset, and, thanks to the interventions and queries on the causal model, we were able to deduce, with a high level of certainty, the cause of the shadow projection.
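To make the pipeline of Section 2 concrete, the following is a minimal sketch of NOTEARS-based structure learning followed by an intervention query. It assumes the CausalNex library and a pandas DataFrame of discretized scene variables; the file name, the variables light_source, object and shadow, the tabu edges, and the threshold are hypothetical placeholders for illustration, not the paper's actual setup.

    # Minimal sketch using CausalNex (an assumption: the paper does not name
    # its NOTEARS implementation). Variable names are hypothetical placeholders.
    import pandas as pd
    from causalnex.structure.notears import from_pandas
    from causalnex.network import BayesianNetwork
    from causalnex.inference import InferenceEngine

    # Hypothetical file of discretized scene variables.
    df = pd.read_csv("scene_variables.csv")

    # Causal discovery with NOTEARS, constrained by expert knowledge:
    # forbid edges pointing into the exogenous variable 'light_source'.
    tabu = [("shadow", "light_source"), ("object", "light_source")]
    sm = from_pandas(df, tabu_edges=tabu)
    sm.remove_edges_below_threshold(0.8)  # keep only strong edges

    # Fit conditional probabilities, then query under an intervention.
    bn = BayesianNetwork(sm.get_largest_subgraph())
    bn = bn.fit_node_states(df).fit_cpds(df, method="BayesianEstimator",
                                         bayes_prior="K2")
    ie = InferenceEngine(bn)
    ie.do_intervention("light_source", "on")  # do(light_source = on)
    print(ie.query()["shadow"])               # P(shadow | do(light_source = on))

The tabu edges play the role of the expert-knowledge constraints mentioned above: NOTEARS alone optimizes a score over all directed acyclic graphs, so domain restrictions are a standard way to rule out edges that are physically implausible.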
1.1 Related Work
Regarding the explainability of events or phenomena captured in an image or video, and taking modelling as an essential step towards causal inference, Xin et al. (2022) discuss the role of causal inference in improving the interpretability and robustness of machine learning methods, and highlight opportunities in the development of machine learning models with causal capability adapted for mobility analysis of images or sequential data. In the specific case of causal inference applied to image analysis, Lopez-Paz et al. (2017) propose neural causation coefficients (NCCs), computed by applying convolutional neural networks (CNNs) to the pixels of an image, so that the appearance of causality between variables suggests a causal link between the corresponding real-world entities. Lebeda et al. (2015) proposed a statistical approach, transfer entropy, to discover and quantify the relationship between camera motion and the motion of a tracked object in order to predict the tracked object's location. Fire and Zhu (2013) presented a Bayesian grammar (C-AOG) model of human-perceived causal relationships that can be learned from video. Finally, Pickup et al. (2014) use causal methods, supplemented with computer vision and machine learning techniques, to determine whether a video is playing forward or backward by observing the "arrow of time" in a temporal sequence.
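As a point of reference for the transfer-entropy approach above, the standard information-theoretic definition of the transfer entropy from a source process X (e.g. camera motion) to a target process Y (e.g. motion of the tracked object), with y_t^{(k)} and x_t^{(l)} denoting k and l past states respectively, is

    T_{X \to Y} = \sum p\left(y_{t+1}, y_t^{(k)}, x_t^{(l)}\right) \log \frac{p\left(y_{t+1} \mid y_t^{(k)}, x_t^{(l)}\right)}{p\left(y_{t+1} \mid y_t^{(k)}\right)}

The quantity is nonzero precisely when the past of X improves the prediction of Y beyond what Y's own past already provides; since it is directed and asymmetric, it can be read as a measure of predictive influence, which is what makes it usable for the tracking application described.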