1 Introduction
Neural networks excel at capturing complex nonlinear relationships inherent in electricity demand data (Gifalli et al., 2024), effectively modelling consumer adoption behaviours, seasonal patterns (Román-Portabales et al., 2021), and environmental influences. This paper demonstrates how neural network modelling techniques, incorporating parameters that explicitly reflect electricity adoption dynamics, can significantly reduce demand estimation error. Such improvements are critical to advancing electrification efforts in isolated regions, ensuring reliable and sustainable energy access, and realising the transformative impact of electricity provision on local development.
Electrification, defined as providing electricity to populations who previously lacked access to a primary energy source, presents significant challenges. However, supplying electricity to isolated areas is crucial for improving quality of life, enhancing access to services like lighting, communication, healthcare, and education, and driving economic development by enabling businesses and job creation.
Where power lines are not available, microgrids provide a solution (Shufian and Mohammad, 2022). A microgrid integrates energy production technologies, such as solar photovoltaic cells, energy storage solutions (Hirsch et al., 2018), transmission lines, substations, and distribution networks, to control and manage electricity flow. These systems address electricity needs in isolated areas (Barelli et al., 2019) while promoting clean energy adoption.
The optimal design and operation of microgrids depend on electricity demand models (Ma and Zhai, 2019; Mir et al., 2020). These models must balance the risks of overestimating demand, leading to unnecessarily large systems (Mikita et al., 2024), and underestimating it, which could result in inadequate systems that fail to meet demand (Sanfilippo et al., 2023).
Temporal resolution is a key consideration in electricity demand forecasting. Forecasts can be made at various intervals (Chicco and Mazza, 2020), such as every 15 minutes, hourly, daily, or yearly (Chung and Jang, 2022; Kaur and Kaur, 2016). Minute-by-minute resolution is ideal for real-time control, while hourly resolution offers a practical balance, useful for optimising generation and managing renewable energy integration. Daily models are valuable for long-term planning and strategic decisions related to infrastructure and investment.
User behaviour is another critical factor shaping electricity demand (Lazzari et al., 2022). Exceptional events (holidays, sporting events) and weather conditions (Wassie and Ahlgren, 2023) significantly influence demand patterns (Yukseltan et al., 2020), and seasonal variations provide a structured framework for demand modelling (Fan et al., 2024). As the population’s electricity needs evolve (Venkataramanan and Marnay, 2008), existing users may increase electricity demand as they acquire more appliances, known as the degree of electricity adoption (Agrawal et al., 2020). At the same time, new households are added to the grid, further increasing overall demand. This dynamic nature of demand underscores the need for adaptive models that account for fluctuations in demand during periods of adoption (Jaramillo et al., 2024).
Another significant challenge in electricity demand modelling is the limited availability of reliable data (Morales et al., 2024; Huang and Zhu, 2016). Addressing this requires models that balance data limitations while ensuring low estimation error.
This paper proposes a novel approach to electricity demand modelling in microgrids, employing neural network techniques explicitly designed to capture complex nonlinear dynamics. By integrating a technological adoption parameter, our approach effectively represents the evolving patterns of electrification in isolated areas. Leveraging real-world data, the proposed methodology adapts robustly to dynamic demand behaviours, addresses challenges related to limited datasets, and accommodates varying temporal resolutions. Our approach, by significantly reducing the error associated with electricity demand estimation, provides insights to effectively support electrification strategies, foster sustainable development, and enhance energy access in isolated regions.
2 Electricity Demand Methods
Several methods have been developed to model electricity demand (Baba, 2021). The literature presents various classifications of these methods (Verwiebe et al., 2021). Depending on the technique, methods can be grouped into statistical analysis, artificial neural networks, metaheuristics, stochastic processes, fuzzy logic, grey systems theory, or engineering-based approaches.
Modelling methods can also be classified based on the type of data they utilise: causal methods and historical data-based methods. Causal methods examine the cause-and-effect relationship between electricity demand (the output) and various input variables, such as economic, social, and climatic factors, for example by linking humidity to electricity demand. Conversely, historical data-based methods use past values of demand to estimate future electricity demand. Hybrid methods, which combine elements of causal and historical approaches, offer an area for further exploration (Ghalehkhondabi et al., 2017).
Numerous studies have explored the development of electricity demand models, consistently emphasising the importance of data (Liu et al., 2023). Comprehensive data are crucial for building reliable models, as they enhance the ability to identify patterns, trends, and causal relationships within the electricity demand landscape. These studies collectively highlight that the robustness of demand estimation models is directly linked to the depth and breadth of the data available, including historical records, weather conditions, economic indicators, and other relevant factors. Consequently, efforts in data collection, cleaning, and integration are fundamental to improving the operational effectiveness of electricity demand models.
Existing models often fail to account for the transient nature of electricity demand, which evolves as users’ lifestyles change. Most studies focus on long-term estimation and do not dynamically adjust to shifts in user behaviour or the progressive adoption of electrical appliances. Incorporating behavioural aspects into energy system models (Huckebrink and Bertsch, 2021) is essential for understanding the socio-technical transformation of energy systems. A significant challenge in electricity demand modelling lies in effectively addressing consumer behaviour, while also integrating multiple spatial and temporal resolutions, managing uncertainty, and incorporating multi-energy systems (Fodstad et al., 2022).
For instance, in Stevanato et al. (2020), a long-term optimisation model is developed that considers the evolution of electricity demand, aiding informed investment decisions for capacity expansion over a defined time horizon. Another study presents a novel method for the optimal design of grid-connected microgrids based on long-term electricity demand forecasting (Faraji et al., 2020). Applied to a real microgrid in Tehran, Iran, this method uses HOMER software to address the gap in research on multi-year electricity demand growth-based optimal planning and design of microgrids. The study analyses the impacts of annual electricity demand growth through various scenarios and methods. However, these studies do not fully capture the dynamic nature of changing electricity demand patterns.
This work aims to provide an electricity demand model that can aid in optimising the design and operation of microgrids in isolated areas. Our focus is on addressing the challenges of modelling transient demand and dealing with potentially sparse data. The ability to function with sparse data makes the model particularly suitable for isolated areas, where data collection may be intermittent. In addition, this research extends our ongoing efforts to incorporate artificial intelligence techniques for optimising power grids (Evora et al., 2015; Evora-Gomez et al., 2015).
3 Hypothesis Statement
Modelling electricity demand in isolated regions presents significant challenges, particularly due to the nonlinear dynamics inherent in consumption patterns and the evolving nature of electricity adoption during the electrification process. Neural networks are particularly well-suited to address these issues, as they are capable of capturing complex nonlinear relationships in data and dynamically adapting to changing trends (Gifalli et al., 2024). Given the additional constraints posed by scarce or incomplete datasets, Artificial Neural Networks (ANNs) offer a robust and flexible estimation solution, effectively representing both the nonlinearities and the progressive nature of electrification, thus enabling precise and actionable estimates to support sustainable energy access.
H1: Incorporating a technological adoption parameter is necessary to effectively capture the evolving patterns of electricity usage and demand dynamics.
H2: Neural network architectures explicitly designed to model nonlinear relationships provide lower error and higher computational efficiency compared to traditional architectures.
Unlike traditional methods that rely on linear approximations, ANNs, modelled after biological neural systems, excel at capturing non-linear interactions. This enables them to uncover hidden patterns and emergent behaviours arising from complex interactions among individual agents (Ogunmolu et al., 2016; Ha and Jeong, 2021). Their generalisation capacity allows ANNs to discern underlying patterns in datasets and apply them to unseen data, particularly when new data (e.g. demand in a different location) resembles the training data (Norouzi et al., 2019). This strength makes them particularly suited to electricity demand, where multiple interacting factors create complex patterns. Previous studies demonstrate their effectiveness in forecasting demand (Saravanan et al., 2012).
To ensure precision, the model decomposes estimations into temporal components, accounting for the hour of the day and the month of the year. This approach allows the ANN to capture localised patterns, enhancing the model’s reliability in identifying seasonal and hourly variations. By linking estimations to specific times, even sparse data can inform demand estimations. For instance, data specifying demand for a particular hour on a given day reveals patterns associated with that time, contributing valuable insights to the model’s training process, even when datasets are limited.
The model also analyses user behaviour and considers causal relationships between external factors influencing electricity demand. Furthermore, ANNs can be retrained or fine-tuned to adapt to evolving demand patterns as new data becomes available (Tajbakhsh et al., 2017). Continuous daily training allows the model to incorporate recent data, reflecting changes in user behaviour and external conditions in real time.
The proposed causal model outputs hourly power demand per person (kW/person) using the following input factors:
• Month of the Year: Seasonal changes affect electricity demand due to variations in weather and daylight hours.
• Hour of Day: Demand fluctuates throughout the day, with peaks typically in the morning and evening.
• Weekday vs. Weekend: Activity patterns differ, with weekends generally involving higher household energy usage.
• Temperature: Influences the use of appliances like refrigerators, which work harder in higher temperatures.
• Humidity: Affects perceived temperature, prompting increased use of cooling devices.
• Degree of Adoption: Reflects the extent to which a community relies on electricity for daily activities, such as refrigeration and communication.
Despite its potential, the ANN model faces challenges. Overfitting may occur if the input parameters are not carefully managed, reducing its ability to generalise to new locations. The model depends on the quality and diversity of available data; sparse or unrepresentative data could lead to suboptimal estimations. In the field of ANNs, high-quality data are particularly essential, as the training process relies heavily on extensive and detailed datasets (Hu, 2017). Additionally, societal and technological changes, such as the adoption of new energy technologies, may not be immediately reflected in training data.
If successful, this approach could significantly enhance electricity demand estimation, enabling optimal microgrid design and operations. Dynamic estimations would improve microgrid design and real-time management, enhancing resilience and stability by providing deeper insights into demand dynamics (Shankar et al., 2018).
4 Experimental Work
The experimental work consisted of several phases, including data collection and preparation, data analysis, architecture definition, training, and validation.
4.1 Data Collection and Preparation
The data used in this study corresponds to El Espino in Bolivia and was taken from the GitHub repository (Balderrama Subieta, 2022), over a period from 1 January 2016 to 31 July 2017, covering 578 days of recorded measurements. The dataset includes data from 128 households, a hospital, a school, and street lighting. The electricity demand data from El Espino was captured at 5-minute intervals. Given that the primary objective of the study is to create an electricity demand model on an hourly basis, the measurements for each 5-minute interval were averaged to form hourly data points. This process involves averaging all the values recorded within a specific hour by dividing the sum of these values by the total number of measurements taken during that hour.
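The hourly averaging step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `hourly_means` and the sample values are hypothetical, and only the Python standard library is assumed.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(readings):
    """Average sub-hourly readings into one value per clock hour.

    readings: iterable of (timestamp, demand_kw) pairs, e.g. 5-minute samples.
    Returns a dict mapping the start of each hour to the mean demand.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, demand_kw in readings:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        sums[hour_start] += demand_kw
        counts[hour_start] += 1
    # Divide by the number of samples actually present, so an hour with
    # missing readings is still averaged over what was recorded.
    return {hour: sums[hour] / counts[hour] for hour in sums}


# Hypothetical data: twelve 5-minute samples in the first hour, three in the second.
start = datetime(2016, 1, 1, 0, 0)
samples = [(start + timedelta(minutes=5 * k), float(k)) for k in range(15)]
hourly = hourly_means(samples)
```

Dividing by the number of measurements actually recorded in each hour, rather than by a fixed 12, keeps the average meaningful when some 5-minute readings are missing.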
In ANN architectures, the representation of cyclical variables such as the hour of the day and the month of the year is crucial. These parameters exhibit a circular nature where values wrap around cyclically. For instance, 22:00 is as close to 23:00 as 23:00 is to 00:00, or November is as close to December as December is to January. To effectively capture this cyclical relationship, these parameters are represented using Cartesian coordinates:
\[ x=\cos \bigg(\frac{2\pi i}{{i_{\text{max}}}}\bigg),\hspace{2em}y=\sin \bigg(\frac{2\pi i}{{i_{\text{max}}}}\bigg),\tag{1}\]
where i is the current value of the cyclical variable (e.g. the hour of the day or the month of the year), and ${i_{\text{max}}}$ is the total number of possible values for the variable (e.g. 24 for hours or 12 for months).
Figure 1 shows both the linear and Cartesian representations of months. The top row represents the linear representation, where each month is sequentially ordered. It can be observed that the distance between months does not account for their cyclic nature. For example, the distance between January and December is shown as 11 when it should actually be 1. Conversely, the bottom row displays the Cartesian projections, maintaining equal distances between consecutive months.
Similarly, this method is applied to the hour of the day, using 24 points instead of 12. As a result, each of these parameters (Month of the Year and Hour of the Day) is represented by two values instead of one, helping the ANN model learn the inherent periodicity of the data, thereby improving its ability to estimate and understand time-dependent patterns. Additionally, a boolean variable was included to indicate whether a given day in El Espino is a weekend or not.
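The cyclical encoding of Eq. (1) and the weekend flag can be illustrated with a short sketch. The helper names `cyclical` and `time_features` are hypothetical, and shifting months to the range 0 to 11 is an assumption of this example rather than a detail stated in the text.

```python
import math
from datetime import datetime


def cyclical(i, i_max):
    """Project a cyclical value onto the unit circle, as in Eq. (1)."""
    angle = 2.0 * math.pi * i / i_max
    return math.cos(angle), math.sin(angle)


def time_features(ts):
    """Return (month_x, month_y, hour_x, hour_y, is_weekend) for a timestamp."""
    # Assumption: months are shifted to 0..11 so that i takes i_max distinct values.
    month_x, month_y = cyclical(ts.month - 1, 12)
    hour_x, hour_y = cyclical(ts.hour, 24)
    is_weekend = 1.0 if ts.weekday() >= 5 else 0.0  # Saturday=5, Sunday=6
    return month_x, month_y, hour_x, hour_y, is_weekend


def distance(p, q):
    """Euclidean distance between two points on the circle."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# On the circle, December-January is exactly as close as January-February.
dec_jan = distance(cyclical(11, 12), cyclical(0, 12))
jan_feb = distance(cyclical(0, 12), cyclical(1, 12))
features = time_features(datetime(2016, 1, 2, 0))  # a Saturday at midnight
```

Comparing `dec_jan` with `jan_feb` reproduces the property shown in Figure 1: consecutive months stay equidistant, including across the year boundary.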
Another input parameter was introduced to account for the degree of adoption of electricity within the community. This parameter is a decimal value ranging from 0 to 1 that captures the gradual increase in electricity demand over time, reflecting both the economic constraints and the progressive realisation of electricity’s benefits in this area. Initially, not all inhabitants could afford a comprehensive set of electrical appliances. However, as time progresses and the advantages of electrification become more apparent, the degree of electricity adoption naturally increases.
The adoption process was modelled using the logarithmic adoption model
where the equation represents the degree of electricity adoption transitioning from 0 to 1 as t evolves.
Here, t is the time elapsed since the introduction of electricity, m and n are parameters that must be adjusted, and $a(t)$ represents the degree of adoption at time t. This equation is a specific form of a logarithmic transformation applied to a growth model. It captures how the adoption of a new technology initially grows exponentially and then slows as the community reaches saturation. This model is particularly useful for representing the S-shaped adoption curve characteristic of many technological adoptions (Shukla et al., 2015).
To determine the parameters m and n for calculating the degree of adoption at each point in time, a regression method was employed using the El Espino data. This method involves finding the best-fitting parameters that minimise the differences between the observed data and the values estimated by this logarithmic model. The estimated values of the parameters are: $m=0.1253$, $n=-0.1143$. By doing so, the underlying trend of how households adopt electrical devices was captured, allowing the adoption process to be parameterised effectively and the degree of adoption to be calculated.
Additionally, to enhance the causal model, temperature and humidity variables were incorporated. This data was retrieved from Open Meteo (Zippenfenig, 2023) and aligned with the electricity demand data from El Espino in terms of the hour of the day and month of the year.
The final step involved preparing the data for integration into the ANN. This consisted of preprocessing the raw data, addressing any inconsistencies, and formatting it into a structured, consolidated, and normalised dataset suitable for ANN application.
4.2 Data Analysis
In this section, we conducted an analysis of the data to understand the dependence between input variables and electricity demand. We focused on identifying potential causal relationships, as our model is specifically designed for this purpose. As part of the exploratory analysis, we analysed the temporal distribution of electricity demand throughout the day, as shown in Fig. 2. The middle curve represents the average electricity demand, the upper curve represents the maximum, and the lower curve represents the minimum. Electricity demand peaks at 10 PM and reaches its lowest point around 7 AM. While the figure highlights distinct patterns, it is important to note that this exploratory analysis does not establish statistical significance.
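A daily profile of this kind (minimum, mean, and maximum per hour of day) could be computed along the following lines. The function name `daily_profile` is hypothetical and the numbers are invented for illustration; they are not taken from the El Espino dataset.

```python
from collections import defaultdict


def daily_profile(hourly_records):
    """Group hourly demand by hour of day and summarise each group.

    hourly_records: iterable of (hour_of_day, demand) pairs.
    Returns {hour: (minimum, mean, maximum)}.
    """
    by_hour = defaultdict(list)
    for hour_of_day, demand in hourly_records:
        by_hour[hour_of_day].append(demand)
    return {
        hour: (min(values), sum(values) / len(values), max(values))
        for hour, values in sorted(by_hour.items())
    }


# Hypothetical values for two hours of the day observed on three days.
records = [(7, 0.02), (22, 0.10), (7, 0.04), (22, 0.12), (7, 0.03), (22, 0.14)]
profile = daily_profile(records)
peak_hour = max(profile, key=lambda h: profile[h][1])  # hour with the highest mean
```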
To validate the statistical significance of causal relationships between input factors and demand, we applied an Analysis of Variance (ANOVA). This method evaluates dependence by comparing variances attributed to different sources, such as individual factors and their interactions, with the residual variance, which represents random error.
As a statistical test, ANOVA evaluates the linear influence of input factors on electricity demand by contrasting the null hypothesis with the alternative, identifying key contributors to the output. The null hypothesis (${H_{0}}$) asserts that the independent variables (or input variables) have no significant effect on the dependent variable (output variable). Conversely, the alternative hypothesis (${H_{1}}$) posits that at least one input variable significantly affects the output. The standard ANOVA model (Christensen, 1996) is formulated as:
\[ {Y_{ij}}=\mu +{\alpha _{i}}+{\beta _{j}}+{\gamma _{ij}}+{\epsilon _{ij}},\]
where ${Y_{ij}}$ is the observed value for the dependent variable (electricity demand) for the i-th level of the first factor and the j-th level of the second factor; μ is the overall mean of the dependent variable; ${\alpha _{i}}$ represents the effect of the i-th level of the first factor (e.g. Month); ${\beta _{j}}$ represents the effect of the j-th level of the second factor (e.g. Temperature); ${\gamma _{ij}}$ is the interaction effect between the i-th level of the first factor and the j-th level of the second factor; and ${\epsilon _{ij}}$ is the random error associated with the observation ${Y_{ij}}$, assumed to follow a normal distribution with mean zero and constant variance.
The test statistic used in ANOVA is the F-value, which measures the ratio of variance explained by the factor or interaction to the unexplained variance (random error). This F-value is used to compute the p-value, which quantifies the probability of observing such a ratio under the null hypothesis (${H_{0}}$). A small p-value (e.g. $p\lt \alpha $, where α is typically set at 0.05) indicates that the observed variance is unlikely to have occurred by chance, leading to the rejection of ${H_{0}}$ and confirming a significant effect of the factor or interaction. A large p-value suggests insufficient evidence to reject the null hypothesis, indicating that the observed differences may result from random variation.
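To make the F-value concrete, the sketch below computes it for the simplest case of a single factor (one-way ANOVA). This is a deliberate simplification of the multi-factor analysis used in the study; the function name `one_way_f` and the sample values are hypothetical.

```python
def one_way_f(groups):
    """F-value for a single factor: between-group over within-group variance.

    groups: list of lists, one list of observations per factor level.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by the factor (differences between level means).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unexplained variation (residual scatter within each level).
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical demand samples for three levels of one factor.
f_value = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

For these invented samples the between-level mean square is 13 and the within-level mean square is 1, so the F-value is 13. Converting an F-value into a p-value additionally requires the F distribution with the corresponding degrees of freedom, which is omitted here.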
Table 1
ANOVA results: statistical significance of independent variables and interactions.
Variable | F-value | p-value |
Month | 379.251 | $\lt 2\times {10^{-16}}$ |
Weekend | 44.579 | $2.55\times {10^{-11}}$ |
Hour | 4533.618 | $\lt 2\times {10^{-16}}$ |
Temperature | 875.502 | $\lt 2\times {10^{-16}}$ |
Humidity | 20.656 | $5.55\times {10^{-6}}$ |
Month : Weekend | 8.320 | $9.70\times {10^{-15}}$ |
Month : Hour | 19.389 | $\lt 2\times {10^{-16}}$ |
Weekend : Hour | 5.756 | $\lt 2\times {10^{-16}}$ |
Month : Temperature | 14.013 | $\lt 2\times {10^{-16}}$ |
Weekend : Temperature | 1.384 | 0.239 |
Hour : Temperature | 19.521 | $\lt 2\times {10^{-16}}$ |
Month : Humidity | 39.199 | $\lt 2\times {10^{-16}}$ |
Weekend : Humidity | 13.819 | 0.000202 |
Hour : Humidity | 6.461 | $\lt 2\times {10^{-16}}$ |
Temperature : Humidity | 0.316 | 0.573 |
Month : Weekend : Hour | 0.987 | 0.548 |
Month : Weekend : Temperature | 6.417 | $1.02\times {10^{-10}}$ |
Month : Hour : Temperature | 1.510 | $4.01\times {10^{-7}}$ |
Weekend : Hour : Temperature | 0.811 | 0.721 |
Month : Weekend : Humidity | 4.846 | $1.64\times {10^{-7}}$ |
Month : Hour : Humidity | 1.249 | 0.004701 |
Weekend : Hour : Humidity | 0.263 | 0.999 |
Month : Temperature : Humidity | 7.281 | $1.58\times {10^{-12}}$ |
Weekend : Temperature : Humidity | 17.559 | $2.81\times {10^{-5}}$ |
Hour : Temperature : Humidity | 1.359 | 0.116 |
Month : Weekend : Hour : Temperature | 0.719 | 0.999 |
Month : Weekend : Hour : Humidity | 0.604 | 1.000 |
Month : Weekend : Temperature : Humidity | 4.961 | $9.70\times {10^{-8}}$ |
Month : Hour : Temperature : Humidity | 1.040 | 0.320 |
Weekend : Hour : Temperature : Humidity | 1.543 | 0.0467 |
Month : Weekend : Hour : Temperature : Humidity | 0.811 | 0.987 |
In Table 1, we present the results of the ANOVA analysis. In hypothesis testing, the p-value represents the probability of observing a test statistic as extreme as, or more extreme than, the one computed from the data, assuming that the null hypothesis is true. As observed in the table, the variables Month (month of year) and Hour (hour of day) exhibit a statistically significant effect on electricity demand (p-value $\lt 2\times {10^{-16}}$), indicating seasonal patterns and diurnal variations in electricity demand. The interaction effects between certain variables, such as Month : Weekend and Month : Hour, also demonstrate significant influences on electricity demand. These findings underscore the complex interplay between different factors affecting electricity demand patterns.
The five-way interaction among all variables (p-value $=0.987$) is not statistically significant: ANOVA’s linear model detects no meaningful combined effect of Month, Weekend, Hour, Temperature, and Humidity on electricity demand. This does not exclude the possibility of nonlinear relationships, which the ANOVA linear model cannot capture.
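The decision rule applied to Table 1 can be expressed in a few lines. The sketch uses a subset of the p-values reported in the table; entries given as "less than $2\times {10^{-16}}$" are represented by that upper bound, which is an assumption made only so that they can be compared numerically.

```python
ALPHA = 0.05  # significance level

# p-values copied from Table 1; entries reported as "< 2e-16" use that bound.
p_values = {
    "Month": 2e-16,
    "Weekend": 2.55e-11,
    "Hour": 2e-16,
    "Temperature": 2e-16,
    "Humidity": 5.55e-6,
    "Weekend : Temperature": 0.239,
    "Temperature : Humidity": 0.573,
    "Weekend : Hour : Temperature : Humidity": 0.0467,
    "Month : Weekend : Hour : Temperature : Humidity": 0.987,
}

# Reject the null hypothesis for a term when its p-value falls below ALPHA.
significant = [term for term, p in p_values.items() if p < ALPHA]
not_significant = [term for term, p in p_values.items() if p >= ALPHA]
```

With this subset, all five main effects are flagged as significant, whereas the five-way interaction is not.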
In any case, the ANOVA results provide valuable insights into the factors driving electricity demand variability, emphasising the importance of considering multiple variables simultaneously in electricity demand models. However, it is worth noting that while ANOVA may not be capable of detecting nonlinear relationships, ANNs can identify both linear and nonlinear dependencies, offering a powerful alternative for modelling complex relationships in electricity demand. In the following section, we will introduce the ANN architectures we have designed, demonstrating their adeptness in capturing intricate relationships within the data.
The Degree of Adoption parameter was excluded from ANOVA because it is an estimated, rather than observed, variable derived from a logarithmic adoption model. Since ANOVA is best suited for observed variables, we used an alternative approach to test hypothesis H1, which states that incorporating an electricity adoption factor is necessary to effectively capture evolving electricity usage patterns.
To validate this hypothesis, we conducted a systematic comparative analysis across multiple neural network architectures implemented within this study. Each architecture was evaluated under controlled conditions, explicitly comparing error when including and excluding the Degree of Adoption parameter. Model performance was quantitatively assessed using standardised error metrics, enabling objective measurement of the parameter’s impact. Results showed a statistically significant improvement, with the Degree of Adoption reducing estimation error by approximately 10%.
These findings provide strong empirical support for hypothesis H1, confirming that the Degree of Adoption significantly enhances the neural network’s estimative capability by explicitly accounting for the dynamic evolution of electricity demand. Consequently, the Degree of Adoption parameter was included as a standard input factor across all subsequent architectures in this study.
4.3 Architecture Definition
In this study, the causal factors influencing electricity consumption are explicitly established as input parameters to the neural network. Conversely, electricity demand, which is influenced by these factors, is defined as the network’s output parameter. All the ANNs tested in this study share a common input and output structure. The input layer contains 8 nodes corresponding to the input factors: Month of Year (2 components), Hour of Day (2 components), Weekends, Temperature, Humidity, and Degree of Adoption. Following this, a min-max scaling layer normalises the input data to the range $[0,1]$, ensuring proportional feature contributions, consistency, and improved convergence speed. The output is rescaled based on the same proportion by which the input factors were scaled, ensuring that the estimated hourly demand in kW/person reflects the real-world scale.
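A min-max scaling layer and its inverse can be sketched as below. The class `MinMaxScaler` is a hypothetical stand-in written for this illustration (it is not a library class), and the example rows are invented. Rescaling the output with a scaler fitted on the training demand is one plausible reading of the rescaling step, stated here as an assumption.

```python
class MinMaxScaler:
    """Scale each feature column to [0, 1] using bounds learned from training data."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.lows = [min(c) for c in columns]
        self.highs = [max(c) for c in columns]
        return self

    def transform(self, rows):
        return [
            [
                (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
                for v, lo, hi in zip(row, self.lows, self.highs)
            ]
            for row in rows
        ]

    def inverse(self, rows):
        """Map scaled values back to the original units."""
        return [
            [v * (hi - lo) + lo for v, lo, hi in zip(row, self.lows, self.highs)]
            for row in rows
        ]


# Hypothetical rows of (temperature in degrees Celsius, relative humidity in %).
train = [[18.0, 40.0], [30.0, 90.0], [24.0, 65.0]]
scaler = MinMaxScaler().fit(train)
scaled = scaler.transform(train)
restored = scaler.inverse(scaled)
```

Fitting the bounds on the training subset only, and reusing them for validation and test data, avoids leaking information from held-out data into the scaling.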
In developing this ANN model, various considerations and experiments were undertaken to determine the most appropriate architecture. Through successive iterations, we tested different input factors, architectures, layer sizes, and configurations to improve the model’s performance. Each iteration involved hyperparameter tuning and architectural adjustments to more effectively capture the underlying patterns in the data.
The selection of layer size is a crucial factor in model performance. In this study, we employed a heuristic approach to determine an appropriate layer size. While systematic hyperparameter optimisation methods, such as grid search, Bayesian optimisation, or metaheuristic algorithms, are often used in research to refine network architectures (Kaveh and Mesgari, 2023), they come with significant computational costs. Given the practical scope of our work, our heuristic approach provides a reasonable trade-off between error and efficiency without requiring an exhaustive exploration of the hyperparameter space.
Initially, we experimented with a simple feedforward linear network (Fine, 1999), but it proved inadequate: a linear model cannot capture the complex non-linear relationships between the input factors and electricity demand. Therefore, we explored other architectures to achieve a smaller error.
We experimented with Deep Feedforward Neural Networks (DFNNs), which are characterised by having at least five hidden layers. This depth allows the network to model complex patterns in data, with each layer applying non-linear transformations to the input data, effectively handling non-linearities. The architecture consisted of five hidden layers with 50, 250, 750, 300, and 150 nodes, respectively, as shown in Fig. 3, with ReLU activation functions incorporated alongside the linear layers.
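The layer sizes above can be turned into a small forward-pass sketch. This is written in plain Python for readability rather than with a deep learning framework; the single output node follows the common output structure described earlier, while the bias terms, the He-style initialisation, and the helper names are assumptions of this illustration.

```python
import random

LAYER_SIZES = [8, 50, 250, 750, 300, 150, 1]  # input, five hidden layers, output


def count_parameters(sizes):
    """Weights plus biases for fully connected layers (biases are an assumption)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Create random weights and zero biases for every layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # He-style scale, a common choice with ReLU
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Linear layer followed by ReLU on every hidden layer; linear output."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU
    return x


network = init_network(LAYER_SIZES)
output = forward(network, [0.5] * 8)  # one already-scaled input vector
n_params = count_parameters(LAYER_SIZES)
```

Under these assumptions the network has 472 051 trainable parameters, most of them in the two widest layers, which gives a sense of the model size relative to the 13 872 hourly samples available.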
We also explored a Multi-Layer Perceptron (MLP) architecture (Delashmit et al., 2005), incorporating activation layers such as the sigmoid and ReLU functions (Dubey et al., 2022), as illustrated in Fig. 4. Activation layers introduce non-linearities into the model, enabling it to capture more complex patterns. The sigmoid function, though historically popular, suffers from vanishing gradient issues, leading to slower convergence. In contrast, ReLU exhibited superior performance by maintaining the data’s dynamic range and offering computational efficiency.
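The vanishing gradient effect mentioned above can be seen from the derivatives of the two activation functions. The sketch below compares the best-case gradient factor contributed by the activations alone across five layers; the depth of five is chosen to match the DFNN described earlier, and weights are ignored for simplicity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25, reached at x = 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


# Best-case factor contributed by the activations when backpropagating
# through five layers (weights ignored for simplicity).
depth = 5
sigmoid_factor = sigmoid_grad(0.0) ** depth  # 0.25 ** 5
relu_factor = relu_grad(1.0) ** depth        # stays at 1 for active units
```

Even in the most favourable case the sigmoid shrinks the gradient by a factor of $0.25^{5}$, roughly one thousandth, whereas active ReLU units pass it through unchanged.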
After testing these architectures, we decided to explore Kolmogorov-Arnold Networks (KANs) due to their computational efficiency compared to MLPs (Liu et al., 2024), making them suitable for scenarios with limited resources. Unlike MLPs, KANs do not require fixed activation functions, which simplifies their architecture. Additionally, they excel at uncovering complex mathematical relationships even when the underlying patterns are unknown, making them well-suited for the task at hand.
We implemented a KAN, leveraging the Kolmogorov-Arnold Representation Theorem, shown here:
\[ f({x_{1}},\dots ,{x_{n}})={\sum \limits_{i=1}^{2n+1}}{g_{i}}\Bigg({\sum \limits_{j=1}^{n}}{h_{ij}}({x_{j}})\Bigg),\]
where f is a continuous function of the n input variables ${x_{1}},\dots ,{x_{n}}$. This decomposition forms the foundation of KANs, where the univariate functions ${g_{i}}$ and ${h_{ij}}$ work together. The functions ${h_{ij}}({x_{j}})$ independently transform each input variable ${x_{j}}$ into an intermediate representation, and these intermediate representations are aggregated by the outer functions ${g_{i}}$, which combine them into the final output. This approach ensures efficient approximation of complex multivariate functions through independent transformations and collective integration.
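The nested structure described above, with outer functions ${g_{i}}$ applied to sums of inner functions ${h_{ij}}({x_{j}})$, can be illustrated with a function that needs only two outer terms. The example writes the product of two variables using univariate functions and addition only; the helper `ka_form` is hypothetical, and the identity used is elementary algebra rather than anything specific to the paper.

```python
def ka_form(outer, inner, x):
    """Evaluate sum_i g_i( sum_j h_ij(x_j) ).

    outer: list of univariate functions g_i.
    inner: inner[i][j] is the univariate function h_ij applied to x[j].
    """
    return sum(
        g(sum(h(x_j) for h, x_j in zip(row, x)))
        for g, row in zip(outer, inner)
    )


# x * y = ((x + y)^2 - (x - y)^2) / 4, written with univariate functions only.
outer = [lambda u: u * u / 4.0, lambda u: -u * u / 4.0]
inner = [
    [lambda x: x, lambda y: y],   # h_11, h_12 give x + y
    [lambda x: x, lambda y: -y],  # h_21, h_22 give x - y
]
product = ka_form(outer, inner, [3.0, 5.0])
```

Here `product` equals 15, the product of 3 and 5, even though no function in the construction takes more than one argument. A KAN learns such univariate functions from data instead of having them specified by hand.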
KANs approximate the functions ${g_{i}}$ and ${h_{ij}}$ using learnable activation functions, with B-splines being a common choice. B-splines are piecewise polynomial functions, known for their flexibility and efficiency in function approximation, defined using a set of knots ${t_{0}},{t_{1}},\dots ,{t_{m}}$ and recursively constructed as:
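\[ {B_{i,0}}(x)=\begin{cases}1,& {t_{i}}\le x\lt {t_{i+1}},\\ 0,& \text{otherwise},\end{cases}\hspace{2em}{B_{i,k}}(x)=\frac{x-{t_{i}}}{{t_{i+k}}-{t_{i}}}{B_{i,k-1}}(x)+\frac{{t_{i+k+1}}-x}{{t_{i+k+1}}-{t_{i+1}}}{B_{i+1,k-1}}(x),\]
where ${B_{i,k}}$ denotes the i-th basis function of degree k. These basis functions are efficient for approximating the univariate functions ${g_{i}}$ and ${h_{ij}}$ due to their locality property. Adjusting a single control point influences only a localised region, ensuring computational efficiency.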
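The recursive construction of B-spline basis functions is commonly written as the Cox-de Boor recursion, sketched below. The uniform knot vector, the cubic degree, and the function name `bspline_basis` are assumptions chosen for illustration; the paper does not state which degree or knots were used.

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: value at x of the i-th B-spline of degree k on knots t."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:  # skip terms whose knot span has zero width
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (
            (t[i + k + 1] - x)
            / (t[i + k + 1] - t[i + 1])
            * bspline_basis(i + 1, k - 1, t, x)
        )
    return left + right


knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # uniform knots t_0..t_7
degree = 3
n_basis = len(knots) - degree - 1  # four cubic basis functions

x = 3.5  # inside [t_3, t_4), where all four cubic functions overlap
values = [bspline_basis(i, degree, knots, x) for i in range(n_basis)]
outside = bspline_basis(0, degree, knots, 5.0)  # beyond the support of B_0
```

The entries of `values` sum to one on the interval where all basis functions overlap, and `outside` is zero because each cubic basis function is non-zero over only four knot spans. This is the locality property referred to above.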
Following the Kolmogorov-Arnold decomposition, the KAN structure processes inputs ${x_{j}}$ through the inner functions ${h_{ij}}({x_{j}})$ in the input layer. The intermediate representations are combined through summation in the intermediate layer, and the outer functions ${g_{i}}$ in the output layer synthesise these results to approximate the target function. This design ensures efficient approximation of complex multivariate functions, providing both theoretical and computational advantages, such as universal approximation and low-overhead computations via B-splines.
Unlike MLPs and DFNNs, which use fixed activation functions, KANs employ learnable univariate functions that act as both weights and activation functions. These adaptive functions evolve during learning, capturing complex relationships in the data more effectively. KANs also offer enhanced interpretability and can interact seamlessly with human users, making them highly suitable for applications such as microgrids, where data may be scarce and retraining is crucial. Due to their architecture, KANs can be said to support continual learning (Verwimp et al., 2024) more effectively than MLPs.
The KAN approach allows seamless integration of new data, ensuring the model remains up-to-date, enhancing its practical utility in dynamic environments.
Figure 5 illustrates the architecture of the KAN designed to estimate hourly electricity demand. The network includes multivariate layers (depicted in dark grey) that replace traditional linear weights with learnable activation functions. The input layer has 8 nodes, corresponding to 6 input factors, with month and hour split into two components. The hidden layer has 50 nodes and is fully connected, employing the multivariate approach to learn the data patterns. Finally, the output layer produces the hourly demand in kW/person, processed through a scaling layer to match the real-world data scale.
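One common way to split a cyclic variable such as month or hour into two components is a sine/cosine encoding. The text does not state which split is used, so the following is a sketch under that assumption; `encode_cyclic` and `build_input` are hypothetical helpers, and the four remaining factors are passed through unchanged as placeholders.

```python
import math


def encode_cyclic(value, period):
    """Map a cyclic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_input(month, hour, other_factors):
    """Assemble an 8-element input vector from 6 factors.

    month: 1..12, hour: 0..23, other_factors: the four remaining inputs
    (placeholders here; their meaning is not assumed).
    """
    if len(other_factors) != 4:
        raise ValueError("expected exactly four remaining input factors")
    m_sin, m_cos = encode_cyclic(month - 1, 12)
    h_sin, h_cos = encode_cyclic(hour, 24)
    return [m_sin, m_cos, h_sin, h_cos, *other_factors]
```

With this encoding, hour 23 and hour 0 are neighbours in the input space, whereas a raw integer encoding would place them at opposite ends of the range.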
4.4 Training and Validation
The training of the ANN is a critical aspect of the method, involving the application of optimisation techniques that iteratively adjust the learnable parameters of the network. The choice of optimiser typically depends on the specific problem and dataset, and it is common practice to experiment with different optimisers to identify the most effective one for the modelled data. Among the optimisation algorithms for training deep learning models, one of the most widely established is Adaptive Moment Estimation (Adam) (Kingma and Ba, 2014; Saad and Adnan, 2021). Due to its effectiveness and popularity, Adam was selected as the optimisation algorithm for training the ANNs in this study. Adam combines momentum with per-parameter adaptive learning rates, providing efficient step sizes during training.
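For reference, a single Adam update can be sketched as follows. This is the standard rule from Kingma and Ba (2014) applied to a toy quadratic, not the training code of this study; the learning rate of 0.01 matches the value reported below, while the remaining hyperparameters are the commonly used defaults and are assumptions here.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second moment (scale)
    m_hat = m / (1.0 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


def minimise_quadratic(steps=2000):
    """Minimise the toy objective f(w) = (w - 3)^2 with Adam, starting from w = 0."""
    w = np.array([0.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        grad = 2.0 * (w - 3.0)
        w, m, v = adam_step(w, grad, m, v, t)
    return float(w[0])
```

The division by the square root of the second moment rescales each parameter's step individually, which is what makes the learning rate adaptive.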
This phase utilised the publicly available dataset comprising 13 872 hours, which was divided into three subsets: 8 878 hours for training, 2 220 hours for validation, and 2 774 hours for testing (approximately 64%, 16%, and 20% of the data, respectively). This distribution follows standard practice commonly adopted in ANN applications. During the validation process, a learning rate of 0.01 was chosen, with a maximum of 20 epochs set. Within this range, the early stopping mechanism identified a sufficient number of epochs for termination. The samples in each subset were randomly selected, in accordance with standard practices in machine learning experimentation.
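The reported subset sizes are consistent with holding out 20% of the data for testing and then 20% of the remainder for validation. The fractions are inferred from the counts rather than stated in the text, so the sketch below should be read under that assumption; `split_indices` and the random seed are hypothetical.

```python
import numpy as np


def split_indices(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Randomly partition sample indices into train, validation and test sets.

    test_frac is taken from the whole dataset; val_frac is taken from what
    remains after the test set has been removed.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = round(n_samples * test_frac)
    n_val = round((n_samples - n_test) * val_frac)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(13872)
```

For 13 872 samples, this two-stage split reproduces the subset sizes given above.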
The model was trained on a standard CPU, specifically an Intel Core i9-9980HK with a base clock speed of 2.4 GHz and 32 GB of RAM. Notably, it was unnecessary to use a GPU for this training process, demonstrating that even with modest computational resources, such as a standard personal computer, the model can be effectively trained. This makes the approach particularly feasible for deployment in environments where advanced computational infrastructure is unavailable.
During the training phase, the error of the trained architectures was evaluated using Mean Absolute Error (MAE) (Bhuyan et al., 2016). The error is calculated as the average of the absolute differences between estimated values and actual values as shown in Yan and Zhou (2024):
$$\varepsilon =\frac{1}{n}\sum_{i=1}^{n}\left|{Y_{i}}-{\hat{Y}_{i}}\right|$$
where ε represents the error, n represents the number of data points, ${Y_{i}}$ denotes the actual observed values, and ${\hat{Y}_{i}}$ represents the estimated values. MAE weights each error in proportion to its magnitude, so it is less sensitive to outliers than squared-error metrics and is expressed in the same units as the demand itself. Consequently, the reported error can be read directly as the average deviation of the estimate.
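The metric is straightforward to compute. The following sketch implements the average absolute difference described above with NumPy; the function name is hypothetical and any values passed to it are illustrative rather than taken from the dataset.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average of |Y_i - Y_hat_i| over all data points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))
```

For example, actual values of 1, 2, and 3 with estimates of 1.5, 2, and 2 give absolute differences of 0.5, 0, and 1, and therefore an MAE of 0.5.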
This study also aims to validate the model and assess its error. Validation entails comparing the model’s estimations against actual measured data, providing a measure of its reliability. The error (ε) quantifies how closely the model’s estimations align with the measured electricity demand across diverse conditions, and it serves as the key metric for determining the effectiveness of the model. Lower error values indicate stronger estimating capabilities, reinforcing the model’s suitability for estimating electricity demand in isolated areas. This metric not only validates the reliability of the model but also informs potential improvements.
4.5 Results
Figure 6 presents the results from 20 experiments conducted for each ANN architecture: DFNN, MLP, and KAN. Each experiment involved training the respective ANN for up to 20 epochs. The figure illustrates the relationship between training duration and mean error across the 20 experiments, with key data points corresponding to specific epochs (1, 5, 10, 15, and 20). These points provide a clear trend analysis of the architectures’ performance over time, highlighting differences in efficiency and error.
The choice of stopping criteria in neural network training can be approached in two ways: either by setting a target error and determining the time required to reach it or by defining a fixed training time and evaluating the resulting error. In Fig. 6, we followed the first approach, selecting the number of epochs based on the observation that the obtained errors remained within an acceptable range for demand estimation models. This decision keeps the model within the target error threshold while avoiding unnecessary computational costs. In Fig. 7, we followed the second approach, setting the training time to a maximum of 20 minutes to prevent overfitting.
Figure 6 demonstrates that KAN achieves the lowest error, reaching 0.042 in under 2 min. MLP finishes training sooner but settles at a much higher error (0.09); KAN reduces this error by nearly 54%. DFNN is the slowest of the three architectures, requiring over 6 min to achieve an error of 0.049, which KAN still reduces by 13%. The DFNN’s increased complexity, with additional layers and nodes, appears to hinder rather than improve performance, indicating that greater model complexity does not necessarily lead to better or faster results. Taken together, these results highlight the ability of KANs to converge rapidly to a low error.
Figure 7 demonstrates that even with longer training times, KAN remains the most effective architecture in reducing error, reaching 0.04. DFNN stabilises at an error between 0.044 and 0.049 after 5 min of training. MLP reaches its minimum error (0.081) after 4 min but then remains above 0.08.
Our experimental results confirm the effectiveness of KANs for modelling electricity demand in microgrids, consistent with their known capability to capture complex and non-linear relationships (Liu et al., 2024). An important finding is their low training time and minimal resource usage. This is particularly valuable in isolated regions, where demand patterns change frequently, requiring models to be updated regularly. The efficiency of KANs allows for rapid retraining and deployment, even with limited computational resources, ensuring timely insights in these challenging environments. While DFNN comes closer to KAN in error at the cost of longer training, MLP’s markedly higher error emphasises the importance of selecting the right architecture when designing ANNs for specific applications.
These findings emphasise the critical role of architecture selection in optimising neural network performance for electricity demand estimation. Specifically, they provide empirical support for H2, confirming that neural networks explicitly designed to model nonlinear relationships achieve lower error and greater computational efficiency compared to traditional architectures.
From this point onward, we conducted new experiments to explore whether continuous learning (Ke et al., 2021) could further enhance the efficiency of KAN. Because it enables the model to integrate new data incrementally without complete retraining, continuous learning offers a promising approach to reduce computational overhead.
Our findings reveal that while continuous training effectively reduces training time, it introduces a trade-off: increased error, as shown in Fig. 8, leading to less reliable outcomes compared to full training. Full training consistently results in smaller errors, though it requires longer training times. The increase in full-training time follows a linear rather than an exponential trend, as more data naturally demands more training time.
As a result, the choice in this case is clear: where lower error is key, reliability is prioritised over time savings, and full training is preferred. Full training ensures that the system retains essential knowledge, even at the expense of longer training time. A viable compromise may involve methods that mitigate catastrophic forgetting (McCloskey and Cohen, 1989) without excessive computational overhead.
5 Conclusions and Future Work
This study explores electricity demand estimation in microgrids, demonstrating that incorporating the Degree of Adoption reduces estimation error by effectively capturing the progressive electrification process. Specifically, this parameter reflects evolving demand patterns driven by increased electricity access and the integration of electrical appliances among consumers. Our findings empirically support hypothesis H1, confirming the key role of explicitly modelling electrification dynamics in improving the model’s estimation capability.
Another key contribution is the experimental evidence that KAN outperforms MLP and DFNN for electricity demand modelling. KAN achieved an error of 0.042 in less than two minutes, outperforming MLP (0.09 in under one minute) and DFNN (0.049 in over six minutes). These findings demonstrate that non-linear architectures, such as KAN, can outperform traditional architectures, supporting H2. Therefore, the proposed KAN architecture proves to be a robust and scalable solution for electricity demand modelling, addressing the challenges posed by sparse and scarce data.
Beyond their low error, KANs are distinguished by their architectural simplicity and efficiency, consistently maintaining minimal error while requiring fewer computational resources. While continuous training is feasible, it is not recommended in this context because it increases the error. Here, convergence time is not a limiting factor; the primary objective is minimising error without compromise.
The modelling approach can be further extended to other geographical areas for validation and applied to electricity demand estimation in renewable energy communities or individual buildings. Future work will focus on integrating additional metadata, such as solar irradiation and socio-economic indicators, to reduce error, as well as conducting sensitivity analyses to assess the significance of different variables. Moreover, determining the minimum dataset size required to achieve an acceptable error threshold will provide valuable insights for deploying the model in data-limited scenarios. Further research could also explore the implementation of a structured optimisation process to systematically determine optimal hyperparameters, such as layer size, to enhance model performance.