Detection of cyber attacks in electric vehicle charging systems using a remaining useful life generative adversarial network

Hence, these methods can form an effective cybersecurity strategy for EVS systems. We conducted this study in Python on the Google Colab Pro platform, using an NVIDIA A100 GPU with 2 × 16 GB (32 GB) of RAM.

Dataset

The data was collected in a laboratory environment established by the Canadian Institute for Cybersecurity (CIC)²⁶. The dataset is structured around electrical measurements and cybersecurity events, capturing both normal operation and attack patterns. The primary features are electrical parameters: shunt voltage, bus voltage in volts (V), current in milliamperes (mA), and power in milliwatts (mW). The dataset includes two target columns: a Label column identifying whether an instance represents an attack or normal behavior, and a time_diff_to_next_attack column that measures the temporal distance to the next attack occurrence. Regarding attack patterns, the dataset focuses on SYN-flood attacks, a type of denial-of-service attack, which provides a focused lens for studying one specific kind of cyber threat. The temporal aspects of the data are captured by the time column, which records a timestamp for each observation, and by the time_diff_to_next_attack feature, which gives the temporal proximity of attack events. This combination of electrical measurements, attack labels, and temporal features supports training and evaluating cybersecurity detection systems, particularly those focused on identifying SYN-flood attacks in electrical infrastructure.

The comprehensive details of the dataset are clearly outlined in Table 2.

Table 2 The details of the dataset²⁶.

We took the dataset used in the current study from a real EV charging network on the campus of a large technology company. This included the CSV file for the EVSE-B’s power consumption under attack and benign conditions. The raw data consisted of 115,298 individual charging events with the following variables: time, shunt_voltage, bus_voltage_V, current_mA, power_mW, State, Attack, Attack-Group, Label, and interface. The timestamps in the EVS data set were recorded at one-minute resolution. Therefore, when the label for the time remaining until the next attack was computed, all labels had the same value, which caused the applied models to fail.

We performed certain operations on the data set to solve the problem and create a more accurate labeling process.

First, upon examining the data set, we found that more than 60 records were collected within the same minute in some cases and fewer than 60 in others. This imbalance caused inconsistencies in the time series analysis and degraded model performance. To solve the problem, we restructured the data set and limited the number of records in each minute to a maximum of 60.

We followed the steps below in the process of restructuring the data set:

Data Selection and Filtering: We selected up to 60 data points at random from the data collected within the same minute. The purpose was to form a more homogeneous data set by decreasing data density.

Updating the Time Column: We updated the time column under the assumption that records are collected at one-second intervals. We made this assumption to keep data collection consistent and make the time series analysis more effective. Accordingly, the records within each minute were assigned times from 0 to 59 s.

Label Creation: Based on the restructured data set, we recalculated the label representing the time remaining until the next attack for each record. This prevented the labels from all having the same value, as before, and produced more diverse and meaningful labels.

Afterward, we added the TIME_TO_NEXT_FLAG column. If a record carried an attack label, we set the time remaining until the attack to 0; otherwise, we set it to the time difference between the record and the subsequent attack.
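The restructuring and labeling steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `restructure` and `time_to_next_flag`, the `(minute, is_attack)` input format, the fixed random seed, and the choice to leave records after the last attack unlabeled (`None`) are all hypothetical.

```python
import random
from collections import defaultdict


def restructure(records, per_minute=60, seed=0):
    """Cap each minute at `per_minute` records and assign per-second times.

    records: iterable of (minute_index, is_attack) tuples (assumed format).
    Returns a list of (time_in_seconds, is_attack), ordered by time.
    """
    rng = random.Random(seed)
    by_minute = defaultdict(list)
    for minute, is_attack in records:
        by_minute[minute].append(is_attack)

    out = []
    for minute in sorted(by_minute):
        rows = by_minute[minute]
        if len(rows) > per_minute:
            # Random selection, keeping the original order of the chosen rows.
            keep = sorted(rng.sample(range(len(rows)), per_minute))
            rows = [rows[i] for i in keep]
        # Assume one record per second: 0, 1, ..., up to 59 within the minute.
        for second, is_attack in enumerate(rows):
            out.append((minute * 60 + second, is_attack))
    return out


def time_to_next_flag(rows):
    """Seconds until the next attack for each record (0 on attack records).

    rows: list of (time_in_seconds, is_attack), ordered by time.
    Records with no later attack get None (an assumption of this sketch).
    """
    labels = [None] * len(rows)
    next_attack_time = None
    # Scan backwards so the most recently seen attack is the next one in time.
    for i in range(len(rows) - 1, -1, -1):
        t, is_attack = rows[i]
        if is_attack:
            next_attack_time = t
        if next_attack_time is not None:
            labels[i] = next_attack_time - t
    return labels


rows = restructure([(0, False)] * 3 + [(0, True), (1, False), (1, True)])
print(time_to_next_flag(rows))
```

Scanning the records in reverse keeps the labeling linear in the number of records, since each record only needs the time of the most recently seen attack.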

The new data set obtained from these steps has a more balanced time series structure, which yields more accurate and reliable labels for the time remaining until the next attack. As a result, the performance of the models applied to the data set increased considerably, and the results became more reliable.

The data set consisted of 115,298 records. We obtained 116,298 records by adding 1,000 more generated with generative AI. We used 23,133 records as test data.

Model performance metrics

To comprehensively and objectively assess the models’ performances, we selected the Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R²) as the models’ evaluation indicators.

The MAE is a metric that measures the average magnitude of the errors between actual and predicted values without considering the error’s direction. It computes the absolute difference between the actual and predicted value for each observation and then averages these differences. The MAE is calculated as in Eq. (20):

$$MAE=\frac{1}{n}\mathop \sum \limits_{{i=1}}^{n} \left| {{y_i} - {{\hat {y}}_i}} \right|$$

(20)

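The MSE averages the squared differences between the actual and predicted values, so it penalizes large errors more heavily than the MAE. The MSE is calculated as in Eq. (21):

$$MSE=\frac{1}{n}\mathop \sum \limits_{{i=1}}^{n} {\left( {{y_i} - {{\hat {y}}_i}} \right)^2}$$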

(21)

The RMSE expresses the model’s error in the same units as the target variable, which makes it easy to interpret. Nevertheless, outliers substantially affect it, so it must be assessed together with other indicators. The RMSE is calculated as in Eq. (22):

$$RMSE=\sqrt {\frac{1}{n}\mathop \sum \limits_{{i=1}}^{n} {{\left( {{y_i} - {{\hat {y}}_i}} \right)}^2}}$$

(22)

The R² score, also called the coefficient of determination, represents a statistical measure indicating how well the model explains the variance in the dependent variable. The R² is calculated as in Eq. (23):

$${R^2}=1 - \frac{{\mathop \sum \nolimits_{{i=1}}^{n} {{\left( {{y_i} - {{\hat {y}}_i}} \right)}^2}}}{{\mathop \sum \nolimits_{{i=1}}^{n} {{\left( {{y_i} - \bar {y}} \right)}^2}}}$$

(23)

where ${y_i}$ denotes the $i$-th actual value, ${\hat {y}_i}$ the $i$-th predicted value, $\bar {y}$ the mean of the actual values, and $n$ the number of observations.
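Eqs. (20) to (23) translate directly into NumPy. The following is a minimal sketch; the function name `regression_metrics` and the toy values are hypothetical and serve only to show how the four indicators are computed from the same error vector.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 as defined in Eqs. (20)-(23)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))                        # Eq. (20)
    mse = np.mean(err ** 2)                           # Eq. (21)
    rmse = np.sqrt(mse)                               # Eq. (22)
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # Eq. (23)

    return {"MAE": float(mae), "MSE": float(mse),
            "RMSE": float(rmse), "R2": float(r2)}


# Toy values for illustration only; they are not taken from the dataset.
print(regression_metrics([0, 0, 1, 3], [0, 0.5, 1, 2]))
```

Note that R² is undefined when all actual values are identical, because the denominator of Eq. (23) is then zero; the sketch does not guard against that case.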

Results and discussion

The developed model’s predictive performance in estimating Time_To_Next_Flag was assessed by comparing the predicted values to actual outcomes. The results highlight the model’s behavior in over- and under-estimation scenarios and its effectiveness in providing accurate time-to-event predictions.

When the models are assessed as a warning system, the question is how many of the predicted values fall before the actual attack time and therefore work as a correct warning.

The numerical value under each figure (Figs. 8, 9, 10, 11, 12, 13 and 14) is based on the difference between the predicted and actual attack times and on whether the predicted attack time fell before the real attack time. Therefore, it should be read as a number of cases, not as a single instance.
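The case counts reported for each model can be obtained by comparing predicted and actual values pair by pair. The sketch below is one plausible way to do this and is not taken from the study: the function name `warning_counts`, the category names, and the tolerance `tol` used to decide when two values match are assumptions.

```python
def warning_counts(y_true, y_pred, tol=0.0):
    """Count over-estimates, under-estimates and matches.

    over:        predicted > actual (warning would come too late or not at all)
    under:       predicted < actual (warning is issued before the actual time)
    zero_match:  actual is 0 and the prediction matches it within `tol`
    other_match: actual is non-zero and the prediction matches it within `tol`
    """
    counts = {"over": 0, "under": 0, "zero_match": 0, "other_match": 0}
    for actual, predicted in zip(y_true, y_pred):
        diff = predicted - actual
        if abs(diff) <= tol:
            key = "zero_match" if actual == 0 else "other_match"
        elif diff > 0:
            key = "over"
        else:
            key = "under"
        counts[key] += 1
    return counts


# Toy values for illustration only.
print(warning_counts([0, 0, 5, 10, 3], [0, 0, 7, 8, 3]))
```

With continuous model outputs, a small positive `tol` (or rounding the predictions first) would be needed for any prediction to count as a match; which convention the study used is not stated.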

All models were trained for 50 epochs using the Mean Squared Error (MSE) loss function. We split the dataset into 80% for training and 20% for testing and used a batch size of 64. The Adam optimizer was employed with a learning rate of 0.001.

The LSTM model used a two-layer architecture with 50 units in each layer and the tanh and sigmoid activation functions. It was optimized using the Adam optimizer with a learning rate of 0.001, and a dropout rate of 0.2 was applied to prevent overfitting. The GRU model has two stacked GRU layers with 100 hidden units each. The first layer returns sequences, passing temporal information to the next layer, while the second does not. The RNN model consists of two SimpleRNN layers of 100 hidden units each, with the same return-sequences configuration as the GRU model. The CNN model uses a Conv1D layer with 64 filters followed by a Dense layer with 64 units, so the “Return Sequences” parameter does not apply. The MLP model has three fully connected hidden layers of sizes 128, 64, and 32, so the “Return Sequences” parameter does not apply to it either.

Finally, the Dense model consists of a Dense layer with 128 units followed by a LeakyReLU activation (0.01) and a Dropout layer (0.2), then another Dense layer with 64 units, again followed by LeakyReLU (0.01) and Dropout (0.2), and finally a Dense layer with a single unit and a ReLU activation. Being a feedforward model, it has no “Return Sequences” parameter.
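To make the layer sequence of the Dense model concrete, the sketch below runs an inference-time forward pass through the same 128-64-1 stack in plain NumPy. It is an illustration only: the source does not show its implementation, and the number of input features (4), the weight initialization, and the function names are assumptions. Dropout is omitted because it is inactive at inference time.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    """LeakyReLU with the negative slope of 0.01 given in the text."""
    return np.where(x > 0, x, slope * x)


def init_dense_model(n_features, rng):
    """Random weights for Dense(128) -> Dense(64) -> Dense(1)."""
    sizes = [n_features, 128, 64, 1]
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Inference-time forward pass (Dropout layers act as the identity)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = leaky_relu(x @ w1 + b1)            # Dense(128) + LeakyReLU(0.01)
    h = leaky_relu(h @ w2 + b2)            # Dense(64) + LeakyReLU(0.01)
    return np.maximum(0.0, h @ w3 + b3)    # Dense(1) + ReLU


rng = np.random.default_rng(0)
params = init_dense_model(n_features=4, rng=rng)   # 4 features is an assumption
batch = rng.normal(size=(5, 4))                    # 5 hypothetical input rows
print(forward(params, batch).shape)
```

The final ReLU constrains the output to be non-negative, which fits a target such as the time remaining until the next attack.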

Prediction performance analysis using the GAN-LSTM hybrid model

Figure 8 shows the actual and predicted values of Time_To_Next_Flag for the GAN-LSTM hybrid model. In 1,423 cases, the predicted value was greater than the actual one. This overestimation indicates that the model’s pre-warning mechanism may trigger prematurely, potentially leading to unnecessary system adjustments. On the contrary, the predicted value was lower than the actual one in 1,369 cases. This underestimation is crucial since it allows the system to issue timely preemptive warnings, improving its capability to prevent possible attacks. Reduced false negatives contribute directly to the system’s overall reliability and operational efficiency. Furthermore, in 20,441 cases, the remaining time until the attack was correctly estimated as zero, which indicates that the model accurately predicted imminent attacks. This shows the model’s ability to provide an effective predictive mechanism for identifying time-sensitive security breaches.

Prediction performance analysis using the GAN-GRU hybrid model

The actual and predicted values of Time_To_Next_Flag for the GAN-GRU hybrid model are shown in Fig. 9. In 1,467 cases, the predicted value was higher than the actual one. This overestimation indicates a potential problem with pre-warning signals since early predictions may cause unnecessary system adjustments, which can disrupt the system’s stability. On the other hand, the predicted value was lower than the actual one in 1,382 cases. This underestimation is critical to the system’s reliability since it allows for the earlier detection of impending attacks. Reduced false negatives increase the system’s efficiency and improve its reliability in preventing potential threats. Additionally, in 20,441 cases, the time to the attack was correctly estimated as zero, indicating that the model effectively identified imminent attacks. This confirms the ability of the model to establish an effective prediction mechanism for identifying system vulnerabilities.

Prediction performance analysis using the GAN-RNN hybrid model

Figure 10 illustrates the actual and predicted outcomes of forecasting Time_To_Next_Flag with the GAN-RNN hybrid model. In 1,552 cases, the predicted value was higher than the actual one, suggesting that the model triggered pre-warnings prematurely. While this may be considered a precautionary measure, frequent over-prediction can cause inefficiencies and unnecessary responses within the system. Excessive false positives can destabilize the system, emphasizing the need for the model to minimize such occurrences for enhanced system health and stability.

On the other hand, the predicted value was lower than the actual one in 1,297 cases. In contrast to over-prediction, under-prediction decreases the likelihood of missed attacks by allowing preemptive action. This lower number of false negatives shows that the system identifies threats with sufficient warning, which increases the overall reliability and efficiency of the prediction mechanism. A model with fewer false negatives is critical in robust defense systems since it ensures timely warnings without neglecting potential risks. Moreover, in 20,441 cases, the time remaining to the next attack was correctly estimated as 0, which indicates that the model can detect attacks when they are imminent. Such precision in real-time forecasting is critical for system defense since it ensures that protective measures are executed in time. Taken together, these observations show that the model forecasts imminent attacks accurately and balances false positives and false negatives, which makes it suitable for systems that require real-time prediction and defense mechanisms.

Prediction performance analysis using the GAN-CNN hybrid model

The actual and predicted values of Time_To_Next_Flag for the GAN-CNN hybrid model are given in Fig. 11. The predicted value exceeded the actual one in 1,434 cases, indicating that pre-warnings based on these predictions may not form a robust system structure. Conversely, the predicted value was lower than the actual one in 1,385 cases. This scenario, which includes fewer false negatives, can prevent potential attacks by allowing for advance warnings, improving the efficiency and reliability of the system. Additionally, the remaining time to an attack was correctly estimated as zero in 20,441 cases, which shows the ability of the system to identify attacks and thus establish an accurate prediction mechanism.

Prediction performance analysis using the GAN-MLP hybrid model

The actual and predicted values of Time_To_Next_Flag for the GAN-MLP hybrid model are shown in Fig. 12. The predicted value exceeded the actual one in 1,418 cases, suggesting that pre-warnings based on such predictions may not establish a stable structure for the system. On the contrary, the predicted value was lower than the actual one in 1,431 cases, indicating the potential for preempting attacks through early warnings. Decreased false negatives increase the efficiency and reliability of the system by enabling more accurate attack forecasts. The remaining time until an attack was correctly predicted as zero in 20,441 cases. This demonstrates that the system effectively identifies attack scenarios and provides a robust attack prediction mechanism.

Prediction performance analysis using the GAN-dense layer hybrid model

The actual and predicted values of Time_To_Next_Flag for the GAN-Dense Layer hybrid model are displayed in Fig. 13. The predicted values were greater than the actual ones in 1,444 cases. Such overestimations suggest that issuing pre-warnings based on these predictions may not contribute to a healthy operational structure for the system. On the other hand, the predicted values were smaller than the actual ones in 1,375 cases. This scenario can preemptively prevent attacks since it allows for early warnings, increasing the system’s efficiency and reliability through reduced false negatives. Moreover, the remaining time until an attack was accurately estimated as zero in 20,441 cases, demonstrating the system’s capability to identify attack scenarios and confirming the effectiveness of the prediction mechanism employed.

The results illustrated in Fig. 14 indicate that all models achieve R² values ranging from approximately 0.70 to 0.75. GAN-LSTM and GAN-GRU exhibit slightly higher R² values, around 0.73, whereas GAN-Dense Layer has the lowest value at approximately 0.72. The error metrics are consistently low across all models: the Mean Absolute Error (MAE) falls between 0.03 and 0.04, the Mean Squared Error (MSE) ranges from 0.01 to 0.02, and the Root Mean Squared Error (RMSE) lies between 0.09 and 0.10. The sequential models (GAN-LSTM, GAN-GRU, and GAN-RNN) demonstrate marginally higher R² and slightly lower error metrics, with GAN-LSTM as the top performer. GAN-CNN exhibits results comparable to those of the sequential models, while GAN-MLP shows a slight decrease in performance, and GAN-Dense Layer demonstrates the lowest overall effectiveness. Despite these minor variations, all models achieve R² values exceeding 0.70, which suggests that all tested GAN architectures are viable options and that the choice among them can be tailored to the specific application.

Table 3 summarizes the performance of the models that integrate Generative Adversarial Networks (GANs) with various DL architectures. The MAE, MSE, RMSE, and R-squared (R²) are the metrics used for evaluation. The MAE values across the models varied between 0.0281 and 0.0291. The GAN-GRU model exhibited the lowest MAE at 0.0281, indicating the highest accuracy in its predictions. On the contrary, the GAN-RNN model had the highest MAE at 0.0291, suggesting slightly lower accuracy than the other models. All models had relatively low MAE values, demonstrating that their predictions were generally close to the actual values. The MSE values varied between 0.0089 and 0.0092, with the GAN-CNN model performing the best at 0.0089. The GAN-RNN model had the highest MSE at 0.0092, indicating its tendency to produce larger errors. All models generally maintained low MSE values, which implies that large prediction errors were infrequent across all architectures. The RMSE values ranged from 0.0948 to 0.0958. The GAN-CNN model again outperformed the others with the lowest RMSE at 0.0948, reflecting a consistently low prediction error. The GAN-RNN model had the highest RMSE at 0.0958, which shows slightly less consistency and accuracy in its predictions than the others. The R² values indicate the proportion of variance explained by the models, ranging from 0.7280 to 0.7379. The GAN-MLP model had the highest R² value of 0.7379, meaning that it explained about 73.79% of the variance in the data. The GAN-LSTM model had the lowest R² value of 0.7280, which still indicates reasonable explanatory power.

Table 3 The performance of the GAN-based hybrid DL models.

Despite the promising results, several limitations remain. First, the models rely on historical attack data, which may not effectively capture emerging and adaptive cyber threats. The generative capabilities of GANs help mitigate this limitation, but further refinement is needed to ensure the models remain robust against novel attack strategies. Second, the high computational cost of training hybrid deep learning models can hinder real-time deployment, particularly for large-scale EVSE networks. Third, the potential for false positives and negatives, while minimized, still requires further optimization to enhance detection efficiency and prevent unnecessary system alerts.

Future research should focus on the following directions:

Enhancing Generalization and Robustness: Exploring advanced adversarial training techniques to improve model resilience against evolving cyber threats. Introducing reinforcement learning-based adaptation mechanisms can also enable models to self-improve based on real-time threat scenarios.
Real-Time Implementation: Optimizing computational efficiency to enable real-time deployment of GAN-based hybrid models in EVSE systems. Techniques such as model pruning, quantization, and edge computing can enhance performance.
Integration with Blockchain and Secure Communication Protocols: Combining machine learning-based attack detection with blockchain technology to create a secure and tamper-resistant EVSE network. Implementing decentralized security protocols can further reduce vulnerabilities.
Transfer Learning and Few-Shot Learning: Investigating methods that allow models to learn from limited attack instances and generalize across different EVSE infrastructures without requiring extensive retraining.
Multi-Modal Cybersecurity Approaches: Combining deep learning models with traditional cybersecurity techniques, such as signature-based detection and heuristic analysis, to create a more comprehensive defense system.

Ethical and Regulatory Considerations: Examining the ethical implications and regulatory requirements for deploying AI-driven cybersecurity systems in critical infrastructure like EVSE. Ensuring compliance with global cybersecurity standards is crucial for widespread adoption.

By addressing these challenges, future studies can further enhance the security of electric vehicle charging infrastructure, ensuring a proactive defense against cyber threats while improving the reliability and resilience of EVSE systems.
