Can GPT-4 Perform Data Mining for Building Energy Management?
A recent study evaluates GPT-4’s performance in completing data mining tasks for building energy management, including energy load prediction, fault diagnosis, and anomaly detection.
The arrival of ChatGPT (Chat Generative Pre-trained Transformer) has spurred a slew of experiments testing the artificial intelligence (AI) language model’s ability to complete critical tasks otherwise performed by humans. In the energy field, a recent study explores using GPT-4, the fourth generation of the generative pre-trained transformer model, to automate data mining for building energy management.
ChatGPT. Image used courtesy of Pexels
Researchers from the Netherlands-based Eindhoven University of Technology and China’s Institute of Refrigeration and Cryogenics at Zhejiang University found GPT-4 can automatically generate energy load prediction code, diagnose system faults, and detect anomalies in a human-like capacity.
GPT-4, released in March 2023, is the most advanced version of the multimodal large language model developed by Silicon Valley-based OpenAI. The study, recently published in Energy and Built Environment, details how GPT-4 could automate most data mining tasks for building energy management, opening up critical opportunities for the domain. However, a few notable limitations emerged in some of the tasks, leading the researchers to cite several areas for future research.
Yang Zhao, a research professor at Zhejiang University and one of the study’s authors, noted that while automated data mining tools are rare for building energy management, the study shows GPT-4 is promising for enabling computers to take on customized data mining with limited human assistance.
Chart outlining the researchers’ framework for evaluating GPT-4’s performance in energy load prediction, fault diagnosis, and anomaly detection tasks. Image used courtesy of the study authors (Creative Commons) – Figure 1
GPT-4’s Ability to Mine Data for Building Energy Management
In cooling load prediction tests using operational data from a real office building, the researchers found GPT-4 could generate accurate code across most of the six steps of predictive modeling. For each assessment, the team ran five chats with the same prompt, asking GPT-4 to generate Python code based on the task requirements and the operational dataset.
They then assessed the correctness and consistency of the generated code. GPT-4 always produced correct code without modification in simple tasks, whereas the code generated for complex tasks required 3.2 revisions on average. The researchers also noted that when predicting the cooling load of an office building one hour ahead, GPT-4 achieved high accuracy in both simple and complex tasks. Still, the code generated for simple tasks typically involved fewer functions than the code for complex tasks.
Prediction accuracy of GPT-4: The left graph depicts the results of chats testing GPT-4’s ability to complete simple prediction tasks, while the right shows complex tasks. Image used courtesy of the study authors (Creative Commons) – Figures 5 and 6
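To give a sense of what a one-hour-ahead load prediction task involves, here is a minimal Python sketch. It is not the code GPT-4 produced, nor the study’s pipeline: the single lagged feature, the least-squares fit, the function names, and the load values are all assumptions chosen for illustration.

```python
from statistics import mean

def make_lagged_pairs(load, horizon=1):
    """Pair each reading with the reading `horizon` steps later."""
    return list(zip(load[:-horizon], load[horizon:]))

def fit_line(pairs):
    """Ordinary least squares for y = a * x + b with a single feature."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical hourly cooling load in kW (illustrative values, not study data).
hourly_load_kw = [120, 135, 150, 170, 185, 190, 180, 165, 150, 140, 130, 125]

# Each pair is (load at hour t, load at hour t + 1).
pairs = make_lagged_pairs(hourly_load_kw, horizon=1)

# Keep time order: train on the earlier hours, test on the later ones.
split = 8
train, test = pairs[:split], pairs[split:]

a, b = fit_line(train)
predictions = [a * x + b for x, _ in test]
actuals = [y for _, y in test]
test_mae = mean_absolute_error(actuals, predictions)
print(f"slope={a:.3f}, intercept={b:.3f}, test MAE={test_mae:.2f} kW")
```

A realistic task would add more inputs, such as weather and time-of-day features, along with a stronger model, which is roughly where the study’s distinction between simple and complex tasks comes in.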
In diagnosis tests, GPT-4 could identify most of the common faults of air handling units (AHUs), chillers, and variable refrigerant flow (VRF) components in HVAC systems with high accuracy, in addition to explaining the factors behind the results. In tasks assessing AHUs, the study found that using both fault data and normal data in the prompts can improve GPT-4’s accuracy, and using symptoms and fault labels in prompts can boost its consistency, as shown in the image below.
GPT-4’s performance when diagnosing faults in an air handling unit (AHU) based on symptoms with fault labels. Image used courtesy of the study authors (Creative Commons) – Table 6
Finally, in anomaly detection tests, GPT-4 could identify typical abnormal operation patterns in an HVAC system and explain their causes. Using two years of operational data from a chiller plant in an office building, the researchers sent prompts asking GPT-4 to identify three abnormalities involving supply chilled water temperature, temperature difference, and coordinated operation patterns between devices. In these categories, it detected some anomalies with high accuracy but failed to identify others.
However, the researchers could boost its accuracy by inserting association rules (mined from the time series data) into the prompts: GPT-4’s average inference correctness grew by 80% compared to anomaly detection based on the time series data itself. Using rules also improved its diagnosis accuracy by 7.7% on average.
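The article does not describe how the rules were mined or how they were worded in the prompts, so the following Python sketch is only one plausible reading. It computes the support and confidence of a single rule from on/off snapshots and formats it as a line of prompt text. The device names, the records, and the prompt wording are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(records)
    both = sum(1 for r in records if r[antecedent] and r[consequent])
    antecedent_count = sum(1 for r in records if r[antecedent])
    support = both / total
    confidence = both / antecedent_count if antecedent_count else 0.0
    return support, confidence

def rule_to_prompt_line(antecedent, consequent, support, confidence):
    """Render one rule as plain text suitable for pasting into a prompt."""
    return (
        f"Rule: IF {antecedent} THEN {consequent} "
        f"(support={support:.2f}, confidence={confidence:.2f})"
    )

# Hypothetical device states sampled from a chiller plant (not study data).
records = [
    {"chiller_on": True, "chw_pump_on": True},
    {"chiller_on": True, "chw_pump_on": True},
    {"chiller_on": True, "chw_pump_on": False},
    {"chiller_on": False, "chw_pump_on": False},
    {"chiller_on": True, "chw_pump_on": True},
]

support, confidence = rule_stats(records, "chiller_on", "chw_pump_on")
rule_line = rule_to_prompt_line("chiller_on", "chw_pump_on", support, confidence)

# Hypothetical prompt wording; the study's actual prompts are not shown here.
prompt = (
    "The following association rule was mined from the operational data.\n"
    f"{rule_line}\n"
    "Does this rule indicate abnormal coordinated operation? Explain why."
)
print(prompt)
```

Here the chiller runs without its chilled water pump in one of four cases, so the rule’s confidence falls below 1. A compact summary like this hands the model a pattern to reason about instead of a long run of raw readings.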
Impressive Results, Limitations Remain
GPT-4 shows impressive analysis and automation capabilities in data mining tasks that target improved building energy efficiency. However, the researchers cited several limitations. GPT-4’s low stability reduces the reliability and reproducibility of its outputs. Further, because it lacks the domain knowledge of human professionals, it cannot reliably interpret load prediction models. It also couldn’t determine the causal relationship between some faults and symptoms or understand the normal ranges of some anomaly variables in HVAC systems.
Finally, the researchers said GPT-4’s mathematical abilities are poor: it typically makes mistakes when calculating statistical characteristics of time series data, which weakens its effectiveness in analyzing data for anomaly detection.
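For context, the statistical characteristics in question are simple to compute exactly in code. The Python sketch below uses the standard library and hypothetical supply chilled water temperatures. It only illustrates the kind of calculation involved and is not a method taken from the study.

```python
from statistics import mean, pstdev

def summarize(series):
    """Compute basic statistical characteristics of a time series."""
    return {
        "mean": mean(series),
        "std": pstdev(series),  # population standard deviation
        "min": min(series),
        "max": max(series),
    }

# Hypothetical supply chilled water temperatures in degrees Celsius.
supply_temp_c = [7.0, 7.2, 6.9, 7.1, 9.8, 7.0]

stats = summarize(supply_temp_c)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```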
The team proposed a few research topics to address these limitations in future studies, including developing automatic prompt input methods, training GPT-4 to use software platforms, and creating a customized model for the building energy management domain.