Can machine learning help predict energy usage? A project with one of the UK’s largest holiday providers is giving promising results, according to Ashley Whichelow, Bourne Leisure, Ioana Buzelan, UCL School of Management, and Gabe Friedland, Utilidex. 

This article appeared in the October 2020 issue.


The process of setting annual energy budgets and then tracking and reporting on variations can be a challenging activity for organisations with a large energy spend. In most cases organisations are not only tracking their energy consumption and costs, but also overlaying investment opportunities in energy efficiency and/or on-site generation, which are set to have an impact on their budgets.

Ordinarily this entails a large amount of spreadsheet work to create the budget in the first place, followed by ongoing monthly analysis and discussion to explain why spend either exceeds or falls short of what was originally expected. This process is not only time consuming; it also makes it very difficult to pin down and explain exactly why budget variations occur.

As an organisation, Utilidex is continually looking at how technologies can be utilised to help answer the challenges of the energy management community. In our most recent R&D activity, we were fortunate to be presented with an opportunity to work alongside the UCL School of Management via an internship with their MSc Business Analytics course. Our theme was predicting commercial electricity consumption in the travel, hospitality and leisure sector using Machine Learning.

Utilidex, alongside longstanding client Bourne Leisure (one of the largest providers of holidays and holiday home ownership in the UK) and the UCL School of Management, undertook a research project to explore how data science techniques could help solve energy budgeting inaccuracies and improve the validity of energy management investment cases.

The energy management community has traditionally used a historical data set to create an energy forecast and extrapolated demand data forward. At this point they may make an allowance for new projects which might reduce energy consumption and also make provision for additional expansions, such as new site extensions that might cause an increase in demand.
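The traditional approach described above can be sketched in a few lines. This is a minimal illustration only: the function name, the flat per-month adjustments and all of the kWh figures are assumptions made for the example, not values from the project.

```python
def traditional_budget_kwh(historical_kwh, project_savings_kwh=0, expansion_kwh=0):
    """Extrapolate historical monthly demand forward, then adjust it.

    historical_kwh      -- last year's monthly consumption (hypothetical figures)
    project_savings_kwh -- assumed monthly reduction from efficiency projects
    expansion_kwh       -- assumed monthly increase from site extensions
    """
    return [
        month_kwh - project_savings_kwh + expansion_kwh
        for month_kwh in historical_kwh
    ]


# Illustrative numbers only: three months of historical demand.
historical = [120_000, 110_000, 100_000]
budget = traditional_budget_kwh(
    historical, project_savings_kwh=5_000, expansion_kwh=2_000
)
print(budget)
```

Nothing in this calculation reacts to weather or calendar effects, which is the gap the research set out to examine.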

The goal of the research was to understand budget deviations for electricity consumption by explaining how much energy should have been used according to exogenous features. To do this, we examined the impact that weather forecast indicators and seasonality may have on electricity consumption, with a view to creating a predictive model for energy consumption.

With a focus on their explanatory abilities and predictive performance, two widely used Machine Learning approaches were applied to perform a regression task in a supervised learning setting. Both were used for statistical inference as well as to predict electricity consumption. This involved analysing the data with Multiple Linear Regression models and applying Decision Tree analysis.

We combined energy meter data with a variety of hourly weather data such as humidity, pressure, cloudiness and wind speed, alongside the more traditional temperature and precipitation factors. For the Linear Regression models, the input data was also transformed to create interaction terms, aiming for a better explanation of the consumption variations. We also explored the significance of seasonality, months of the year, days of the week, bank holidays and seasonal holidays.
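As a rough sketch of what such inputs might look like, the snippet below turns one hourly reading into calendar features plus two interaction terms. The function name, the feature names, the choice of interactions and the two-date bank holiday set are all assumptions for illustration; the article does not specify which interactions the project actually used.

```python
from datetime import date, datetime

# Hypothetical bank holiday list; a real project would load a full calendar.
BANK_HOLIDAYS = {date(2019, 12, 25), date(2019, 12, 26)}


def build_features(timestamp, temperature_c, humidity_pct):
    """Return calendar, weather and interaction features for one hourly row."""
    is_weekend = 1 if timestamp.weekday() >= 5 else 0
    is_bank_holiday = 1 if timestamp.date() in BANK_HOLIDAYS else 0
    return {
        "month": timestamp.month,
        "weekday": timestamp.weekday(),  # 0 = Monday ... 6 = Sunday
        "hour": timestamp.hour,
        "is_weekend": is_weekend,
        "is_bank_holiday": is_bank_holiday,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        # Interaction terms (assumed examples): they let a linear model
        # give temperature a different effect at weekends or when humid.
        "temp_x_weekend": temperature_c * is_weekend,
        "temp_x_humidity": temperature_c * humidity_pct,
    }


# Illustrative reading: 6 pm on Christmas Day 2019, 4 °C, 80% humidity.
row = build_features(datetime(2019, 12, 25, 18), temperature_c=4.0, humidity_pct=80.0)
print(row)
```

Categorical fields such as month and weekday would normally be one-hot encoded before being passed to a linear model; that step is omitted here for brevity.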

We trained the models using three years' worth of data and tested their performance when predicting the consumption of a specific, unseen budgetary month. We found that both linear and non-linear regressors tend to give inconclusive and unreliable predictions, because they rely on overly granular observations and overly rigid assumptions, which data transformation and feature interaction efforts cannot fully address. Of the models tested, Linear Regression with interactions and Random Forest performed best, as assessed by the fit to the data and the prediction error on unseen data.

From an inference perspective, we found that seasonality features (months, weekdays, periods of the day), as well as opening times and occupancy, tend to have a higher importance than weather indicators when explaining consumption variation. This stronger seasonal effect could be explained by broader patterns in the data outweighing the granular, hour-by-hour changes in weather indicators.
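The evaluation setup described above (train on three years, test on one unseen month) can be sketched as follows. Everything here is an assumption for illustration: the data is synthetic and generated in the script, the feature set is deliberately tiny, and ordinary least squares via NumPy stands in for the project's actual models and tooling, which the article does not detail.

```python
import numpy as np

rng = np.random.default_rng(0)

HOURS_TRAIN = 3 * 365 * 24  # three years of hourly readings
HOURS_TEST = 31 * 24        # one held-out month
n = HOURS_TRAIN + HOURS_TEST

hour_index = np.arange(n)
hour_of_day = hour_index % 24
day = hour_index // 24
is_weekend = ((day % 7) >= 5).astype(float)  # arbitrary weekday origin
temperature = 10 - 8 * np.cos(2 * np.pi * day / 365) + rng.normal(0, 2, n)

# SYNTHETIC consumption in kWh: invented coefficients, not project data.
daily_cycle = np.sin(2 * np.pi * hour_of_day / 24)
consumption = (
    500
    + 100 * daily_cycle
    + 50 * is_weekend
    - 5 * temperature
    + 2 * temperature * is_weekend  # interaction effect
    + rng.normal(0, 20, n)
)

# Design matrix: intercept, seasonality, weather and one interaction term.
X = np.column_stack(
    [np.ones(n), daily_cycle, is_weekend, temperature, temperature * is_weekend]
)

# Chronological split: the final month is never seen during fitting.
X_train, y_train = X[:HOURS_TRAIN], consumption[:HOURS_TRAIN]
X_test, y_test = X[HOURS_TRAIN:], consumption[HOURS_TRAIN:]

# Ordinary least squares fit of the linear regression with interaction.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ coef

# Error at the hourly level and on the month's total consumption.
hourly_mape = float(np.mean(np.abs((y_test - pred) / y_test)))
monthly_error = float((pred.sum() - y_test.sum()) / y_test.sum())
print(f"hourly MAPE: {hourly_mape:.3f}, monthly total error: {monthly_error:+.4f}")
```

Because the synthetic data is generated from the same linear form that the model fits, the errors here will look far better than anything achievable on real meter data. The point of the sketch is the chronological hold-out and the two error measures, not the accuracy figures.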

Nevertheless, the research is based on a limited dataset, which restricts both the models and the findings from generalising. Linear Regression is based on specific and rigid statistical assumptions which are easily violated, while Decision Trees tend to overfit the data easily and become computationally more expensive.

The findings motivate further research into more advanced models, such as (S)ARIMAX or Neural Networks, which can better forecast power consumption while accounting for the specified exogenous variables over time. Furthermore, we encourage reconsidering the influence of factors such as occupancy.

If you are interested in exploring a research opportunity with Utilidex, please contact [email protected]