Evaluating AutoML for Forecasting Domestic Electricity Data
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making processes. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting: knowing the future is extremely handy when making decisions.
AutoML for Time Series Forecasting
Consequently, many systems are now developing and integrating AutoML tools for time series forecasting. In these systems the user only has to provide a time series data source and gets back a trained forecasting model that has been selected and tuned by some automatic procedure.
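To make the idea of "selected using some automatic procedure" concrete, here is a toy sketch of automatic model selection. It is not how any particular AutoML system works: the three candidate forecasters, the single hold-out split, the MAE criterion, and the synthetic hourly series are all assumptions chosen for illustration.

```python
from statistics import mean

def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def seasonal_naive(history, horizon, season=24):
    """Repeat the values observed exactly one season earlier."""
    return [history[-season + (h % season)] for h in range(horizon)]

def moving_average(history, horizon, window=24):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def select_model(series, horizon, candidates):
    """Hold out the last `horizon` points and pick the candidate with the lowest MAE."""
    train, valid = series[:-horizon], series[-horizon:]
    scores = {name: mae(valid, forecast(train, horizon))
              for name, forecast in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical candidate pool; a real system would search a much larger space.
CANDIDATES = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "moving_average": moving_average,
}

# Synthetic hourly load (kW) with a repeating daily pattern; the numbers are made up.
daily = [0.3] * 6 + [0.8] * 3 + [0.5] * 8 + [1.5] * 5 + [0.4] * 2
series = daily * 7

best, scores = select_model(series, horizon=24, candidates=CANDIDATES)
print(best, scores)
```

On this perfectly periodic series the seasonal naive candidate wins by construction. Real household data is much noisier, which is part of what the project would investigate.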
Millions of Domestic Electricity Data Streams Incoming
With the rollout of “smart meters” across Europe, we are suddenly capable of collecting near-real-time electricity data from each individual household. When predicting electricity load for entire grids we can rely on the law of large numbers, but personal consumption is notoriously hard to predict.
There are several high-quality datasets of domestic electricity data that could be used, for example the IDEAL Dataset, with 2 years' worth of data at 1-second sampling frequency for 255 homes in the UK.
These are fairly large datasets (15-30 GB), so expect some amount of data wrangling and pipelining. If you want, the design of a nice and scalable pipeline architecture could probably be the entire project.
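As a rough idea of what such wrangling might look like, the sketch below down-samples 1-second readings to 1-minute means in a streaming fashion, so the full file never has to fit in memory. The CSV layout (Unix timestamp in seconds, power in watts, sorted by time) is a hypothetical one chosen for the example, not the actual format of any of the datasets.

```python
import csv
import io
from itertools import groupby

def downsample(rows, bucket_seconds=60):
    """Yield (bucket_start, mean_power) pairs from time-ordered (timestamp, watts) rows.

    Only one bucket is held in memory at a time, so the input can be
    larger than RAM as long as it is sorted by timestamp.
    """
    keyed = ((int(ts) // bucket_seconds * bucket_seconds, float(watts))
             for ts, watts in rows)
    for bucket, group in groupby(keyed, key=lambda pair: pair[0]):
        values = [watts for _, watts in group]
        yield bucket, sum(values) / len(values)

# Tiny in-memory stand-in for a large file; with real data you would pass
# csv.reader(open(path, newline="")) instead.
sample = io.StringIO("0,100\n1,200\n59,300\n60,50\n61,150\n")
minute_means = list(downsample(csv.reader(sample)))
print(minute_means)  # [(0, 200.0), (60, 100.0)]
```

Because the function is a generator, it can be chained with further steps (filtering, feature extraction, writing to disk) without materializing intermediate results.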
At DasyaLab we have built a prototype IoT device that reads the measurements from the type of meters used in the greater Copenhagen area and sends them over MQTT to an external database. This means you can try out your system on your own electricity consumption (or mine!).
This will be an experimental study, comparing existing systems and analyzing experimental results in order to identify bottlenecks, shortcomings, and important features. This project is somewhat inspired by this recent paper from VLDB.
Main research question:
How well do AutoML systems for time series data perform when forecasting domestic electricity data?
Other interesting aspects:
- How well do the systems integrate exogenous features? (weather, public holidays, pandemics, etc.)
- How well do the systems quantify uncertainty?
- How robust are these systems?
- How scalable are these systems? (time, memory, energy, etc.)
- How does down-sampling or re-sampling affect forecast quality?
- How do these systems compare to simple baselines?
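Two of the aspects above, comparison to simple baselines and uncertainty quantification, each have a standard metric that could serve as a starting point. The sketch below implements the mean absolute scaled error (MASE), which scales a forecast's error by the in-sample error of a (seasonal) naive forecast, and the pinball loss for quantile forecasts. The choice of these two metrics and the toy numbers are assumptions for illustration; the project itself does not prescribe them.

```python
from statistics import mean

def mase(train, actual, predicted, season=1):
    """Mean absolute scaled error.

    The forecast MAE is divided by the in-sample MAE of the seasonal naive
    forecast (lag `season`). Values below 1 mean the forecast beats that
    baseline on average. Undefined if the training series is constant at
    the chosen lag (the scale would be zero).
    """
    scale = mean(abs(train[t] - train[t - season])
                 for t in range(season, len(train)))
    error = mean(abs(a - p) for a, p in zip(actual, predicted))
    return error / scale

def pinball_loss(actual, predicted_quantile, q):
    """Average pinball (quantile) loss for forecasts of the q-th quantile."""
    return mean(max(q * (a - p), (q - 1) * (a - p))
                for a, p in zip(actual, predicted_quantile))

# Made-up numbers, only to show the calls.
train = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
actual = [1.0, 2.0]
forecast = [1.5, 1.5]

score = mase(train, actual, forecast, season=1)
upper = pinball_loss(actual, forecast, q=0.9)
print(score, upper)
```

For a 0.9-quantile forecast, the pinball loss penalizes under-prediction nine times more heavily than over-prediction, which is what makes it suitable for checking whether predicted upper bounds are actually respected.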
Prerequisites:
- Comfortable with Python
- Some experience with Machine Learning and/or Time Series Data, especially evaluation setups and metrics
- Preferably experience with Linux and working on remote machines via ssh (in order to do experiments on many machines at once)