Top 6 Sources of Free Real-Life Time Series Data

data science time series May 03, 2023

Practice makes perfect, and for data scientists working with time series data, that means that you first need access to data.

Here, we list the top 6 sources of openly available time series data, so that you can practice forecasting, classifying, or detecting anomalies.

We emphasize real-life datasets here, so that we get used to working with missing values and noisy data.

Of course, while the datasets are free and open for anyone, make sure to properly cite them if you use them in a research paper or blog article.

Learn the latest time series analysis techniques with my free time series cheat sheet in Python! Get the implementation of statistical and deep learning techniques, all in Python and TensorFlow!

Let’s get started!

1. Statistics Canada

Statistics Canada is Canada's national statistical office. They compile data on a wide range of subjects, from census data to the agricultural, economic, and social aspects of Canada.

We can search datasets by keyword, or filter by frequency, from daily to annual. They even have lower-frequency data, published every 2 years, every 3 years, or only occasionally.

The datasets can be downloaded as CSV files, and you can also customize a dataset before downloading it, for example by adding columns or pivoting the table.
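If you prefer to reshape the data after downloading it, the same kind of pivot can be done in pandas. Here is a minimal sketch, assuming a long-format CSV; the column names `date`, `region`, and `value` and the sample rows are hypothetical and do not reflect the actual headers of any Statistics Canada table.

```python
import io

import pandas as pd

# Hypothetical long-format CSV content; the column names and values are
# assumptions for illustration, not real Statistics Canada data.
raw_csv = io.StringIO(
    "date,region,value\n"
    "2022-01-01,Ontario,100\n"
    "2022-01-01,Quebec,80\n"
    "2022-04-01,Ontario,102\n"
    "2022-04-01,Quebec,81\n"
)

# Parse the date column so that pandas treats it as timestamps.
df = pd.read_csv(raw_csv, parse_dates=["date"])

# Pivot from long to wide: one row per date, one column per region.
wide = df.pivot(index="date", columns="region", values="value").sort_index()
print(wide)
```

With a real download, you would replace `raw_csv` with the path to your CSV file and adjust the column names to match it.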

If you want an example of what you can do with these datasets, check out my article on deploying a population forecasting model.

Website: Statistics Canada

2. NYC Open Data

As the name suggests, this website compiles free public data published by New York City agencies and other partners.

This is similar to Statistics Canada, in the sense that you have data in the fields of health, education, environment, and more.

They also curate the most popular datasets, which can be a great starting point if you feel overwhelmed by searching their entire database.

While you cannot filter by frequency, searching the database using keywords like “monthly” or “daily” will help you find time series data quickly.

Website: NYC Open Data

3. Monash Time Series Forecasting Repository

The people behind this repository aim to create a comprehensive list of time series datasets for forecasting to facilitate the evaluation of forecasting models.

It contains a list of 30 datasets, both publicly available and curated by their team.

Datasets come in different versions depending on the frequency, and they also have versions with and without missing values, bringing the total number of datasets to 58.
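To get a feel for what the difference between the two kinds of versions means in practice, here is a minimal sketch of finding and filling gaps with pandas. The series below is made up for illustration, and time-based linear interpolation is just one possible filling strategy, not necessarily the one used by the repository's authors.

```python
import pandas as pd

# Hypothetical daily series: 2023-01-03 is absent and 2023-01-04 is NaN.
series = pd.Series(
    [10.0, 12.0, None, 16.0],
    index=pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05"]
    ),
)

# Reindex to a regular daily grid so that absent timestamps become NaN.
regular = series.asfreq("D")
n_missing = int(regular.isna().sum())
print(f"Missing values: {n_missing}")

# Fill the gaps by linear interpolation based on the time between points.
filled = regular.interpolate(method="time")
print(filled)
```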

The datasets include both real-world data and competition data across different domains. For example, you can find the data used in past M forecasting competitions.

For all the details on each dataset, make sure to read their paper.

They also report the performance of various models on all their datasets, which can help you discover forecasting techniques and check whether you can reproduce the results.

Website: Monash Forecasting Repository

4. Papers With Code

Papers With Code is a website where we can consult research papers along with the code implementation of the paper.

They also have a section listing all the datasets used in the papers, including time series data. We can filter by task, whether we want to work on forecasting, classification, or anomaly detection.

The only drawback in my opinion is that it’s not as intuitive as other websites to use the data, because we often need to use data loaders instead of just downloading a CSV.

Website: Papers With Code

5. Numenta Anomaly Benchmark

The Numenta Anomaly Benchmark repository contains the scripts and datasets that set benchmarks for anomaly detection in time series.

The repository has both real-life and simulated data for anomaly detection, and you can perform either point-wise anomaly detection (finding points in time that are anomalous), or pattern-wise anomaly detection (finding sequences in time that are anomalous).

Just note that the datasets and the labels are in separate files, so you have to combine information from two different files to have a complete dataset.
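Combining the two files could look like the sketch below. It assumes the readings have `timestamp` and `value` columns and that the labels are stored as a JSON mapping from file name to a list of anomalous timestamps; the file name `example/readings.csv` and all values are hypothetical, so check the repository for the actual layout.

```python
import json

import pandas as pd

# Hypothetical readings; in practice these would be loaded from a CSV file.
data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2014-04-01 00:00:00",
                "2014-04-01 00:05:00",
                "2014-04-01 00:10:00",
            ]
        ),
        "value": [20.1, 95.7, 19.8],
    }
)

# Hypothetical labels content (assumed layout): file name mapped to the
# list of timestamps flagged as anomalous.
labels_json = '{"example/readings.csv": ["2014-04-01 00:05:00"]}'
labels = json.loads(labels_json)

# Flag each reading whose timestamp appears in the labels for this file.
anomaly_times = pd.to_datetime(labels["example/readings.csv"])
data["is_anomaly"] = data["timestamp"].isin(anomaly_times).astype(int)
print(data)
```

The result is a single table where each row carries both the measurement and its label, which is the shape most anomaly detection workflows expect.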

If you want an example of how to work with their data, check out my guide on anomaly detection in time series.

Website: Numenta Anomaly Benchmark

6. UCI Machine Learning Repository

Of course, the UCI Machine Learning Repository makes it onto this list, as it is probably one of the most popular data sources for practicing our data science skills.

At the time of writing, it contains 126 time series datasets, and you can filter by task (like classification or regression), by domain, and also by number of attributes and instances.

Website: UCI Machine Learning Repository

Conclusion

There you have it, a list of my favourite places to get open real-life time series data. We can only get better by practising and facing new situations, and I hope that these sources will help you do that!

For anything related to time series, make sure to follow me as I publish many articles related to working with time series data.

We can also keep in touch on LinkedIn!

Cheers! 🍻
