The first step to using Featuretools yourself is loading in your data.

The end-to-end demos below show how you might use Featuretools to augment your existing workflows. We think we’ve captured a fairly wide range of use cases, but let us know if there’s something you’d like to see!

Predict Next Purchase

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

Predict Remaining Useful Life

In this example, we demonstrate rapidly building a predictive model for the Remaining Useful Life (RUL) of an engine. Using time-series data, we perform automated feature engineering on data from running engines. This example can be used as an end-to-end workflow to automatically generate features for a common time series prediction problem.

Predict Taxi Trip Duration

Over four workbooks, we explore several aspects of Featuretools functionality in depth while building a model that predicts how long a New York City taxi trip will take from the pickup location. We show how to quickly augment a basic machine learning pipeline with Featuretools and demonstrate how to write your own custom primitives.

Predict Appointment No-Show

We use Featuretools to predict whether or not a patient will show up to a doctor’s appointment. In this end-to-end demonstration we show how to automatically create valid features which use label information. By providing a little bit of human knowledge about the time relationships between columns, we can use historical missed appointment information without leaking labels.
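The idea of using past labels without leaking the current one can be sketched in plain Python. This is only an illustration of the principle, not the demo's code or the Featuretools API: the function name and the record keys (`patient_id`, `time`, `no_show`) are hypothetical, and it assumes each patient's appointment times are distinct and that the outcome of an earlier appointment is known by the time of a later one.

```python
from collections import defaultdict


def prior_no_show_rate(appointments):
    """Return, for each appointment, the patient's no-show rate over
    strictly earlier appointments (None when there is no history).

    `appointments` is a list of dicts with the hypothetical keys
    'patient_id', 'time' (any sortable value), and 'no_show' (bool).
    The result is aligned with the input order.
    """
    # Visit appointments in time order so each feature only sees the past.
    order = sorted(range(len(appointments)), key=lambda i: appointments[i]["time"])
    seen = defaultdict(int)    # earlier appointments per patient
    missed = defaultdict(int)  # earlier no-shows per patient
    rates = [None] * len(appointments)
    for i in order:
        row = appointments[i]
        pid = row["patient_id"]
        if seen[pid]:
            rates[i] = missed[pid] / seen[pid]
        # Update the history only after the feature is computed, so an
        # appointment's own label never contributes to its own feature.
        seen[pid] += 1
        missed[pid] += int(row["no_show"])
    return rates


appointments = [
    {"patient_id": "a", "time": 1, "no_show": True},
    {"patient_id": "a", "time": 2, "no_show": False},
    {"patient_id": "b", "time": 2, "no_show": True},
    {"patient_id": "a", "time": 3, "no_show": False},
]
print(prior_no_show_rate(appointments))  # [None, 1.0, None, 0.5]
```

The key design choice is the ordering of the two steps inside the loop: the feature is read before the current label is added to the history, which is what keeps the label from leaking into its own row.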

Predict Olympic Medals

We show how Featuretools makes it easy to incorporate automated feature engineering into your workflow using historical Olympic Games data. This demonstration shows how Featuretools simplifies data science-related code and enables us to ask innovative questions, all while automatically generating hundreds of features, improving accuracy and avoiding classic label-leakage problems.

Predict Correct Answer

Using public data from CMU Datashop, we predict whether or not a student will successfully answer a question on a given attempt. We show how an Entity Set is useful for understanding the data and how it can be used to automatically generate features. We also demonstrate how automatically generated primitives can be useful beyond improving machine learning scores: they enhance our understanding of the problem itself.

