The first step in using Featuretools yourself is loading your data.
The end-to-end demos below show how you might use Featuretools to augment your existing workflows. We think we’ve captured a fairly wide range of use cases, but let us know if there’s something you’d like to see!
Predict Next Purchase
In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.
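The core idea behind features on a multi-table dataset like this one is stacking aggregations across relationships. The sketch below illustrates it with the standard library only. The tables, column names and values are hypothetical stand-ins for Instacart-style `orders` and `order_products` data, and this is not Featuretools code. A depth-1 feature counts the items in each order. A depth-2 feature then averages those counts per user; in the style of Featuretools' feature names, it would read `MEAN(orders.COUNT(order_products))`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables: each order belongs to one user,
# and each order_products row is one item in one order.
orders = [
    {"order_id": 1, "user_id": "a"},
    {"order_id": 2, "user_id": "a"},
    {"order_id": 3, "user_id": "b"},
]
order_products = [
    {"order_id": 1, "product": "banana"},
    {"order_id": 1, "product": "milk"},
    {"order_id": 2, "product": "banana"},
    {"order_id": 3, "product": "bread"},
    {"order_id": 3, "product": "eggs"},
    {"order_id": 3, "product": "milk"},
]


def count_items_per_order(order_products):
    """Depth-1 feature on orders: number of item rows per order."""
    counts = defaultdict(int)
    for row in order_products:
        counts[row["order_id"]] += 1
    return counts


def mean_items_per_user(orders, order_products):
    """Depth-2 feature on users: mean of the per-order item counts."""
    per_order = count_items_per_order(order_products)
    per_user = defaultdict(list)
    for order in orders:
        per_user[order["user_id"]].append(per_order.get(order["order_id"], 0))
    return {user: mean(values) for user, values in per_user.items()}


features = mean_items_per_user(orders, order_products)
print(features)  # {'a': 1.5, 'b': 3}
```

Automated feature engineering repeats this pattern for every combination of primitive, column and relationship path, rather than for one hand-picked feature.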
Automated vs Manual Feature Engineering
Automated feature engineering overcomes the limitations of traditional manual feature engineering, letting data scientists build better predictive models faster. This presentation summarizes the benefits of automated feature engineering and presents a use-case comparison with manual feature engineering.
Predict Remaining Useful Life
In this example, we demonstrate rapidly building a predictive model for the Remaining Useful Life (RUL) of an engine. Using time-series data, we perform automated feature engineering on data from running engines. This example can be used as an end-to-end workflow to automatically generate features for a common time series prediction problem.
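Before any features are built, each time-series reading needs a Remaining Useful Life label. The minimal sketch below assumes run-to-failure training data, in which each engine's last recorded cycle is its failure point. Under that assumption, RUL at a given cycle is the engine's last cycle minus the current cycle. The readings, including the tuple layout `(engine_id, cycle, sensor_value)`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical run-to-failure readings: (engine_id, cycle, sensor_value).
readings = [
    (1, 1, 518.7), (1, 2, 518.6), (1, 3, 518.9),
    (2, 1, 520.1), (2, 2, 519.8),
]


def remaining_useful_life(readings):
    """Label each reading with the cycles left until that engine's last cycle.

    Assumes the final recorded cycle of each engine is its failure point.
    """
    last_cycle = defaultdict(int)
    for engine_id, cycle, _ in readings:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return [
        (engine_id, cycle, last_cycle[engine_id] - cycle)
        for engine_id, cycle, _ in readings
    ]


for row in remaining_useful_life(readings):
    print(row)  # e.g. (1, 1, 2) for engine 1 at cycle 1
```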
Predict Taxi Trip Duration
Over four workbooks, we go into depth on several aspects of Featuretools functionality while building a model that predicts how long a New York City taxi trip will take from the pickup location. We show how to quickly augment a basic machine learning pipeline with Featuretools and demonstrate how to write your own custom primitives.
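At its heart, a custom transform primitive wraps a function that maps one or more input columns to an output column, element by element. The sketch below shows only such an underlying function. It computes the great-circle (haversine) distance between pickup and dropoff coordinates. The function names and the choice of feature are hypothetical and not taken from the taxi workbooks. Registering the function with the library (for example, through its `TransformPrimitive` base class) is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_distances(pickups, dropoffs):
    """Apply the distance row by row over two coordinate columns,
    as a transform primitive would over a dataframe."""
    return [haversine_km(p[0], p[1], d[0], d[1]) for p, d in zip(pickups, dropoffs)]


# Hypothetical Midtown Manhattan trip; roughly 1 km apart.
print(trip_distances([(40.7580, -73.9855)], [(40.7484, -73.9857)]))
```

Keeping the computation in a plain function like this makes it easy to unit-test before wrapping it for use in feature generation.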
Predict Appointment No-Show
We use Featuretools to predict whether or not a patient will show up to a doctor’s appointment. In this end-to-end demonstration we show how to automatically create valid features which use label information. By providing a little bit of human knowledge about the time relationships between columns, we can use historical missed appointment information without leaking labels.
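The time relationship that matters here is when a label becomes known. A past appointment's no-show outcome is only known once its appointment day has passed. A feature for a new appointment may therefore use it only if that day falls before the new appointment was scheduled, which acts as that appointment's cutoff time. The minimal sketch below expresses this rule in plain Python. The records and field names are hypothetical, and the sketch does not show the mechanism the demo itself uses.

```python
from datetime import date

# Hypothetical appointments: the no_show label only becomes known on `day`.
appointments = [
    {"id": 1, "patient": "p1", "scheduled": date(2016, 4, 1), "day": date(2016, 4, 5), "no_show": True},
    {"id": 2, "patient": "p1", "scheduled": date(2016, 4, 3), "day": date(2016, 4, 10), "no_show": False},
    {"id": 3, "patient": "p1", "scheduled": date(2016, 4, 8), "day": date(2016, 4, 12), "no_show": True},
]


def prior_no_shows(appointments):
    """For each appointment, count the same patient's missed appointments
    whose outcome was already known when this one was scheduled."""
    features = {}
    for target in appointments:
        cutoff = target["scheduled"]
        features[target["id"]] = sum(
            1
            for other in appointments
            if other["patient"] == target["patient"]
            and other["id"] != target["id"]
            and other["day"] < cutoff  # outcome known strictly before the cutoff
            and other["no_show"]
        )
    return features


print(prior_no_shows(appointments))  # {1: 0, 2: 0, 3: 1}
```

A naive count of earlier-scheduled appointments would give appointment 2 a value of 1. That value leaks appointment 1's outcome, which was not yet known on 2016-04-03.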
Predict Customer Churn
This project shows an application of Featuretools to a common use case for subscription businesses: increasing active subscribers by decreasing the number of churned customers. In a set of Jupyter Notebooks and accompanying articles, we cover the concepts and implementation of the prediction engineering, feature engineering, and modeling approach to solving problems with machine learning. The result is a relevant solution to the customer churn problem, as well as a general-purpose framework you can apply to problems across industries. Additionally, this project demonstrates using Spark with PySpark to scale feature engineering to large datasets.
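Prediction engineering means turning a business question into labels tied to cutoff times. The minimal sketch below uses one possible churn definition, which is an assumption here and not necessarily the project's own. A customer seen before the cutoff is labeled as churned if they show no activity in the following window. The activity records, the 30-day default and the function name are all hypothetical. Features for each label would then be built only from data before the cutoff.

```python
from datetime import date, timedelta

# Hypothetical activity log: (customer_id, day of activity).
activity = [
    ("c1", date(2017, 1, 10)),
    ("c1", date(2017, 2, 5)),
    ("c2", date(2017, 1, 15)),
    ("c3", date(2017, 2, 3)),
]


def churn_labels(activity, cutoff, window_days=30):
    """Label customers seen before `cutoff`: True (churned) if they have
    no activity in [cutoff, cutoff + window_days)."""
    window_end = cutoff + timedelta(days=window_days)
    seen_before = set()
    active_after = set()
    for customer, day in activity:
        if day < cutoff:
            seen_before.add(customer)
        elif day < window_end:
            active_after.add(customer)
    # Customers first seen after the cutoff (like c3) get no label at this cutoff.
    return {c: c not in active_after for c in sorted(seen_before)}


print(churn_labels(activity, cutoff=date(2017, 2, 1)))  # {'c1': False, 'c2': True}
```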
Predict Loan Repayment
Predicting which members will pay back a loan is a critical need for credit institutions. In this Jupyter Notebook, learn the basics of applying automated feature engineering to a relational dataset. Automated feature engineering delivers hundreds of relevant features, saving us hours of tedious code-writing. Once you’ve mastered the basics of automated feature engineering on this real-world dataset, you’ll be able to apply Featuretools to problems across industries.
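The "hundreds of relevant features" come from combinatorics. Every aggregation primitive is applied to every numeric column of a related table, and the results can be stacked across relationships. The sketch below only enumerates feature names for a hypothetical `clients` ← `loans` ← `payments` schema, using names in the style Featuretools prints. The column names and primitive list are illustrative assumptions. Real deep feature synthesis also prunes some combinations, so its exact counts will differ.

```python
from itertools import product

AGG_PRIMITIVES = ["SUM", "MEAN", "MAX", "MIN", "STD"]


def aggregate_names(child, columns, primitives=AGG_PRIMITIVES):
    """Name each primitive applied to each numeric column of a child table,
    plus one COUNT of the child rows."""
    names = [f"COUNT({child})"]
    names += [f"{p}({child}.{c})" for p, c in product(primitives, columns)]
    return names


# Hypothetical schema: clients <- loans <- payments
loan_columns = ["loan_amount", "rate"]
payment_columns = ["payment_amount"]

# Depth 1 on loans: aggregate payments per loan (1 + 5 * 1 = 6 features).
loan_level = aggregate_names("payments", payment_columns)
# Clients aggregate raw loan columns (depth 1) and the loan-level features (depth 2).
client_features = aggregate_names("loans", loan_columns + loan_level)

print(len(client_features))  # 1 + 5 * (2 + 6) = 41
print(client_features[:3])
```

Adding a few more columns, primitives or tables multiplies this count quickly, which is where the time savings over hand-written features come from.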
Predict Olympic Medals
We show how Featuretools makes it easy to incorporate automated feature engineering into your workflow using historical Olympic Games data. This demonstration shows how Featuretools simplifies data science-related code and enables us to ask innovative questions, all while automatically generating hundreds of features, improving accuracy and avoiding classic label-leakage problems.
Predict Correct Answer
Using public data from CMU Datashop, we predict whether or not a student will successfully answer a question on a given attempt. We show how an EntitySet is useful for understanding the data and how it can be used to automatically generate features. We also demonstrate how automatically generated features can be useful beyond improving machine learning scores: they enhance our understanding of the problem itself.