Update on 2022/4/13: We have open-sourced LinkedIn's battle-tested feature store, called Feathr, on GitHub: https://github.com/linkedin/feathr. Please also check out our blog post: https://azure.microsoft.com/en-us/blog/feathr-linkedin-s-feature-store-is-now-available-on-azure/
With advances in artificial intelligence and machine learning, companies are starting to use complex machine learning pipelines in applications such as recommendation systems and fraud detection. These systems usually require up-to-date features to support time-sensitive business applications, and the feature pipelines are maintained by different team members across various business groups.
In these machine learning systems, we see recurring problems that consume a great deal of energy from machine learning engineers and data scientists: duplicated feature engineering and online-offline skew. In addition, it is hard to serve features in production reliably, at scale, and with low latency, especially for real-time applications.
Duplicated feature engineering: different teams often rebuild the same feature pipelines because there is no central place to share and reuse features.
Online-offline skew: features computed for training (offline) can diverge from features computed for serving (online), silently degrading model quality.
Serving features in real-time: production applications need feature retrieval that is reliable, scalable, and low-latency.
To solve these problems, the concept of a feature store was developed (see Figure 1).
Figure 1: Illustration of the problems that a feature store solves
With more customers choosing Azure as their trusted data and machine learning infrastructure, we want to enable them to access a feature store with the tools and services they already use.
Feast (Feature Store) is an open-source feature store and part of the LF AI & Data Foundation. It can serve feature data to models from a low-latency online store (for real-time prediction, such as Redis) or from an offline store (for scale-out batch scoring or model training, such as Azure Synapse), and it provides a central registry so customers can discover the relevant features. It also lets customers define on-demand transformations that are executed at request time.
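To make the on-demand transformation idea concrete, here is a minimal plain-Python sketch of a request-time feature computation. The function and feature names are illustrative, not the Feast API (Feast declares these with decorated feature views); the sketch only shows how request data and precomputed features combine at serving time.

```python
# Sketch of an on-demand (request-time) feature transformation.
# All names here are hypothetical; Feast would register this logic
# as an on-demand feature view rather than a bare function.

def transaction_risk_features(request: dict, precomputed: dict) -> dict:
    """Combine request-time data with features served from the online store."""
    avg = precomputed["avg_amount_30d"]   # precomputed, served from the online store
    amount = request["amount"]            # arrives with the prediction request
    return {
        # Ratio of this transaction to the user's 30-day average spend.
        "amount_to_avg_ratio": amount / avg if avg else 0.0,
        # Flag transactions more than 3x the user's usual spend.
        "is_large_transaction": amount > 3 * avg,
    }

features = transaction_risk_features(
    {"amount": 1000.0},
    {"avg_amount_30d": 300.0},
)
```

The key point is that such features cannot be precomputed, because they depend on values that only exist at request time.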
By working with the Feast team, we believe we can bring an open, interoperable, and production-ready feature store to Azure customers.
Figure 2: Feast architecture. Source: Feast GitHub repository
The high-level architecture diagram below (Figure 3) illustrates how these components fit together:
Figure 3: Azure Feature Store architecture
To integrate Feast with Azure, we have created a dedicated repository with tutorials that walk through the technical steps; it is available at https://github.com/Azure/feast-azure. The table below shows these integrations:
| Azure Feature Store component | Azure integrations |
| --- | --- |
| Offline store | Azure SQL DB; Azure Synapse Dedicated SQL Pools (formerly SQL DW); Azure SQL in VM; Azure Blob Storage; Azure ADLS Gen2 |
| Streaming input | Azure Event Hubs |
| Online store | Azure Cache for Redis |
| Registry store | Azure Blob Storage; Azure Kubernetes Service |
| Ingestion engine | Azure Synapse Spark Pools |
| Machine learning platform | Azure Machine Learning |

Table 1: Azure Feature Store integration with Azure services
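Assuming the integrations in Table 1, a Feast repository configured for Azure might look roughly like the following `feature_store.yaml`. All values below are placeholders, and the exact provider name and connection settings should be taken from the feast-azure repository rather than this sketch:

```yaml
# Hypothetical feature_store.yaml mapping Table 1 to Feast settings.
project: my_feature_repo
registry: az://my-container/registry.db      # registry file in Azure Blob Storage
provider: azure                              # provider supplied by the feast-azure package
online_store:
  type: redis                                # Azure Cache for Redis
  connection_string: my-cache.redis.cache.windows.net:6380,password=<secret>,ssl=True
offline_store:
  type: mssql                                # Azure SQL DB / Synapse Dedicated SQL Pool
  connection_string: mssql+pyodbc://user:<secret>@my-server.database.windows.net:1433/feast
```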
Azure Cache for Redis is a fully managed, in-memory cache that enables high-performance and scalable architectures. With Enterprise Tiers of Azure Redis, customers can take advantage of active geo-replication to create globally distributed caches with up to 99.999% availability, add new data structures that enhance machine learning, and gain massive cache sizes at a lower price point by using the Enterprise Flash tier to run Redis on speedy flash storage.
Roughly speaking, it's an in-memory key-value store that lets the feature store retrieve features within low-latency requirements. For example, when a user ID is sent to Azure Feature Store, Azure Cache for Redis retrieves the relevant features: the key is the user ID, and the values are the associated feature values. Azure Cache for Redis makes this feature retrieval process straightforward and performant.
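The key-value lookup described above can be sketched in a few lines of plain Python. A real deployment would use a Redis client (such as redis-py) against Azure Cache for Redis; here a dict stands in for the cache so the sketch is self-contained, and the key format and feature names are hypothetical.

```python
# Sketch of online feature retrieval keyed by user ID.
# A dict emulates the cache; with redis-py the lookup would be
# roughly r.hgetall(f"user:{user_id}") against Azure Cache for Redis.

online_store = {
    "user:1001": {"avg_amount_30d": 300.0, "txn_count_7d": 12},
    "user:1002": {"avg_amount_30d": 55.0, "txn_count_7d": 3},
}

def get_online_features(user_id: int) -> dict:
    # Key is the entity (user ID); value is the feature vector.
    return online_store.get(f"user:{user_id}", {})

feats = get_online_features(1001)
```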
When users prefer Spark as the ingestion engine, they can use the updated feast-spark package built by Microsoft. By connecting to Azure Synapse, customers can ingest data from batch sources such as Azure Blob Storage or Azure ADLS Gen2, as well as streaming sources such as Azure Event Hubs.
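Conceptually, the ingestion (materialization) step copies the latest feature value per entity from a batch source into the online store. The following plain-Python sketch shows that core step on hypothetical rows; Synapse Spark performs the same logic at scale over Blob Storage, ADLS Gen2, or Event Hubs data.

```python
from datetime import datetime

# Hypothetical batch source rows: (user_id, event_timestamp, feature values).
batch_rows = [
    (1001, datetime(2021, 9, 1), {"txn_count_7d": 10}),
    (1001, datetime(2021, 9, 3), {"txn_count_7d": 12}),
    (1002, datetime(2021, 9, 2), {"txn_count_7d": 3}),
]

def materialize(rows, online_store):
    """Write the most recent feature row per entity to the online store."""
    latest = {}
    for user_id, ts, feats in rows:
        # Keep only the newest row seen for each entity.
        if user_id not in latest or ts > latest[user_id][0]:
            latest[user_id] = (ts, feats)
    for user_id, (_, feats) in latest.items():
        online_store[f"user:{user_id}"] = feats
    return online_store

store = materialize(batch_rows, {})
```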
Azure SQL Database's serverless tier automatically scales compute based on workload demand and bills for the compute used per second. It also automatically pauses databases during inactive periods. Simply put, this can help save costs when customers expect features to be accessed only intermittently.
Azure Machine Learning is the central place on Azure to train, test, and deploy machine learning models. The Feast SDK can be used from Azure Machine Learning to access features for model training and inference.
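When assembling training data, a feature store joins each labeled event with the feature values that were known at that event's time (point-in-time correctness), which is what prevents online-offline skew. The join can be sketched in plain Python with hypothetical data; Feast's SDK performs this against the offline store at scale.

```python
from datetime import datetime

# Hypothetical feature history per user: timestamped values from the offline store.
feature_history = {
    1001: [(datetime(2021, 9, 1), 10), (datetime(2021, 9, 3), 12)],
}

# Labeled training events: (user_id, event time, label).
training_events = [
    (1001, datetime(2021, 9, 2), 1),
    (1001, datetime(2021, 9, 4), 0),
]

def point_in_time_join(events, history):
    """For each event, pick the latest feature value at or before the event time."""
    rows = []
    for user_id, event_time, label in events:
        value = None
        for ts, v in sorted(history.get(user_id, [])):
            if ts <= event_time:
                value = v  # most recent value known at event time
        rows.append((user_id, value, label))
    return rows

train = point_in_time_join(training_events, feature_history)
```

Note that the event on 2021-09-02 gets the value from 2021-09-01, not the later one, exactly because the later value would not have been available at serving time.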
We've open-sourced all the components in this GitHub repository, https://github.com/Azure/feast-azure, which consists of two parts:
In this blog, we've demonstrated how customers can use Azure Feature Store with an open-source project, Feast. We are dedicated to bringing more functionality to Azure Feature Store, so feel free to send us feedback at azurefeaturestore@microsoft.com or raise issues in our GitHub repo: https://github.com/Azure/feast-azure