Feature Store and Why It Is Needed

BluePi
Jul 20, 2022

When you build ML models, you rarely feed them raw data, because raw data is seldom in a format the models can use. Instead, you transform the data into features, a process called feature engineering. Feature engineering is a powerful and necessary step in building well-performing models.

For example, we engineer some very complex features for our patented forecasting model for retail stores. One such feature is the sales rank of a product within a store, across all products in that store. Because this is time series data, the rank is recalculated every week. It is an expensive feature to build, requiring both time and computing resources. As you might imagine, creating such features requires some sort of pipeline code, which could be batch, streaming or real-time. (How to build such pipelines, and the key performance considerations, is a topic for another post.)
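To make the rank feature concrete, here is a minimal sketch in plain Python. The sales records, the `weekly_sales_rank` function, and the choice of dense ranking (tied products share a rank) are all assumptions made for illustration; the actual forecasting pipeline is not described in this post and likely differs.

```python
from collections import defaultdict

# Hypothetical weekly sales records: (store, week, product, units_sold).
sales = [
    ("store_1", "2022-W01", "apples", 120),
    ("store_1", "2022-W01", "bread", 300),
    ("store_1", "2022-W01", "milk", 210),
    ("store_1", "2022-W02", "apples", 260),
    ("store_1", "2022-W02", "bread", 180),
    ("store_1", "2022-W02", "milk", 240),
]


def weekly_sales_rank(records):
    """Rank each product by units sold within its (store, week) group.

    Rank 1 is the best seller; tied products share the same rank.
    """
    groups = defaultdict(list)
    for store, week, product, units in records:
        groups[(store, week)].append((product, units))

    ranks = {}
    for (store, week), items in groups.items():
        # Distinct sales values, highest first, define the rank order.
        ordered_units = sorted({units for _, units in items}, reverse=True)
        rank_of = {units: pos for pos, units in enumerate(ordered_units, start=1)}
        for product, units in items:
            ranks[(store, week, product)] = rank_of[units]
    return ranks


sales_rank = weekly_sales_rank(sales)
```

Even in this toy form, every new week of data means regrouping and re-ranking every product in every store, which hints at why the real feature is costly to compute at scale.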

Now, we may use a dozen different models in tandem (an ensemble) to arrive at an accurate forecast. Each model is statistically very different, but we would want to experiment with the above feature in many of them, and with many other such features as well. This raises some interesting questions.

  1. How do we make the latest features available to the various models?
  2. How do we standardize feature definitions?
  3. How do data scientists discover existing features?

These problems are addressed by feature stores. In essence, we can think of a feature store as a data warehouse for features instead of raw data. Another interesting point about feature stores is that they have to perform two seemingly contradictory functions:

a) Service production model requirements at low latency

b) Provide huge volumes of data to data scientists for training
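The two functions, together with the questions about shared definitions and discovery, can be sketched with a toy in-memory class. Everything here (`ToyFeatureStore`, its method names, and the split into `offline` and `online` attributes) is hypothetical and written only for illustration; it is not the API of any real feature store, and real systems typically back the two paths with different storage technologies.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureStore:
    """In-memory illustration only; not how a production feature store is built."""

    definitions: dict = field(default_factory=dict)  # feature name -> description
    offline: list = field(default_factory=list)      # full history, for training
    online: dict = field(default_factory=dict)       # latest value per entity, for serving

    def register(self, name, description):
        # One shared definition per feature name keeps models consistent.
        self.definitions[name] = description

    def search(self, keyword):
        # Discovery: features whose name or description mentions the keyword.
        keyword = keyword.lower()
        return [
            name
            for name, description in self.definitions.items()
            if keyword in name.lower() or keyword in description.lower()
        ]

    def write(self, name, entity, timestamp, value):
        if name not in self.definitions:
            raise KeyError(f"unregistered feature: {name}")
        # Every value is appended to the history used for training.
        self.offline.append((name, entity, timestamp, value))
        # Only the most recent value is kept for low-latency lookups.
        latest = self.online.get((name, entity))
        if latest is None or timestamp >= latest[0]:
            self.online[(name, entity)] = (timestamp, value)

    def get_online(self, name, entity):
        return self.online[(name, entity)][1]

    def get_history(self, name):
        return [row for row in self.offline if row[0] == name]


store = ToyFeatureStore()
store.register("weekly_sales_rank", "Rank of a product's weekly sales within its store")
store.write("weekly_sales_rank", ("store_1", "apples"), "2022-W01", 3)
store.write("weekly_sales_rank", ("store_1", "apples"), "2022-W02", 1)

latest_rank = store.get_online("weekly_sales_rank", ("store_1", "apples"))
history = store.get_history("weekly_sales_rank")
found = store.search("rank")
```

A production model would call something like `get_online` for a single key and expect an answer in milliseconds, while a data scientist would pull the whole history in bulk. Serving both from one logical store is what makes the two requirements pull in opposite directions.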

Hope this sheds some light on what a feature store is and why we need it.

In the next few articles, we will look at some open-source and commercial feature stores, with a special focus on the Amazon SageMaker Feature Store.
