A report from MIT says that digitally mature firms are 26% more profitable than their peers. According to Forrester, data-driven companies grow at an average of more than 30% annually. Despite the potential of data to improve business performance, data analytics projects have a poor success rate. Gartner says that only 20% of analytic solutions deliver business outcomes. A report in VentureBeat says that 87% of data analytics projects never make it to production.
There are many reasons for this poor success rate. One of them, on the technical side, is model drift. What is model drift? Model drift is the degradation of a data analytics model’s performance due to changes in data and in the relationships between data variables. It occurs when the accuracy of insights in production, especially from predictive analytics, differs significantly from the accuracy observed during the model’s training and deployment periods. Specifically, there are three main sources or symptoms of model drift.
• Data Drift: When the characteristics of the independent, feature or predictor variables change.
• Concept Drift: When the characteristics of the dependent, label or target variables, or their relationship to the predictor variables, change.
• Algorithm Drift: When the algorithms, including their assumptions, lose relevance due to changes in business needs.
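As an illustration of how the first of these, data drift, might be detected in practice, here is a minimal sketch that compares a feature's distribution at training time with its distribution in production using the Population Stability Index (PSI). PSI is one common choice among several and is not prescribed by this article. The function name, the bin count and the 0.25 alert level are assumptions made for the example; the alert level is a widely quoted rule of thumb, not a standard.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between a baseline sample and a current sample of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; PSI is undefined")
    width = (hi - lo) / bins  # equal-width bins over the baseline range

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical data: the same feature at training time and after a shift.
training_values = [i / 100 for i in range(1000)]
production_values = [v + 5 for v in training_values]

psi_same = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, production_values)

# Assumed rule of thumb: a PSI above roughly 0.25 suggests a significant shift.
drift_flag = psi_shifted > 0.25
```

An unchanged distribution yields a PSI of zero, while the shifted sample produces a large value. In a real deployment the bins and alert level would need to be agreed with the people who own the data.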
What are the root causes of these three main sources or symptoms of model drift? The primary reason for model drift is a change in business. Business strategies and objectives change due to mergers, acquisitions and divestitures (MAD), new product introduction, new laws and regulations, entry into new markets and more. Basically, a business is a constantly evolving entity. All these disruptions will change the way original data analytics models are used by the business. Knowing the sources of model drift will help you identify the right remediation measures you will need to get the model back to an acceptable or desired level of performance.
Why does model drift matter? What is the business impact of model drift? Today, data analytics models are increasingly becoming the major drivers of business decisions and performance. This trend will continue at a much faster pace, given the rate at which data is captured and the increasing maturity of machine learning (ML) platforms. In this reality, managing model drift is critical to ensuring the accuracy of insights or predictions. Fundamentally, reducing or eliminating model drift will enhance the trust you can place in models, thereby promoting the adoption of data and analytics across your organization.
So, how can you reduce or eliminate model drift? At its core, model drift is not a technology management problem; it is a change management problem. This change in the context of data and analytics can be effectively managed by implementing the following three strategies.
First, data is a reflection of reality, and often, the degradation of data results in the degradation of model and business performance. Thus, you need to manage data drift with effective data governance practices. We all know the fundamental principle of data processing is “garbage in, garbage out.” So, identify the variables in your hypothesis, define your data quality KPIs, set targets and thresholds and track these KPIs continually to stay up to date with any changes in data quality.
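To make the idea of data quality KPIs with thresholds concrete, here is a small sketch under stated assumptions. The two KPIs (completeness and validity), the sample values, the business rule and the threshold levels are all hypothetical and chosen only for illustration; your own KPIs and targets should come from your data governance process.

```python
from typing import Callable, Dict, List, Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Share of values that are present (not None)."""
    return sum(v is not None for v in values) / len(values)


def validity(
    values: Sequence[Optional[float]], is_valid: Callable[[float], bool]
) -> float:
    """Share of present values that pass a business rule."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(is_valid(v) for v in present) / len(present)


def breached_kpis(kpis: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Names of KPIs that fall below their minimum threshold."""
    return [name for name, value in kpis.items() if value < thresholds[name]]


# Hypothetical batch for one variable, for example customer age.
ages = [34, 51, None, 29, 230, 42, None, 38, 45, 27]

kpis = {
    "completeness": completeness(ages),
    "validity": validity(ages, lambda a: 0 <= a <= 120),
}
# Assumed minimum acceptable levels, for illustration only.
thresholds = {"completeness": 0.95, "validity": 0.98}

breaches = breached_kpis(kpis, thresholds)
```

Running a check like this on every incoming batch, and recording the KPI values over time, gives an early signal that the data feeding a model is degrading.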
Second, continuously assess your business dynamics and constantly review the relevance of the existing data analytics models with your stakeholders. While talking to your stakeholders, ask these questions:
1. Why do you want to have insights? How much do you want to know? What is the value of knowing and not knowing these insights?
2. Who owns the insights coming out of our models? Who is accountable when it comes to transforming insights into decisions and actions?
3. What are the relevant data attributes required for the model to derive accurate and timely insights?
Lastly, integrate ModelOps and DataOps practices to enable the quick and ethical replacement of the deployed analytics model with another if the business circumstances change. Data is the fuel on which the models run; without data, models have practically no business utility. Basically, the sound integration of ModelOps and DataOps practices helps in quickly progressing analytics models from the lab to production.
Overall, the best way to manage model drift is by continuously governing and monitoring your model performance with the right KPIs. While deploying data analytics models is important, what really matters is that the models are actually consumable by the business and improve its performance. As they say, change is the only constant in life, and businesses change and evolve to stay relevant as well. Involving business stakeholders early on, reviewing any change with metrics and continuously adjusting for improvements are critical in managing model drift.
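As one possible way to monitor model performance with a KPI, the sketch below tracks rolling accuracy over the most recent predictions and flags suspected drift when it falls below the accuracy measured at deployment by more than a tolerance. The class name, the window size and the tolerance are hypothetical, and accuracy is only one of many KPIs a business might choose.

```python
from collections import deque
from typing import Deque, Optional


class AccuracyMonitor:
    """Tracks rolling accuracy and compares it with a deployment-time baseline."""

    def __init__(
        self, baseline_accuracy: float, tolerance: float = 0.05, window: int = 100
    ) -> None:
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.window = window
        # Only the most recent `window` outcomes are kept.
        self.outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, predicted: object, actual: object) -> None:
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self) -> bool:
        # Wait for a full window so a few early errors do not trigger an alert.
        if len(self.outcomes) < self.window:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


# Hypothetical usage: the model scored 90% accuracy when it was deployed.
monitor = AccuracyMonitor(baseline_accuracy=0.90, tolerance=0.05, window=10)

for _ in range(10):
    monitor.record(predicted="churn", actual="churn")
flag_before = monitor.drift_suspected()

for _ in range(5):
    monitor.record(predicted="churn", actual="stay")
flag_after = monitor.drift_suspected()
```

A flag raised by a monitor like this is a prompt to review the model with business stakeholders, not an automatic verdict that the model must be replaced.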