If you’re a data scientist or working with machine learning (ML) models, you have tools for labeling data, technology environments for training models, and a fundamental understanding of MLops and modelops. If you have ML models running in production, you probably use ML monitoring to identify data drift and other model risks.
Data science teams use these essential ML platforms and practices to collaborate on model development, set up infrastructure, deploy ML models in different environments, and maintain models at scale. Others looking to increase the number of models in production, improve the quality of predictions, and reduce costs in maintaining ML models will also likely need these ML lifecycle management tools.
Unfortunately, it’s not easy to explain these practices and tools to business stakeholders and budget decision makers. It’s all technical jargon for leaders who want to understand the ROI and business impact of machine learning and AI investments and prefer to stay out of the technical and operational weeds.
Data scientists, developers, and technology leaders recognize that gaining buy-in requires defining and simplifying jargon so stakeholders understand the importance of key disciplines. Following up on a previous article on explaining DevOps jargon to business executives, I thought I’d write a similar one to clarify several critical ML practices that business leaders should understand.
What is the machine learning life cycle?
As a developer or data scientist, you have an engineering process for taking new ideas from concept to delivering business value. That process includes defining the problem statement, developing and testing models, deploying models to production environments, monitoring models in production, and enabling maintenance and enhancements. We call this a lifecycle process, knowing that deployment is the first step to realizing business value, and that once in production, models are not static and will require ongoing support.
Business leaders may not understand the term life cycle. Many still view software development and data science work as one-time investments, which is one reason many organizations suffer from technology debt and data quality issues.
Explaining the life cycle in technical terms about developing, training, deploying, and monitoring models will make a business executive’s eyes glaze over. Marcus Merrell, vice president of technology strategy at Sauce Labs, suggests providing leaders with a real-world analogy.
“Machine learning is somewhat analogous to farming: the crops we know of today are the ideal result of previous generations noticing patterns, experimenting with combinations, and sharing information with other farmers to create better variations using accumulated knowledge,” he says. “Machine learning is very much the same process of observing, cascading conclusions, and compounding knowledge as your algorithm is trained.”
What I like about this analogy is that it illustrates generational learning from one growing season to the next, while also accounting for the real-time adjustments that can occur during a season due to weather, supply chain disruptions, or other factors. Whenever possible, it helps to find analogies in your industry or in a domain your business leaders understand.
What is MLops?
Most developers and data scientists think of MLops as the DevOps equivalent for machine learning. Automating infrastructure, deployment, and other engineering processes improves collaboration and helps teams focus more energy on business goals instead of manually performing technical tasks.
But all of this is lost on business executives, who need a simple definition of MLops, especially when teams need budget for tools or time to establish best practices.
“MLops, or machine learning operations, is the practice of collaboration and communication between data science, IT, and business to help manage the full life cycle of machine learning projects,” says Alon Gubkin, CTO and co-founder of Aporia. “MLops is about bringing together different teams and departments within an organization to ensure machine learning models are effectively deployed and maintained.”
Thibaut Gourdel, Talend’s technical director of product marketing, suggests adding a few details for more data-driven business leaders. He says, “MLops promotes the use of agile software principles applied to ML projects, such as data and model versioning, continuous data validation, testing, and ML deployment, to improve the repeatability and reliability of models, as well as teams’ productivity.”
What is data drift?
When you can use words that convey a picture, it’s much easier to connect a term to an example or story. An executive understands drift from examples like a ship blown off course by wind, but may have difficulty translating the concept into the world of data, statistical distributions, and model accuracy.
“Data drift occurs when the data the model sees in production no longer resembles the historical data it was trained on,” says Krishnaram Kenthapadi, chief AI officer and scientist at Fiddler AI. “It can be abrupt, such as changes in purchasing behavior caused by the COVID-19 pandemic. Regardless of how the drift occurs, it is critical to identify these changes quickly to maintain model accuracy and reduce business impact.”
Gubkin provides a second example, where data drift is a more gradual shift away from the data the model was trained on. “Data drift is like a company’s products becoming less popular over time because consumer preferences have changed.”
David Talby, CTO of John Snow Labs, shared a pervasive analogy. “Model drift occurs when accuracy degrades due to the changing production environment in which the model operates,” he says. “Just as a new car loses value the moment you drive it off the lot, a model does too, because the predictable environment in which it was trained behaves differently in production. Regardless of how well it is working, a model will always need maintenance as the world around it changes.”
The important message for data science leaders to convey is that because data is not static, models need to be reviewed for accuracy and retrained on more recent and relevant data.
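For the data science teams doing the reviewing, a first-pass drift check can be as simple as a statistical test comparing a feature's training distribution against its production distribution. Below is a minimal sketch using SciPy's two-sample Kolmogorov-Smirnov test; the 0.05 significance threshold and the synthetic data are illustrative assumptions, and production drift detection typically covers many features and tests.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_values, prod_values, alpha=0.05):
    """Flag drift when a feature's production distribution differs
    significantly from its training distribution (two-sample KS test)."""
    stat, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha, p_value

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5000)  # historical training data
prod = rng.normal(loc=0.5, scale=1.0, size=5000)   # shifted production data

drifted, p = detect_drift(train, prod)
print(f"drift detected: {drifted} (p={p:.4g})")
```

A real deployment would run a check like this on a schedule and route alerts to the team that owns the model, which is exactly the kind of plumbing ML monitoring platforms provide out of the box.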
What is ML monitoring?
How does a manufacturer measure quality before its products are packaged and shipped to retailers and customers? Manufacturers use different tools to identify defects, including alerts when an assembly line begins to deviate from acceptable output quality. If we think of an ML model as a small manufacturing plant producing forecasts, then it makes sense that data science teams would need ML monitoring tools to check for performance and quality issues. Katie Roberts, data science solutions architect at Neo4j, says, “ML monitoring is a set of techniques used during production to detect issues, such as poor-quality data, that can negatively impact model performance.”
Manufacturing and QA make for an easy analogy, and Hillary Ashton, chief product officer at Teradata, adds a detail on scale: “As companies accelerate investment in AI/ML initiatives, AI models will increase dramatically from tens to thousands. Each one must be stored securely and continuously monitored to ensure accuracy.”
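Extending the assembly-line analogy, the simplest form of ML monitoring tracks a rolling performance metric against the accuracy measured at deployment and raises a flag when it degrades. The sketch below is a hedged illustration; the window size, tolerance, and the `AccuracyMonitor` class are hypothetical stand-ins for what a monitoring platform would provide.

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy over recent predictions and flag
    degradation relative to the accuracy measured at deployment."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def is_degraded(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough production data yet
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance

monitor = AccuracyMonitor(baseline_accuracy=0.90)
for pred, actual in [(1, 1)] * 60 + [(1, 0)] * 40:  # accuracy falls to 0.60
    monitor.record(pred, actual)
print("degraded:", monitor.is_degraded())
```

One caveat worth explaining to stakeholders: ground-truth labels often arrive late in production (you learn whether a loan defaulted months after the prediction), which is why monitoring tools also watch input data distributions, not just accuracy.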
What is modelops?
MLops focuses on multidisciplinary teams collaborating on model development, implementation, and maintenance. But how should leaders decide which models to invest in, which require maintenance, and where to create transparency around the costs and benefits of AI and machine learning?
These are governance concerns and part of what modelops practices and platforms aim to address. Business leaders may recognize that they need modelops, but they won’t fully understand what it offers until it’s at least partially implemented.
That’s a problem, especially for companies looking to invest in modelops platforms. Nitin Rakesh, CEO and managing director of Mphasis, suggests explaining modelops this way: “By focusing on modelops, organizations can ensure machine learning models are deployed and maintained to maximize value and ensure governance for different versions.”
Ashton suggests including an example practice. “Modelops allows data scientists to identify and remediate data quality risks, automatically detect when models are degraded, and schedule model retraining,” she says.
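The practice Ashton describes, detecting degradation and then scheduling retraining, boils down to a policy decision a modelops platform evaluates for every model in the portfolio. A minimal sketch of such a policy, where the function name, thresholds, and staleness rule are illustrative assumptions rather than any particular platform's API:

```python
def retraining_policy(current_accuracy, baseline_accuracy,
                      days_since_training, max_drop=0.05, max_age_days=90):
    """Decide whether a model should be retrained, based on either
    measurable degradation or simple staleness."""
    if current_accuracy < baseline_accuracy - max_drop:
        return "retrain: accuracy degraded"
    if days_since_training > max_age_days:
        return "retrain: model is stale"
    return "keep: model healthy"

print(retraining_policy(0.82, 0.90, days_since_training=30))   # degraded
print(retraining_policy(0.89, 0.90, days_since_training=120))  # stale
print(retraining_policy(0.89, 0.90, days_since_training=30))   # healthy
```

The business value of encoding the rule explicitly is governance: leaders can see, per model, why money is being spent on retraining and what happens if it isn't.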
There are still plenty of new ML and AI capabilities, algorithms, and technologies with confusing jargon that will seep into the vocabulary of a business leader. When data specialists and technologists take the time to explain terminology in a language business leaders understand, they are more likely to gain collaborative support and buy-in for new investment.
Copyright © 2023 IDG Communications, Inc.