It’s 8 am and a business leader is looking at a financial performance dashboard and wondering if the results are accurate. A few hours later, a customer logs into your company portal and wonders why his orders don’t show the latest pricing information. In the afternoon, the head of digital marketing is frustrated that the data feeds from his SaaS tools never made it to his customer data platform. Data scientists are also upset that they can’t retrain their machine learning models without loading the latest data sets.
These are data operations issues and they are important. Businesses should rightly expect accurate and timely data to be delivered to data visualizations, analytics platforms, customer portals, data catalogs, ML models, and wherever data is consumed.
Data management and data operations teams spend significant effort building and supporting data lakes and data warehouses. Ideally, they are powered by real-time data streams, data integration platforms, or API integrations, but many organizations still have data processing scripts and manual workflows that should be on the data debt list. Unfortunately, robust data pipelines are sometimes an afterthought, and data operations teams are often reactive in addressing source, pipeline, and quality issues in their data integrations.
In my book Digital Pioneer, I write about the days when there were fewer data integration tools and manually fixing data quality issues was the norm. “Every data processing application has a log, and every process, regardless of how many scripts are daisy-chained, also has a log. I became a wizard with Unix tools like sed, awk, grep, and find to parse these logs when looking for the root cause of a failed process.”
Today, there are far more robust tools than Unix commands for implementing observability in data pipelines. Dataops teams are responsible for more than connecting and transforming data sources; they must also ensure that data integrations work reliably and resolve data quality issues efficiently.
Dataops observability helps address data reliability
Observability is a practice employed by devops teams to enable traceability across customer journeys, applications, microservices, and database functions. Practices include centralizing application log files, monitoring application performance, and using AIops platforms to correlate alerts into manageable incidents. The goal is to create visibility, resolve incidents faster, perform root cause analysis, identify performance trends, enable security forensics, and resolve production defects.
Dataops observability targets similar goals, only these tools analyze data pipelines, ensure reliable data delivery, and help resolve data quality issues.
Lior Gavish, co-founder and CTO of Monte Carlo, says: “Data observability refers to an organization’s ability to understand the state of its data at each stage of the data operations lifecycle, from ingestion to the warehouse or the lake to the business intelligence layer, where most data quality issues arise for stakeholders.”
Sean Knapp, CEO and founder of Ascend.io, elaborates on the data operations problem statement. “Observability should help identify critical factors such as the real-time operational status of pipelines and trends in the shape of data,” he says. “Delays and errors need to be identified early to ensure smooth data delivery within agreed service levels. Enterprises need to understand pipeline code breaks and data quality issues so they can be addressed quickly and not propagate to downstream consumers.”
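As a rough illustration of the two signals Knapp mentions, delivery within agreed service levels and trends in the shape of data, the following Python sketch flags a late pipeline run and an unusual row count. The names PipelineRun, is_late, and volume_is_unusual, as well as the three-standard-deviation tolerance, are hypothetical choices made for this example; they do not come from any vendor's product.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev


@dataclass
class PipelineRun:
    """One completed run of a pipeline (hypothetical record shape)."""
    pipeline: str
    finished_at: datetime
    row_count: int


def is_late(run: PipelineRun, due_at: datetime) -> bool:
    """Return True when the run finished after the agreed delivery time."""
    return run.finished_at > due_at


def volume_is_unusual(history: list[int], latest: int, tolerance: float = 3.0) -> bool:
    """Flag a row count far from the historical mean.

    'Far' is assumed to mean more than `tolerance` standard deviations.
    """
    if len(history) < 2:
        return False  # not enough history to judge a trend
    avg = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Every past run had the same volume, so any difference is notable.
        return latest != avg
    return abs(latest - avg) > tolerance * spread


if __name__ == "__main__":
    run = PipelineRun("orders_daily", datetime(2023, 1, 5, 6, 30), 400)
    due = datetime(2023, 1, 5, 6, 0)
    past_volumes = [1000, 1020, 980, 1010, 990]
    print("late:", is_late(run, due))
    print("unusual volume:", volume_is_unusual(past_volumes, run.row_count))
```

A production tool would track many more signals, but even simple checks like these can surface a delay or a sudden drop in volume before it reaches downstream consumers.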
Knapp highlights businesspeople as key customers of data operations pipelines. Many companies strive to become data-driven organizations, so when data pipelines are unreliable, leaders, employees, and customers suffer. Tools for data operations observability can be critical for these organizations, especially when citizen data scientists use data preparation and visualization tools as part of their daily jobs.
Chris Cooney, developer advocate at Coralogix, says, “Observability is more than a few graphs rendered on a dashboard. It is an engineering practice that spans the entire stack, enabling teams to make better decisions.”
Observability in dataops versus devops
It is common for devops teams to use various monitoring tools to cover infrastructure, networks, applications, services, and databases. The same holds for data operations: same motivations, different tools. Eduardo Silva, founder and CEO of Calyptia, says: “You need to have systems in place to help make sense of that data, and no single tool will suffice. As a result, you need to ensure that your pipelines can route data to a wide variety of destinations.”
Silva recommends vendor-independent open source solutions. This approach is worth considering, especially since most organizations use multiple data lakes, databases, and data integration platforms. A data operations observability capability built into one of these data platforms may be easy to configure and deploy, but may not provide comprehensive data observability capabilities that work across platforms.
What capabilities are needed? Ashwin Rajeev, co-founder and CTO of Acceldata.io, says, “Enterprise data observability should help overcome bottlenecks associated with building and operating trusted data pipelines.”
Rajeev explains, “Data needs to be delivered efficiently and on time, every time, by using proper instrumentation with APIs and SDKs. Tools should have proper navigation and drill-down that allows for comparisons. They should help data operations teams quickly identify bottlenecks and trends for faster problem resolution and performance tuning to predict and prevent incidents.”
Dataops tools with code and low-code capabilities
One aspect of data operations observability is operations: reliability and on-time delivery from source to data management platform and consumption. A second concern is the quality of the data. Armon Petrossian, co-founder and CEO of Coalesce, says, “Data observability in data operations means ensuring business and engineering teams have access to properly cleaned, managed, and transformed data so organizations can truly make data-driven business and technical decisions. With the current evolution in data applications, to better prepare data pipelines, organizations need to focus on tools that offer the flexibility of a code-first approach but are GUI-based to enable enterprise scale, because not everyone is a software engineer, after all.”
Data operations, and therefore data observability, must offer capabilities that appeal to coders who consume APIs and build robust real-time data pipelines. But non-programmers also need data quality and troubleshooting tools that work with their data preparation and visualization efforts.
“Just as DevOps relies heavily on low-code automation tools, so too does data operations,” Gavish adds. “As a critical component of the data operations lifecycle, data observability solutions must be easy to implement and deploy across multiple data environments.”
Monitoring distributed data pipelines
For many large enterprises, reliable data pipelines and applications are not easy to implement. “Even with the help of such observability platforms, large enterprise teams struggle to avoid many incidents,” says Ramanathan Srikumar, director of solutions at Mphasis. “A key problem is that the data does not provide adequate insight into transactions that flow across multiple clouds and legacy environments.”
Hillary Ashton, Teradata’s director of products, agrees. “Modern data ecosystems are inherently distributed, which creates the difficult task of managing the state of the data throughout the entire life cycle.”
Ashton then shares the bottom line: “If you can’t trust your data, you’ll never become data-driven.”
Ashton recommends: “For a highly reliable data pipeline, enterprises need a 360-degree view that integrates operational, technical, and business metadata by looking at telemetry data. The view allows you to identify and correct problems such as stale data, missing records, schema changes, and unknown errors. Incorporating machine learning into the process can also help automate these tasks.”
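Two of the problems Ashton lists, schema changes and missing records, lend themselves to simple automated checks. The Python sketch below is a minimal illustration under stated assumptions: records arrive as dictionaries, the expected schema is a mapping of field names to Python types, and records carry a unique ID. The function names schema_changes and missing_records are hypothetical and not drawn from any product mentioned here.

```python
def schema_changes(expected: dict[str, type], record: dict) -> dict[str, list[str]]:
    """Compare one record against the expected schema.

    Reports fields that are missing, fields that were not expected,
    and fields whose value has the wrong type.
    """
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    wrong_type = sorted(
        name
        for name, field_type in expected.items()
        if name in record
        and record[name] is not None
        and not isinstance(record[name], field_type)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


def missing_records(source_ids: set, delivered_ids: set) -> set:
    """Return IDs present in the source but absent from the destination."""
    return source_ids - delivered_ids


if __name__ == "__main__":
    expected = {"order_id": int, "price": float, "currency": str}
    record = {"order_id": 7, "price": "9.99", "region": "EU"}
    print(schema_changes(expected, record))
    print(missing_records({1, 2, 3}, {1, 3}))
```

Checks like these cover only the known failure modes; the “unknown errors” in Ashton's list are where broader telemetry and machine learning would come in.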
We’ve come a long way from using Unix commands to analyze log files for data integration problems. Today’s data observability tools are much more sophisticated, but providing the business with reliable data pipelines and high-quality data processing remains a challenge for many organizations. Rise to the challenge and partner with business leaders on an agile, incremental implementation, because data visualizations and ML models built on unreliable data can lead to misguided and potentially damaging decisions.
Copyright © 2023 IDG Communications, Inc.