Digital transformation continues to be one of the main initiatives for companies. As they embark on this journey, it is essential that they leverage data strategically to be successful. Data has become a critical asset for any business, helping to increase revenue, improve customer experiences, retain customers, enable innovation, launch new products and services, and expand markets.
To capitalize on data, businesses need a platform that can support a new generation of applications and real-time information. By 2025, an estimated 30% of all data will be real-time data. For companies to thrive in this digital environment, they must deliver exceptional customer experiences at the moment it matters.
The document database has emerged as a popular alternative to the relational database, helping businesses manage rapidly growing and increasingly complex unstructured data sets in real time. It provides document-oriented data storage, processing, and access; supports a scale-out architecture with a flexible, schema-less data model; and is optimized for high performance.
Document databases support all kinds of database applications, from systems of engagement to systems of automation to systems of record. Together, these systems help create the 360-degree customer profiles that businesses need to provide exceptional service.
Support documents more efficiently
Document databases offer a data model that handles documents more efficiently. They store each record as a document, with the flexibility to model lists, maps, and sets, which in turn can contain any number of nested fields, something a normalized relational schema can only approximate by spreading the data across several tables. Because the shape of a document can vary from one business operation to the next, this flexibility makes it easier to address new business requirements.
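As a rough illustration of that difference, the Python sketch below models one hypothetical customer record as a single nested document (a map, a list of maps, and a set) and then normalizes it into the rows that three hypothetical relational tables would need. The field names, table names, and the `to_relational_rows` helper are invented for this example and do not come from any particular database.

```python
# Hypothetical customer document: a map, a list of maps, and a "set" in one record.
customer = {
    "id": "cust-42",
    "name": {"first": "Ada", "last": "Lovelace"},
    "orders": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
    # JSON has no set type, so a set is commonly stored as a sorted, de-duplicated list.
    "tags": sorted({"vip", "newsletter"}),
}


def to_relational_rows(doc):
    """Normalize the document into rows for three hypothetical tables."""
    customers = [(doc["id"], doc["name"]["first"], doc["name"]["last"])]
    # One row per order, keyed by (customer id, position in the list).
    orders = [(doc["id"], i, o["sku"], o["qty"]) for i, o in enumerate(doc["orders"])]
    # One row per tag; a child table stands in for the nested set.
    tags = [(doc["id"], t) for t in doc["tags"]]
    return {"customers": customers, "orders": orders, "tags": tags}


rows = to_relational_rows(customer)
for table, table_rows in rows.items():
    print(table, table_rows)

# Adding a field to the document needs no schema migration.
customer["loyalty"] = {"tier": "gold", "points": 1200}
```

In the document model, reading the whole customer back is one lookup; the normalized form needs a join across three tables, and the new `loyalty` field would need a schema change there.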
These attributes allow document databases to deliver high performance on reads and writes, which matters when an application handles thousands of reads per second. As businesses grow from thousands to billions of documents, they need more CPU, storage, and network bandwidth to store and access tens or hundreds of terabytes of documents in real time. Document databases can scale elastically to support dynamic workloads while maintaining performance.
Not every document database scales equally well. Scale is not just about data volume; it is also about latency. Today's businesses push the envelope on scalability: they must support ever-increasing volumes of data while still getting low-latency access and sub-millisecond response times. Developers can't afford to wait for a document to reach the application; it has to arrive in real time.
As businesses are asked to do more with fewer resources, a document database must be self-service and automated, simplifying management and optimization, reducing overhead, and raising productivity. Developers shouldn't have to spend much of their time optimizing queries and tuning systems.
A document database also needs API support to help developers quickly build modern microservices applications. Microservices rely on many APIs, and performance suffers if an application must make 10 different API calls to 10 repositories to serve one request. Because a document can keep related data together, a document database lets these microservices retrieve it with a single API call.
Aerospike real-time document database at scale
A real-time document database must have an underlying data platform that provides fast ingestion, efficient storage, and powerful queries while delivering fast response times. Aerospike’s document database offers these capabilities at scales previously unattainable.
Document storage
JSON, a format for storing and transporting data, has succeeded XML as the de facto data format of the web and is commonly used in document databases. Aerospike's Document Database enables developers to ingest, store, and process JSON document data as Collection Data Types (CDTs): flexible, schema-free containers that make it possible to model, organize, and query a large store of JSON documents.
The CDT API models JSON documents by providing list and map operations on the elements within them. The resulting aggregated CDT structures are stored and transferred in the MessagePack binary format. This efficient approach reduces client-side computation and network costs and adds minimal overhead to read and write calls.
Figure 1: An example of the Aerospike collection data types.
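To make the MessagePack point concrete, here is a minimal, hand-rolled encoder for a subset of the MessagePack specification (nil, booleans, integers, 64-bit floats, UTF-8 strings, arrays, and maps). It is an illustration only: it is not Aerospike's client code, production Python would normally use the `msgpack` package instead, and `msgpack_encode` and the sample `doc` are hypothetical names. For simplicity it always uses the 64-bit integer form outside the single-byte "fixint" range, which is valid MessagePack but not the most compact choice.

```python
import json
import struct


def msgpack_encode(obj) -> bytes:
    """Encode a JSON-compatible value using a subset of the MessagePack spec."""
    if obj is None:
        return b"\xc0"                                   # nil
    if isinstance(obj, bool):                            # check bool before int (bool subclasses int)
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack(">B", obj)                # positive fixint
        if -32 <= obj < 0:
            return struct.pack(">b", obj)                # negative fixint
        return b"\xd3" + struct.pack(">q", obj)          # int 64 (struct.error if out of range)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)          # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            header = struct.pack(">B", 0xA0 | n)         # fixstr
        elif n < 2**8:
            header = b"\xd9" + struct.pack(">B", n)      # str 8
        elif n < 2**16:
            header = b"\xda" + struct.pack(">H", n)      # str 16
        else:
            header = b"\xdb" + struct.pack(">I", n)      # str 32
        return header + data
    if isinstance(obj, list):
        n = len(obj)
        if n < 16:
            header = struct.pack(">B", 0x90 | n)         # fixarray
        elif n < 2**16:
            header = b"\xdc" + struct.pack(">H", n)      # array 16
        else:
            header = b"\xdd" + struct.pack(">I", n)      # array 32
        return header + b"".join(msgpack_encode(x) for x in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n < 16:
            header = struct.pack(">B", 0x80 | n)         # fixmap
        elif n < 2**16:
            header = b"\xde" + struct.pack(">H", n)      # map 16
        else:
            header = b"\xdf" + struct.pack(">I", n)      # map 32
        body = b"".join(msgpack_encode(k) + msgpack_encode(v) for k, v in obj.items())
        return header + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


doc = {"name": "Ada", "tags": ["admin", "ops"], "age": 36}
packed = msgpack_encode(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(packed), "bytes as MessagePack vs", len(as_json), "bytes as compact JSON")
```

The saving comes mostly from one-byte type-and-length headers replacing JSON's quotes, colons, commas, and brackets, and from small integers fitting in a single byte.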
Document scaling
Aerospike's document database uses primary indexes and secondary indexes on nested JSON document elements, enabling it to achieve high performance at petabyte scale. Indexes let queries avoid unnecessary scans of the entire database.
Figure 2: Aerospike secondary indexes.
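The idea behind a secondary index on a nested element can be sketched in a few lines of Python: map each value found at a given path to the set of primary keys whose documents contain it, so a query touches only the matching entry instead of every document. This toy in-memory structure is an assumption made for illustration; it does not reproduce how Aerospike builds or stores its indexes, and `SecondaryIndex`, `get_path`, and `full_scan` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical store: primary key -> JSON-like document.
store = {
    "u1": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
    "u2": {"name": "Grace", "address": {"city": "Arlington"}},
    "u3": {"name": "Alan", "address": {"city": "London"}},
    "u4": {"name": "Edsger"},  # no address: simply absent from the index
}


def get_path(doc, path):
    """Follow a tuple of map keys / list indexes; return None if any step is missing."""
    cur = doc
    for step in path:
        try:
            cur = cur[step]
        except (KeyError, IndexError, TypeError):
            return None
    return cur


class SecondaryIndex:
    """Toy in-memory index: scalar value found at `path` -> set of primary keys."""

    def __init__(self, path):
        self.path = path
        self.entries = defaultdict(set)

    def add(self, key, doc):
        value = get_path(doc, self.path)
        # Index only hashable scalar values; nested lists/maps are skipped.
        if isinstance(value, (str, int, float, bool)):
            self.entries[value].add(key)

    def lookup(self, value):
        # Touches only the matching entry instead of every document.
        return set(self.entries.get(value, ()))


def full_scan(store, path, value):
    """What a query must do without an index: inspect every document."""
    return {k for k, doc in store.items() if get_path(doc, path) == value}


city_index = SecondaryIndex(("address", "city"))
for key, doc in store.items():
    city_index.add(key, doc)

print(sorted(city_index.lookup("London")))
```

Both `lookup` and `full_scan` return the same keys; the difference is that the scan's cost grows with the number of documents, while the index lookup's cost depends only on the number of matches.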
The Aerospike Document Database also supports Aerospike Expressions, a domain-specific language for querying and manipulating record data and metadata. Queries using Aerospike Expressions perform fast and efficient value-based searches across documents and other data sets in Aerospike.
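To show what a value-based filter expression does, the sketch below evaluates a small expression tree, written as nested tuples, against each record. This toy language is hypothetical and is not Aerospike's Expressions API, which is built with the client libraries' own expression builders; the operator names and the `evaluate` function are assumptions for illustration. Like a database-side filter, it lets the caller describe the condition once as data rather than hard-coding it in a loop.

```python
import operator

# Comparison operators available in this toy expression language (hypothetical).
COMPARATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def evaluate(expr, record):
    """Evaluate a nested-tuple expression against one record (a dict of bins)."""
    op = expr[0]
    if op == "bin":                        # ("bin", name): read a bin value
        return record.get(expr[1])
    if op == "val":                        # ("val", v): literal value
        return expr[1]
    if op == "and":
        return all(evaluate(e, record) for e in expr[1:])
    if op == "or":
        return any(evaluate(e, record) for e in expr[1:])
    if op == "not":
        return not evaluate(expr[1], record)
    if op in COMPARATORS:
        left, right = evaluate(expr[1], record), evaluate(expr[2], record)
        if left is None or right is None:  # missing bin: comparison is false
            return False
        return COMPARATORS[op](left, right)
    raise ValueError(f"unknown operator: {op!r}")


records = [
    {"name": "Ada", "age": 36, "country": "UK"},
    {"name": "Grace", "age": 85, "country": "US"},
    {"name": "Alan", "age": 41, "country": "UK"},
    {"name": "Edsger"},
]

# age >= 40 AND country == "UK"
expr = ("and", ("ge", ("bin", "age"), ("val", 40)), ("eq", ("bin", "country"), ("val", "UK")))
matches = [r["name"] for r in records if evaluate(expr, r)]
print(matches)
```

Because the filter is plain data, it can be shipped to wherever the records live and evaluated there, so only matching records need to travel back to the application.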
Document query
The CDT API discussed above provides the building blocks for the Aerospike Document API. Using JSONPath, the Aerospike Document API gives developers a programmatic way to perform CRUD (create, read, update, and delete) operations with JSONPath syntax.
JSONPath queries let developers query documents stored in Aerospike using JSONPath operators, functions, and filters. As Figure 3 shows, developers send a JSONPath query to Aerospike along with the record key and the name of the bin that stores the document, and Aerospike returns the matching data. The query is split: the portion expressed in Aerospike-supported syntax is executed as CDT operations, and any remaining, unsupported syntax is applied to the returned result by a JSONPath library. Developers can also put, delete, and append items at a path that matches a JSONPath query. Additionally, developers can query and extract documents stored in the database with SQL through Presto/Trino.
Figure 3: JSONPath queries.
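The CRUD operations described above can be sketched against an in-memory document with a deliberately tiny JSONPath subset: `$`, `.key`, and `[index]` only, with no filters, wildcards, or functions. This is not the Aerospike Document API; `parse_path`, `json_get`, `json_put`, `json_append`, and `json_delete` are hypothetical helpers that loosely mirror the read, put, append, and delete operations the text describes.

```python
import re

# Matches one step of a restricted JSONPath: ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path):
    """Turn '$.orders[0].sku' into ['orders', 0, 'sku'] (subset only: no filters or wildcards)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        steps.append(m.group(1) if m.group(1) is not None else int(m.group(2)))
        pos = m.end()
    return steps


def _walk(doc, steps):
    for step in steps:
        doc = doc[step]
    return doc


def _parent_and_last(doc, path):
    steps = parse_path(path)
    if not steps:
        raise ValueError("cannot modify the document root")
    return _walk(doc, steps[:-1]), steps[-1]


def json_get(doc, path):
    """Read: return the value at `path`."""
    return _walk(doc, parse_path(path))


def json_put(doc, path, value):
    """Create/update: set the map key or existing list index at `path`."""
    parent, last = _parent_and_last(doc, path)
    parent[last] = value


def json_append(doc, path, value):
    """Add: append `value` to the list at `path`."""
    json_get(doc, path).append(value)


def json_delete(doc, path):
    """Delete: remove the map key or list element at `path`."""
    parent, last = _parent_and_last(doc, path)
    del parent[last]


doc = {"name": "Ada", "orders": [{"sku": "A-1", "qty": 2}], "tags": ["vip"]}
print(json_get(doc, "$.orders[0].sku"))
json_put(doc, "$.orders[0].qty", 3)
json_append(doc, "$.tags", "newsletter")
json_delete(doc, "$.name")
print(doc)
```

A path using syntax outside this subset, such as a filter expression, raises `ValueError` here; handing such a remainder to a full JSONPath library is the kind of split the text describes.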
Document database transformation
Today's document databases often struggle with performance and scalability as document data volumes grow. Richer documents with deeper nesting expose scaling and performance problems, and developers typically have to redesign and modify applications to maintain reasonable response times once they are working with a terabyte of data or more.
Aerospike’s Document Data Services overcome these challenges by providing an efficient and effective way to store and query document data for large-scale web applications, in real time.
Srini Srinivasan is the founder and chief product officer of Aerospike, a leader in real-time data platforms. He has two decades of experience in the design, development, and operation of large-scale infrastructure. He holds more than 30 patents in database, web, mobile, and distributed system technologies. He co-founded Aerospike to solve the scaling problems he experienced with internet and mobile systems while he was a senior director of engineering at Yahoo.
—
New Tech Forum provides a venue to explore and discuss emerging business technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
Copyright © 2023 IDG Communications, Inc.