Apache Cassandra 4.1 was a major effort by the Cassandra community to build on what was released in 4.0, and is the first of what we intend to be yearly releases. If you’re using Cassandra and want to know what’s new, or if you haven’t looked at Cassandra in a while and wonder what the community is up to, then here’s what you need to know.
First, let’s discuss why the Cassandra community is growing. Cassandra was built from the ground up to be a distributed database that could run in dispersed geographic locations, on different platforms, and be continuously available despite what the world might throw at the service. If you asked ChatGPT to describe a database that today’s developer might need, and we did, the response would look a lot like Cassandra.
Cassandra meets what developers need in terms of availability, scalability, and reliability, which are things you just can’t implement later, no matter how hard you try. The community has strived to produce tools that define and validate the most stable and reliable database possible, because it’s what supports their businesses at scale. This effort is supported by everyone who wants to run Cassandra for their applications.
Guardrails for new Cassandra users
One of the new features in Cassandra 4.1 that should interest those new to the project is Guardrails, a new framework that makes it easier to set up and maintain a Cassandra cluster. Guardrails provides guidance on the best deployment configuration for Cassandra. More importantly, Guardrails prevents anyone from selecting parameters or taking actions that would degrade performance or availability.
An example of this is secondary indexing. A good secondary index can improve performance, so having multiple secondary indexes should be even more beneficial, right? Wrong. Having too many can degrade performance. Similarly, you can design queries that run across too many partitions and touch data on every node in a cluster, or use queries with replica-side filtering, which can lead to reading all the data on every node in the cluster. Experienced Cassandra users know to avoid these issues, but Guardrails makes it easy for operators to prevent new users from making the same mistakes.
Guardrails are configured in Cassandra's YAML configuration file and cover settings such as the number of tables, secondary indexes per table, partition key selections, collection sizes, and more. For many of them, you can set a warning threshold that triggers alerts and a failure threshold that prevents potentially harmful operations from running at all.
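To make the warning/failure distinction concrete, here is a minimal Python sketch of how a threshold guardrail behaves. It is not Cassandra's implementation, which is written in Java and driven by cassandra.yaml; the `ThresholdGuardrail` class, its fields, the convention that `-1` disables a check, and the example limits are all hypothetical, chosen only to illustrate a secondary-indexes-per-table limit.

```python
from dataclasses import dataclass, field


class GuardrailViolation(Exception):
    """Raised when a value crosses a guardrail's failure threshold."""


@dataclass
class ThresholdGuardrail:
    # Hypothetical guardrail: warn above warn_threshold, reject above fail_threshold.
    # A threshold of -1 disables that check (an assumption made for this sketch).
    name: str
    warn_threshold: int
    fail_threshold: int
    warnings: list = field(default_factory=list)

    def guard(self, value: int, what: str) -> None:
        if 0 <= self.fail_threshold < value:
            # Failure threshold crossed: the operation is refused outright.
            raise GuardrailViolation(
                f"{self.name}: {what} has {value}, exceeding the failure threshold of {self.fail_threshold}"
            )
        if 0 <= self.warn_threshold < value:
            # Warning threshold crossed: the operation proceeds, but a warning is recorded for alerting.
            self.warnings.append(
                f"{self.name}: {what} has {value}, exceeding the warning threshold of {self.warn_threshold}"
            )


# Hypothetical limits on secondary indexes per table.
secondary_indexes = ThresholdGuardrail("secondary_indexes_per_table", warn_threshold=2, fail_threshold=4)

secondary_indexes.guard(1, "ks.users")   # within limits: nothing recorded
secondary_indexes.guard(3, "ks.orders")  # allowed, but a warning is recorded
try:
    secondary_indexes.guard(5, "ks.events")  # rejected
except GuardrailViolation as err:
    print(err)
print(secondary_indexes.warnings)
```

Keeping the two thresholds separate lets an operator first observe how often a limit would be hit (warnings only) before turning on hard failures.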
Guardrails are meant to make Cassandra easier to manage, and the community is already adding more of them. Some newcomers to the community have already created their own guardrails and offered suggestions for others, which shows how easy the framework is to work with.
To make things even easier to get right, the Cassandra project has spent time simplifying the configuration format with standardized names and units, while still supporting backwards compatibility. This provides an easier and more consistent way to add new parameters for Cassandra, while reducing the risk of introducing bugs.
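As an illustration of the new naming style, Cassandra 4.1 accepts parameters such as `read_request_timeout: 5000ms`, where the unit is part of the value, in place of the older `read_request_timeout_in_ms: 5000`, where the unit was encoded in the name. The Python sketch below shows, under simplifying assumptions, how a loader could normalize both forms to the same value; the helper names and the small set of supported suffixes are hypothetical and much narrower than what Cassandra itself handles.

```python
import re

# Hypothetical subset of duration units, expressed in milliseconds.
_DURATION_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}
_DURATION_RE = re.compile(r"^\s*(\d+)\s*(ms|s|m|h)\s*$")


def parse_duration_ms(text: str) -> int:
    """Parse a duration such as '5000ms' or '10s' into milliseconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _DURATION_UNITS_MS[unit]


def normalize_timeout(config: dict, name: str) -> int:
    """Return `name` in milliseconds, accepting the new style or the legacy `<name>_in_ms` key."""
    if name in config:
        return parse_duration_ms(str(config[name]))
    legacy_key = f"{name}_in_ms"
    if legacy_key in config:
        # Legacy keys carried a bare number whose unit was implied by the key name.
        return int(config[legacy_key])
    raise KeyError(name)


new_style = {"read_request_timeout": "5000ms"}
old_style = {"read_request_timeout_in_ms": 5000}
print(normalize_timeout(new_style, "read_request_timeout"))
print(normalize_timeout(old_style, "read_request_timeout"))
```

Putting the unit in the value rather than the name means a new parameter needs only one name, and the parser, not each caller, is responsible for units.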
Improving Cassandra performance
In addition to making things easier for beginners, Cassandra 4.1 brings many improvements in performance and extensibility. The biggest change here is pluggability: Cassandra 4.1 now supports plugins for database features, allowing you to add capabilities without changing core code.
In practice, this allows you to make choices in areas such as data storage without affecting other services such as networking or node coordination. One of the first examples of this came from Instagram, where the team added support for RocksDB as a storage engine for more efficient storage. This worked well as a one-off, but the Instagram team had to maintain it themselves. The community decided that the ability to choose a storage engine should be built into Cassandra itself.
By supporting different memtable and storage options, Cassandra allows users to tailor their database to the queries they want to run and to how they want storage implemented. This can also support more durable or persistent storage options. Another area of choice for operators is pluggable schema storage, which Cassandra 4.1 now supports. Previously, the cluster schema was stored only in the system tables. To support more global coordination in deployments such as Kubernetes, the community added support for external schema storage such as etcd.
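The core idea of pluggability is that the rest of the node talks to an interface, and configuration decides which implementation sits behind it. Cassandra's real plugin interfaces are Java APIs whose details differ; the Python sketch below only illustrates the shape of the idea, and the `StorageEngine` interface, the two engines, the `ENGINES` registry, and the `storage_engine` config key are all hypothetical.

```python
import bisect
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class StorageEngine(ABC):
    """Hypothetical plugin interface: the rest of the node only calls these methods."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class HashEngine(StorageEngine):
    """Unordered in-memory store backed by a dict (fast point lookups)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class SortedEngine(StorageEngine):
    """Keeps keys in sorted order, which would favour range scans."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[str] = []

    def put(self, key: str, value: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key: str) -> Optional[str]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


# Registry mapping a configuration name to an implementation (names are hypothetical).
ENGINES: Dict[str, Type[StorageEngine]] = {"hash": HashEngine, "sorted": SortedEngine}


def create_engine(config: dict) -> StorageEngine:
    """Pick the engine named in the config, defaulting to the hash engine."""
    return ENGINES[config.get("storage_engine", "hash")]()


engine = create_engine({"storage_engine": "sorted"})
engine.put("user:2", "bob")
engine.put("user:1", "alice")
print(engine.get("user:1"))
```

Because callers depend only on `StorageEngine`, swapping implementations is a configuration change rather than a code change, which is the property the Instagram RocksDB work lacked as a one-off fork.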
Cassandra also now supports more options for network authentication and encryption. Cassandra 4.1 removes the need to keep SSL certificates on each node and can instead use external key providers such as HashiCorp Vault. This makes large deployments with many developers easier to manage. Similarly, additional authentication options make Cassandra easier to manage at scale.
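The architectural change is that a node asks a provider for its key material instead of assuming the files live on local disk. The Python sketch below shows that separation under stated assumptions: the `KeyMaterialProvider` interface and both providers are hypothetical, and `InMemorySecretStoreProvider` is only a stand-in for an external secret store; it does not use or imitate any real HashiCorp Vault API.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class KeyMaterialProvider(ABC):
    """Hypothetical interface: where a node obtains its TLS certificate and private key."""

    @abstractmethod
    def load(self, name: str) -> bytes: ...


class LocalFileProvider(KeyMaterialProvider):
    """Traditional approach: PEM files stored on every node's disk."""

    def __init__(self, directory: str) -> None:
        self.directory = directory

    def load(self, name: str) -> bytes:
        with open(os.path.join(self.directory, name), "rb") as fh:
            return fh.read()


class InMemorySecretStoreProvider(KeyMaterialProvider):
    """Stand-in for an external secret store; a real one would fetch over the network."""

    def __init__(self, secrets: Dict[str, bytes]) -> None:
        self._secrets = secrets

    def load(self, name: str) -> bytes:
        return self._secrets[name]


def tls_settings(provider: KeyMaterialProvider) -> Dict[str, bytes]:
    # Node code depends only on the interface, so the backing store can be swapped by configuration.
    return {"certificate": provider.load("node.crt"), "private_key": provider.load("node.key")}


# Placeholder bytes stand in for real PEM data.
secret_store = InMemorySecretStoreProvider({"node.crt": b"<certificate PEM>", "node.key": b"<key PEM>"})
settings = tls_settings(secret_store)
```

With a central provider, rotating a certificate means updating one store rather than copying files to every node.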
Other new features include SSTable identifiers, which will make it easier to manage and back up multiple SSTables, and Partition Denylists, which let operators block reads or writes to specific partitions, deliberately reducing the availability of that data so that it does not hurt the performance of the rest of the cluster.
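The denylist idea can be sketched as a lookup performed before a request touches a partition. This Python version is a hypothetical model, not Cassandra's implementation: the `PartitionDenylist` class, the separate read and write switches, and the `(keyspace, table, key)` tuple are assumptions made for illustration.

```python
from typing import Set, Tuple


class PartitionDenylistedError(Exception):
    """Raised when a request targets a denylisted partition."""


class PartitionDenylist:
    """Hypothetical sketch: operators deny reads and/or writes to specific partition keys."""

    def __init__(self, deny_reads: bool = True, deny_writes: bool = True) -> None:
        self.deny_reads = deny_reads
        self.deny_writes = deny_writes
        self._denied: Set[Tuple[str, str, str]] = set()  # (keyspace, table, partition key)

    def deny(self, keyspace: str, table: str, key: str) -> None:
        self._denied.add((keyspace, table, key))

    def allow(self, keyspace: str, table: str, key: str) -> None:
        self._denied.discard((keyspace, table, key))

    def check(self, keyspace: str, table: str, key: str, is_write: bool) -> None:
        """Raise if this operation type is blocked for the given partition."""
        blocked = self.deny_writes if is_write else self.deny_reads
        if blocked and (keyspace, table, key) in self._denied:
            op = "write" if is_write else "read"
            raise PartitionDenylistedError(f"{op} to {keyspace}.{table}[{key}] is denylisted")


denylist = PartitionDenylist(deny_reads=True, deny_writes=False)
denylist.deny("ks", "events", "hot-partition")
denylist.check("ks", "events", "hot-partition", is_write=True)  # writes are still allowed here
denylist.check("ks", "events", "other-key", is_write=False)     # other partitions are unaffected
```

Rejecting requests to one problematic partition up front trades away that partition's availability in exchange for protecting everything else on the cluster.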
Cassandra’s future is full of ACID
One of the things that has always counted against Cassandra in the past is that it did not fully support ACID (atomic, consistent, isolated, durable) transactions. The reason for this is that it was difficult to get consistent transactions in a fully distributed environment and still maintain performance. Starting with version 2.0, Cassandra used the Paxos protocol to manage consistency with lightweight transactions, which provided transactions for a single partition of data. What was needed was a new consensus protocol to better align with the way Cassandra works.
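From an application's point of view, a lightweight transaction is a conditional write on one partition that either applies atomically or reports the current value. The Python sketch below models only those semantics; it does not implement Paxos, and a local lock stands in for the linearizability that Paxos provides across replicas. The `Partition` class and its methods are hypothetical, and the CQL in the comments shows the analogous statements.

```python
import threading
from typing import Any, Dict, Optional, Tuple

Row = Dict[str, Any]


class Partition:
    """Toy single partition that supports conditional (compare-and-set) writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._lock = threading.Lock()  # stands in for Paxos-provided linearizability

    def insert_if_not_exists(self, row_key: str, row: Row) -> Tuple[bool, Optional[Row]]:
        # Analogous to: INSERT INTO accounts (id, balance) VALUES ('alice', 100) IF NOT EXISTS;
        with self._lock:
            if row_key in self._rows:
                return False, dict(self._rows[row_key])  # not applied: return the existing row
            self._rows[row_key] = dict(row)
            return True, None

    def update_if(self, row_key: str, column: str, expected: Any, new_value: Any) -> Tuple[bool, Optional[Row]]:
        # Analogous to: UPDATE accounts SET balance = 80 WHERE id = 'alice' IF balance = 100;
        with self._lock:
            row = self._rows.get(row_key)
            if row is None or row.get(column) != expected:
                return False, (dict(row) if row is not None else None)
            row[column] = new_value
            return True, None


accounts = Partition()
print(accounts.insert_if_not_exists("alice", {"balance": 100}))  # (True, None)
print(accounts.update_if("alice", "balance", 100, 80))           # (True, None)
print(accounts.update_if("alice", "balance", 100, 50))           # (False, {'balance': 80})
```

The limitation the article describes is visible here: each condition is checked against a single partition, so there is no way to make a change to two partitions succeed or fail together.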
Cassandra is filling this gap with Accord, a protocol that can reach consensus in one round trip instead of several, and that does so without relying on a leader or leader failover mechanisms. Moving toward Cassandra 5.0, the goal is to offer ACID-compliant transactions without sacrificing any of the capabilities that make Cassandra what it is today. To make this work in practice, Cassandra will support both lightweight transactions and Accord, giving users more options through the same modular approach used for other features.
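The "one round trip" claim rests on a fast path: a coordinator proposes a timestamp, and if enough replicas accept it unchanged, the transaction can commit at that timestamp without another round. The Python function below is a heavily simplified toy model of that decision only. Real Accord also tracks dependencies, defines its quorums precisely, and handles recovery; the `decide_timestamp` function, the `Outcome` type, and the quorum sizes used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Outcome:
    timestamp: int
    round_trips: int


def decide_timestamp(proposed: int, replica_replies: List[int], simple_quorum: int, fast_quorum: int) -> Outcome:
    """Toy model of a leaderless fast path.

    Each reply is the timestamp a replica is willing to accept: the proposed one if it
    has seen no conflicting later transaction, otherwise a later one.
    """
    if len(replica_replies) < simple_quorum:
        raise RuntimeError("not enough replicas responded")
    agreeing = sum(1 for ts in replica_replies if ts == proposed)
    if agreeing >= fast_quorum:
        # Fast path: enough replicas accepted the original timestamp in a single round trip.
        return Outcome(timestamp=proposed, round_trips=1)
    # Slow path: adopt the highest timestamp reported and spend an extra round agreeing on it.
    return Outcome(timestamp=max(replica_replies), round_trips=2)


# Five replicas; the quorum sizes are illustrative only.
print(decide_timestamp(10, [10, 10, 10, 10, 10], simple_quorum=3, fast_quorum=4))
# Outcome(timestamp=10, round_trips=1)
print(decide_timestamp(10, [10, 10, 12, 10, 15], simple_quorum=3, fast_quorum=4))
# Outcome(timestamp=15, round_trips=2)
```

No replica plays a special role in this decision, which is why a leaderless design does not need a leader failover step when a node goes down.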
Cassandra was created to meet the needs of Internet businesses. Today, all businesses have to deal with similar large-scale volumes of data, the same challenges around distributing their applications for resiliency and availability, and the same desire to continue to grow their services rapidly. At the same time, Cassandra must be easier to use and meet the needs of today’s developers. The community’s work on this update has helped make that happen. We look forward to seeing you at the next Cassandra Summit, where all of these topics and more will be discussed!
Patrick McFadin is vice president of developer relations at DataStax.
—
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
Copyright © 2023 IDG Communications, Inc.