Welcome, Cassandra. Let’s start with a quick introduction: who are you, and what can you do?
I am a high-performance, scalable, distributed database. I am really good at handling lots of operations and growing as your data needs increase, and to top it all off I run on commodity hardware, which means I’m really great for running in the cloud.
Awesome. Oh, by the way, what beer are you drinking tonight?
Tonight I’m drinking a Shiner from good old Texas.
Excellent. First, I heard it on the grapevine and you did tell me that you’re highly available. Can you confirm or deny these rumors, and what does that really mean?
Highly available means that a distributed system can lose one or more of its servers and still keep the cluster as a whole up and running. In other words, because I’m designed to run on commodity hardware, failure should be expected. In my scenario, I can lose a server and keep on chugging along without the cluster going down.
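As a rough illustration of that answer, here is a minimal Python sketch of a cluster that keeps several copies of each value and continues to serve reads and writes after one server fails. It is a toy model, not Cassandra code: the class ToyCluster, its methods, and the way replicas are chosen are all assumptions made for this example.

```python
# Toy model only: ToyCluster and its methods are hypothetical names,
# not part of Cassandra or any Cassandra driver.
class ToyCluster:
    def __init__(self, node_names, replication_factor=3):
        self.order = list(node_names)
        self.stores = {name: {} for name in self.order}  # one dict per server
        self.up = set(self.order)                        # servers currently alive
        self.rf = min(replication_factor, len(self.order))

    def replicas_for(self, key):
        # Assumption for the sketch: choose rf consecutive servers,
        # starting at a position derived from the key's bytes.
        start = sum(key.encode("utf-8")) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.rf)]

    def fail(self, name):
        # Simulate losing a server.
        self.up.discard(name)

    def write(self, key, value):
        # Store the value on every replica that is still alive.
        live = [n for n in self.replicas_for(key) if n in self.up]
        if not live:
            raise RuntimeError("no live replica for key")
        for n in live:
            self.stores[n][key] = value
        return len(live)

    def read(self, key):
        # Return the value from the first live replica that has it.
        for n in self.replicas_for(key):
            if n in self.up and key in self.stores[n]:
                return self.stores[n][key]
        raise KeyError(key)


cluster = ToyCluster(["n1", "n2", "n3", "n4"])
cluster.write("beer", "Shiner")
cluster.fail(cluster.replicas_for("beer")[0])  # lose one of the replicas
print(cluster.read("beer"))                    # still answered by a surviving copy
```

The sketch leaves out everything that makes the real problem hard, such as detecting failures and repairing a server that comes back with stale data.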
Cool. So hey, what’s up with this NoSQL stuff? What exactly does it mean to you, Cassandra?
NoSQL to me means “not only SQL,” and I think that’s really important. It’s not that the NoSQL space is replacing all traditional relational databases. Rather, we’re taking some of the workloads, such as those that involve large amounts of data and large numbers of reads and writes per second, away from the relational databases so that they can focus on the things they’re best at.
I’m seeing a lot of hype right now about the cloud and big data. How does Cassandra fit into that space?
I play really nicely in both the big data and cloud spaces. First, the cloud: the notion of the cloud is the ability to add more resources on demand as you need them. In a Cassandra cluster, every single node has the same role. That means it’s really easy to simply add more nodes; you don’t have to worry about adding machines of different server types, which a lot of the competition has to worry about. Different server types mean complexity, and no one likes complexity. As for big data, Cassandra was bred for big data from day one because it came out of two white papers from two big data giants. It was created at Facebook, based on Google’s Bigtable white paper and Amazon’s Dynamo paper, so we took the architecture from Amazon and merged it with the data model from Google.
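One way to picture “every node has the same role” is a hash ring, the general technique described in the Dynamo paper. The Python sketch below is a simplified illustration under stated assumptions: the Ring class, the token function, and the choice of MD5 are made up for this example and do not reproduce Cassandra’s actual partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Map a node name or a key to a position on the ring.
    # MD5 is used here only as a convenient, stable hash for the sketch.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical peer-to-peer ring: no master, every node is added the same way."""

    def __init__(self):
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> node name

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self._tokens, t)
        self._owner[t] = name

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring is empty")
        # The first node clockwise from the key's position owns the key.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


ring = Ring()
for name in ["node-a", "node-b", "node-c"]:
    ring.add_node(name)

keys = [f"user:{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")  # scaling out is the same call as the initial setup
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Because adding a node is the same operation no matter how large the ring already is, growing the cluster does not involve a second kind of server, and the only keys that change owner are the ones the new node takes over.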
For what compelling reasons would an organization choose to use you?
There are several use cases that we’re really, really good at. First and foremost, we’re a very scalable database, so if you have a use case where your data needs are going to grow over time, whether that’s raw storage or read and write requests, and you want to grow with them on commodity hardware, then Cassandra is the logical choice. We’re also very, very good at several enterprise-class features. One is durable writes, which means that every write is persisted to disk; in other words, if the power flickers, you’re guaranteed not to lose data. Another is multi-data-center support, where we’re best in class because we support more than two data centers. You can have as many data centers as you desire.
So what makes you so unique? What’s different about Cassandra versus the other NoSQL solutions?
A few of the key points are the big ones I’ve already mentioned. Our multi-data-center support is the best out there. Every node having the same role means extreme simplicity and yet the ability to have extreme scale. And on top of that, we simply don’t lose data. We were bred to always be a high-performance database, and to be a database you have to treat data as the most important thing you have. In our case, every write is durable on a single machine, and multiple copies are replicated across machines and data centers.
So you talked a lot about real-time performance. What kind of performance bottlenecks are you guys typically solving within your customer applications?
Well, there are a few that we solve. First, we eliminate the need for a separate caching layer because we make really smart use of RAM in our cluster. That means reads served straight out of memory are extremely fast. Second, we have extremely fast write performance; our writes are so fast because they don’t involve disk seeks. That’s a typical bottleneck that relational databases choke on because of write-in-place technology. Lastly, if anyone has an issue with putting data in multiple data centers, that’s a great one for us, because you can do your writes at any of multiple data centers and you don’t have to send them all to a single centralized master.
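The “no disk seeks” point can be sketched in a few lines of Python: each write is appended to the end of a log file and then applied to an in-memory table, and reads are answered from memory. This is a deliberately simplified illustration, not Cassandra’s storage engine. The ToyWritePath class, the tab-separated log format, and the choice to sync the file on every write are assumptions made for the example.

```python
import os


class ToyWritePath:
    """Hypothetical append-only write path: a log on disk plus a table in memory."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.memtable = {}
        self._replay()  # rebuild memory from the log after a restart
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition("\t")
                self.memtable[key] = value  # later entries overwrite earlier ones

    def write(self, key: str, value: str) -> None:
        # Assumption for the sketch: keys and values contain no tabs or newlines.
        # Appending always goes to the end of the file, so nothing is located
        # and overwritten in place.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # ask the OS to push the entry to disk
        self.memtable[key] = value

    def read(self, key: str) -> str:
        # Served from memory; no file access on the read path in this toy.
        return self.memtable[key]

    def close(self) -> None:
        self.log.close()
```

A short usage example, writing to a temporary directory and then reopening the log as if the process had restarted:

```python
import tempfile

path = os.path.join(tempfile.mkdtemp(), "commit.log")
db = ToyWritePath(path)
db.write("beer", "Shiner")
db.close()

restarted = ToyWritePath(path)
print(restarted.read("beer"))  # the value is recovered from the log
restarted.close()
```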
Other than cloud and NoSQL, have you seen any other technology trends emerging, specifically in data?
Chances are, most “future” data will not be transactional and will not be stored in a traditional relational database. It will be unstructured user data: web clickstreams, tweets, blog posts, Facebook updates, picture uploads, documents, customer product reviews, metadata, health records, audio, and video, as well as machine-generated data. I think you get the picture.
The number one problem facing enterprises today is HOW to store all this data and leverage it, not simply WHERE to put it. Certainly a lot will migrate to the cloud over time, but for the foreseeable future, data will reside in multiple places and require a new way of looking at storage. Capturing, scaling, reading, and writing high-volume unstructured data is a tough problem.
Enterprises need a solution capable of tracking transactions and associated data in real time for large-volume applications (remember all that unstructured data!). There is actionable intelligence buried in that data, and in order to use it, today’s applications require immense storage and fast, infinite scale. It’s time to look at different approaches, outside of the traditional relational DB model.
What other software companies inspire you?
We love our friends at AppDynamics because of their inventive minds and the hard, dedicated work they do to make their customers successful in innovative ways.
Who’s your favorite superhero?
Batman was always my favorite superhero because he was the only superhero who did not have special powers.
And finally, a lot of my friends think that you’re a super-hot chick who works in IT. Is this true?
But of course. NOT!