90% of the data generated in the next decade will be unstructured, yet most of today’s databases are built to handle structured data. NoSQL provides one alternative.
By Troy Cogburn – Trace3 Contributor
What is NoSQL?
NoSQL (Not only SQL) is an emerging open-source database approach that uses a schema-less design and distributed architectures with built-in horizontal scalability and fault tolerance. This makes it well suited to today’s large data volumes, high data velocity, and wide variety of data types, also known as Big Data. NoSQL databases come in several different flavors:
- Columnar – Cassandra and HBase
- Key-value Store – Redis, Riak, and Aerospike
- Graph – Neo4j and Objectivity
- Document Store – MongoDB and Couchbase
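To make the four flavors concrete, here is a minimal sketch of how one hypothetical user record might be shaped under each model. It uses plain Python structures only, not the API of any of the products listed above, and the record, field names, and the followers_of helper are all invented for illustration.

```python
# One hypothetical "user" record expressed in each of the four NoSQL models.
# Plain Python structures only; no real database client is used.

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: a nested, self-describing record; fields can vary per document.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "follows": [7],
}

# Columnar (wide-column): row key -> column family -> column -> value.
wide_column = {
    42: {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2014-01-01"},
    }
}

# Graph: nodes plus typed edges between them.
graph = {
    "nodes": {42: {"name": "Ada"}, 7: {"name": "Grace"}},
    "edges": [(42, "FOLLOWS", 7)],
}


def followers_of(graph, node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [
        src
        for (src, rel, dst) in graph["edges"]
        if rel == "FOLLOWS" and dst == node_id
    ]
```

The point of the sketch is the shape of the data: the key-value store knows nothing about the value, the document store keeps the whole record together, the columnar store groups columns under a row key, and the graph store makes relationships first-class.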
With more people and devices being connected to the internet, the amount of data being created is on the rise. In today’s market, structured data (predefined tabular data organized in rows and columns) is growing linearly, while unstructured data (text, log files, click streams, blogs, tweets, audio and video) is growing exponentially. This suggests that the “scale-up” model of traditional relational databases (RDBMS) will struggle to handle this Big Data, while NoSQL will simply “scale out” with predictable performance. Need to scale your Big Data? Just add another commodity server.
Another compelling driver for NoSQL adoption is its cost when compared with RDBMS alternatives.
One could argue that open-source RDBMS systems are cheap, but even a free structured database like MySQL requires costly DBA labor to manage and operate. NoSQL solutions require some installation and configuration, but after that they need only minimal administration. Because they are open source, NoSQL solutions typically require an enterprise support provider, but even with this overhead they remain significantly cheaper than their RDBMS cousins.
A NoSQL Example – Cassandra
Cassandra is a columnar NoSQL database based on Google’s BigTable and Amazon’s Dynamo projects. It was originally developed at Facebook to power its inbox search and became an Apache open-source project in 2009. Cassandra is extremely scalable. It uses a master-less design with no single point of failure and supports redundancy across multiple data centers and/or clouds. Its distributed storage spreads data across a swarm of commodity servers, providing massive throughput and very high availability. This makes it well suited to Big Data stores, online transaction processing, online web retail, social media and write-intensive applications.
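The idea of spreading data across many servers without a master can be sketched with a simple consistent-hash ring. This is a toy illustration, not Cassandra’s actual partitioner or replication strategy: the HashRing class, the use of MD5, the vnodes count, and the node names are all assumptions made for the example.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy master-less ring: any node can compute where a key lives."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place several virtual tokens for the node around the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def replicas(self, key, n=3):
        """Walk clockwise from the key's token and collect n distinct nodes."""
        distinct = {node for _, node in self._ring}
        n = min(n, len(distinct))
        i = bisect.bisect(self._ring, (_token(key), ""))
        found = []
        while len(found) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42")  # three distinct nodes hold copies of this key
ring.add_node("node-d")            # scaling out: only some keys move to the new node
```

Because every node can run the same calculation, no coordinator has to be consulted to find a key, and adding a server only shifts the keys that now fall on the new server’s portion of the ring.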
Several well-known companies have adopted Cassandra. Netflix migrated from Oracle to DataStax’s Cassandra in 2010 and now stores 95% of its data on it. Companies like Nutanix are using open-source Cassandra as an embedded database, while eBay is using DataStax Cassandra as the backbone for its fraud detection infrastructure. Other adopters include Instagram, Cisco, Adobe, Hulu, Rackspace, Microsoft, and GitHub.
When NoSQL? When RDBMS?
While enterprises are eyeing NoSQL due to runaway structured database costs and ballooning Big Data, RDBMS remains the best solution for most of today’s “Traditional Business Data”. If a system requires full ACID (Atomicity, Consistency, Isolation, Durability) compliance, which guarantees that transactions are reliable, along with solid referential integrity, then a traditional RDBMS is likely called for. On the other hand, if a Big Data solution is required, then NoSQL is the ticket. NoSQL provides high availability and scalable performance for high-throughput systems. These are only rules of thumb, but generally this is the pattern in which enterprises are finding the most value.
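As a small illustration of what ACID compliance and referential integrity buy you, the sketch below uses SQLite from Python’s standard library. The customers and orders tables and the place_orders helper are hypothetical, chosen only to show a batch of writes that either fully succeeds or is fully rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.commit()


def place_orders(conn, rows):
    """Insert all rows or none.

    Using the connection as a context manager commits on success and
    rolls the whole transaction back if any statement raises.
    """
    try:
        with conn:
            conn.executemany(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_orders(conn, [(1, 10.0), (1, 5.5)])         # both rows reference a real customer
rejected = place_orders(conn, [(1, 7.0), (999, 3.0)])  # customer 999 does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In the second call the valid first row is discarded along with the invalid one, so the table never ends up holding half of a batch or an order that points at a missing customer.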
Where from here?
Even though most data in the next decade will be unstructured, this doesn’t mean that RDBMS systems are on their way out. Big Data will find a home in NoSQL databases, while Traditional Business Data will remain in RDBMS solutions.