
Everything about Web and Network Monitoring

An Overview of BerkeleyDB

NoSQL databases have generated a lot of buzz in recent years, but in this installment we will take a look at a NoSQL database that seems to have been around forever. BerkeleyDB, the granddaddy of NoSQL databases, started out as a project at UC Berkeley aimed at providing a simple but powerful database management system for BSD Unix. The project was so successful that it was soon spun off to create Sleepycat Software, which was acquired by Oracle in 2006.

In a nutshell, BerkeleyDB is a high-performance, lightweight, in-process database toolkit that provides full ACID transactions to high-concurrency applications that need them. It does not come with a SQL engine out of the box, but SQLite can be layered on top of it as an add-on. It can also be configured for RPC-based network access, although that is rarely done.

This article concentrates on the “original” BerkeleyDB (or BDB for short), but it is worth mentioning that Oracle has two other related offerings: BerkeleyDB Java Edition, a pure Java embeddable version, and an XML Edition, which uses XML storage and provides data access through XQuery, XPath or JSON standards.

Even if you have never heard of it, chances are you are already using BerkeleyDB. Thanks to its scalability and reliability, coupled with its small footprint, it has been incorporated into many products, such as:

  • Asterisk open-source PBX
  • OpenLDAP directory server
  • OpenDS directory server
  • Postfix SMTP server
  • RPM – the Red Hat Package Manager
  • Subversion source code control

Much of its simplicity stems from the lack of a SQL engine and built-in network access: BerkeleyDB is essentially a shared library that applications link to in order to use its capabilities. As such, it does not require any daemons or other processes external to the application. While BerkeleyDB is written in C, bindings exist for Perl, Python, Java (via JNI), C++ and numerous other languages.
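To get a feel for the in-process model, here is a minimal sketch that uses Python's standard-library dbm.dumb module as a stand-in. It is not BerkeleyDB and does not use any BerkeleyDB binding; it only illustrates the idea of a key-value store that lives in ordinary files and is opened directly by the application, with no server to connect to. The function name and the sample keys are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def in_process_demo(directory):
    """Create, update, delete and read records in a file-backed key-value store.

    dbm.dumb is a stand-in here, NOT BerkeleyDB: it just shows the
    "open a file, no daemon" style of an embedded database.
    """
    path = os.path.join(directory, "people")

    # Open (and create if needed) the store; keys and values are bytes.
    with dbm.dumb.open(path, "c") as db:
        db[b"alice"] = b"Alice Smith"   # create
        db[b"alice"] = b"Alice Jones"   # update
        db[b"bob"] = b"Bob Brown"       # create
        del db[b"bob"]                  # delete

    # Reopen read-only, as a second run of the application might.
    with dbm.dumb.open(path, "r") as db:
        return db[b"alice"], b"bob" in db


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(in_process_demo(tmp))
```

Everything happens inside the calling process; the only shared state is the set of files on disk.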

From an application development perspective, BerkeleyDB treats data as key-value pairs. In that respect, a BDB database could be compared to a single table in a relational database such as MySQL or Oracle. Similar to relational database tables, each BDB database (a collection of key-value pairs) has only one primary key, but additional keys can be created using so-called “secondary indices”. A secondary index is essentially a separate database in which the key is the secondary key and the value is the primary key in the first database. As an example, a database containing records of type Person can be keyed by first_name (the primary key), with last_name and date_of_birth as secondary indices.
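The relationship between a primary database and a secondary index can be sketched with plain Python dictionaries. This is a toy in-memory model, not BerkeleyDB's API: the PersonStore class and its method names are hypothetical, and only the last_name index is shown. It follows the article's example and uses first_name as the primary key, which assumes first names are unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first_name: str      # primary key in this example
    last_name: str       # secondary key
    date_of_birth: str   # ISO date string


class PersonStore:
    """Toy model of a primary database plus one secondary index."""

    def __init__(self):
        self.primary = {}        # first_name -> Person
        self.by_last_name = {}   # last_name -> set of first_name (primary keys)

    def put(self, person):
        # Remove any stale index entry before overwriting the record.
        self.delete(person.first_name)
        self.primary[person.first_name] = person
        self.by_last_name.setdefault(person.last_name, set()).add(person.first_name)

    def delete(self, first_name):
        old = self.primary.pop(first_name, None)
        if old is not None:
            keys = self.by_last_name[old.last_name]
            keys.discard(first_name)
            if not keys:
                del self.by_last_name[old.last_name]

    def get(self, first_name):
        return self.primary.get(first_name)

    def find_by_last_name(self, last_name):
        # Two-step lookup: secondary key -> primary keys -> records.
        keys = sorted(self.by_last_name.get(last_name, ()))
        return [self.primary[k] for k in keys]


if __name__ == "__main__":
    store = PersonStore()
    store.put(Person("Alice", "Smith", "1980-05-01"))
    store.put(Person("Bob", "Smith", "1975-11-23"))
    print([p.first_name for p in store.find_by_last_name("Smith")])
```

Note that the index stores only primary keys, never whole records, so every secondary lookup ends with a read from the primary database. Keeping the two in sync on every write is the bookkeeping that a real secondary index takes off the application's hands.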

Like all databases, BerkeleyDB provides the usual “CRUD” semantics (Create, Read, Update and Delete). There are four underlying storage types (“Access Methods” in BerkeleyDB parlance): Btree, Hash, Queue and Recno, with Btree and Hash being the two general-purpose choices. As its name implies, Hash stores the data in a hash table on disk and is often the preferred method for very large data sets.
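The practical difference between a hashed and a tree-ordered access method can be illustrated with a small in-memory sketch. The HashStore and OrderedStore classes below are hypothetical stand-ins, not BerkeleyDB code: the first supports exact-key lookups only, while the second keeps its keys sorted (as a Btree does) and can therefore answer range queries.

```python
import bisect


class HashStore:
    """Hash-like store: fast exact-key lookups, no meaningful key order."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class OrderedStore:
    """Btree-like store: keys are kept sorted, so range scans are possible."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[start:end]]


if __name__ == "__main__":
    ordered = OrderedStore()
    for word in ["pear", "apple", "fig", "kiwi"]:
        ordered.put(word, len(word))
    print(ordered.scan("b", "l"))
```

If an application needs to walk keys in order or fetch everything between two keys, an ordered structure is the natural fit; if it only ever looks up individual keys, a hash can be enough.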

BerkeleyDB supports transactions through a mechanism called write-ahead logging. In essence, this means that changes are written to a log file before the database file is modified. At each checkpoint, the pending transactions are “flushed” to the database file; if the write fails for some reason, the transaction is rolled back. Nothing comes for free, though: in a multi-process or multi-threaded transactional application, the developer is responsible for issuing checkpoints periodically, either from a dedicated thread or by running the db_checkpoint utility.
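The idea behind write-ahead logging and checkpoints can be shown with a deliberately tiny sketch. The TinyWalStore class below is hypothetical: its file names, its JSON log format and its recovery logic are assumptions made for illustration and bear no relation to BerkeleyDB's actual log or database formats.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Toy key-value store with a write-ahead log (not BerkeleyDB's format)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.db_path = os.path.join(directory, "data.json")
        self.data = {}
        self._recover()

    def _recover(self):
        # Load the last checkpointed state, then replay the logged changes.
        if os.path.exists(self.db_path):
            with open(self.db_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record: treat it as never committed
                    self.data.update(record)

    def commit(self, changes):
        # 1. Append the changes to the log and force them to disk ...
        with open(self.log_path, "a") as f:
            f.write(json.dumps(changes) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ... and only then apply them to the live state.
        self.data.update(changes)

    def checkpoint(self):
        # Write the full state to the database file, then empty the log.
        tmp = self.db_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.db_path)
        open(self.log_path, "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TinyWalStore(d)
        store.commit({"answer": "42"})
        reopened = TinyWalStore(d)  # simulates a restart before any checkpoint
        print(reopened.data)
```

In this sketch the log grows until checkpoint() is called, which hints at why somebody has to issue checkpoints periodically: without them, recovery has to replay an ever longer log.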

If you are curious to see a BerkeleyDB database in action, look no further than the /var/lib/rpm directory on any Red Hat-derived Linux distribution. That is where RPM keeps data on all packages installed on the system, along with their versions, dependencies, checksums, signatures and more.

While BerkeleyDB is as close as it gets to “zero administration”, there are a few performance considerations you need to be aware of. In a future article, we will look at how Paid Monitor can help you monitor and tune your BerkeleyDB database for optimum performance.

About Drago Z Kamenov

Drago has been fascinated with technology ever since he learned to program on Apple II clones as a teenager. Over the years, he has worked in enterprise application development as a C++, Perl and Java/JEE developer, architect and team lead, primarily in Unix and Linux environments, with occasional forays into system administration. A passionate open source enthusiast, Drago has been using Linux since 1998, more recently as his primary desktop OS. A native of Sofia, Bulgaria, he currently lives with his family in Cedar Falls, Iowa. In his spare time, he enjoys scuba diving, hiking and sailing his 16-foot catamaran.