Ruby's built-in databases - meet PStore and YAML::Store

Ruby keeps amazing me! Did you know it has not one, but two databases built right into its standard library? Okay, the two are basically the same under the hood, but still!

When you write a Ruby script, you often reach the point where you want to persist some data, so that the next run of the script can access the data from the previous one. Examples are the result of an API call you don't need to execute every single time the script runs, configuration values that rarely change, or a timestamp of the last time a certain action was performed.

Your first instinct will be to hook up a database like MySQL, Postgres or (shudder...) MongoDB, but you'll quickly realize this would give the word "overkill" a whole new meaning, since you basically just want to save a few simple values, and maybe you're not even sure when you will run the script the next time.

So your next thought will be to simply save the data to a file. This makes more sense, and you know you can write to a file in only one line in Ruby, so how hard can it be? But once you start writing, you notice the amount of boilerplate code quickly exceeds the amount of code the actual logic needs.

What to do? Fret not, respected sir, Ruby has you covered!

What are PStore and YAML::Store?

Let's start with PStore, since YAML::Store inherits from it.

PStore is basically a Hash that can be persisted to a file. You can store almost anything in that hash, save it, and load it back up later. Custom Ruby objects can be saved as hash values too, as long as Marshal can serialize them (procs and IO objects, for example, cannot be), but I'd suggest sticking to strings, integers, arrays and hashes if possible. The contents of the hash can be nested as many layers deep as you want, so you can model the structure based on the data it needs to hold.
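To make the "nested hash on disk" idea concrete, here is a minimal sketch. The file name, the :settings key and the values are made up for illustration, and the temporary directory is only there so the example doesn't leave files lying around.

```ruby
require 'pstore'
require 'tmpdir'

# A throwaway directory keeps the sketch from leaving files behind.
path = File.join(Dir.mktmpdir, 'nested.pstore')
store = PStore.new(path)

# Write a nested structure under a single root key.
store.transaction do
  store[:settings] = {
    api: { base_url: 'https://example.com', retries: 3 },
    tags: %w[ruby pstore]
  }
end

# Read it back through a fresh PStore instance, as a later run would.
# The block receives the store itself, and transaction returns the block's value.
reloaded = PStore.new(path).transaction(true) { |s| s[:settings] }
retries = reloaded[:api][:retries]
puts retries
```

The nested hashes and the array come back with the same shape they were saved in, so the later run can dig into them like any other Ruby hash.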

When the hash is saved in a PStore, it is serialized with Ruby's Marshal module, which means it is converted to a byte stream before it is written to disk. The advantage is that this is very fast and space-efficient; the disadvantage is that the resulting file is not human-readable.
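If you have never used Marshal directly, this small sketch shows the round trip that happens conceptually when a PStore saves and loads its hash. It only demonstrates Marshal itself, not the exact layout of a PStore file, and the sample data is invented.

```ruby
data = { name: 'example', counts: [1, 2, 3] }

# Marshal.dump returns a binary String; Marshal.load reverses it.
bytes = Marshal.dump(data)
restored = Marshal.load(bytes)

# The dumped string is binary, which is why the file is not human-readable.
binary = bytes.encoding == Encoding::BINARY
round_trip_ok = restored == data

puts "binary: #{binary}, round trip ok: #{round_trip_ok}"
```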

This is where YAML::Store steps in! It offers exactly the same methods and features as PStore, but uses YAML instead of Marshal to store the data. This may not be as fast as Marshal, but the result is human-readable, and since most of the time you won't save several megabytes of data, the speed difference will be unnoticeable. And since you might want to have a look at the created file to see what was saved (and possibly even change stuff right in the file), YAML::Store is often more convenient than PStore.
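To see the human-readable part for yourself, a sketch like the following writes two values and then prints the raw file. The keys and values are placeholders. String keys are used here simply because they look cleanest in the YAML output.

```ruby
require 'yaml/store'
require 'tmpdir'

# A throwaway directory keeps the sketch from leaving files behind.
path = File.join(Dir.mktmpdir, 'store.yml')
store = YAML::Store.new(path)

store.transaction do
  store['language'] = 'ruby'
  store['versions'] = [3, 2, 1]
end

# The file on disk is plain YAML that can be opened in any editor.
contents = File.read(path)
puts contents
```

Editing that file by hand and running the script again is a perfectly valid way to change the stored values, as long as the result is still valid YAML.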

Should I use PStore or YAML::Store?

Use PStore if it's more important to you that reading and writing the data happens as fast as possible, or if you care about the size of the created file. (If speed or size matter that much, though, a data store like Redis or a full-blown database like MySQL or Postgres might be better suited.) Otherwise use YAML::Store.
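If you want to check the size difference for your own data instead of taking my word for it, a rough sketch like this one writes the same hash to both stores and compares the files. The sample data is invented, and the numbers will vary with the shape of your data and your Ruby version.

```ruby
require 'pstore'
require 'yaml/store'
require 'tmpdir'

dir = Dir.mktmpdir
# Hypothetical sample data; swap in something that resembles your own.
sample = { 'ids' => (1..1000).to_a, 'name' => 'example' }

pstore_path = File.join(dir, 'sample.pstore')
yaml_path = File.join(dir, 'sample.yml')

# Write the same hash to both kinds of store.
PStore.new(pstore_path).transaction { |s| s['sample'] = sample }
YAML::Store.new(yaml_path).transaction { |s| s['sample'] = sample }

pstore_size = File.size(pstore_path)
yaml_size = File.size(yaml_path)
puts "PStore: #{pstore_size} bytes, YAML::Store: #{yaml_size} bytes"
```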

Let's see some code!

Using PStore and YAML::Store is very straightforward. The only thing you need to remember is to always wrap each access to the store, whether you read from it or write to it, in a transaction. Accessing the store outside of a transaction raises a PStore::Error.

require 'pstore'

# Initialize the store.
# The file will be created if it doesn't exist.
store = PStore.new('store.pstore')

# Load data from the store. PStore#fetch works like Hash#fetch:
# it returns the value if the key exists, otherwise the default.
# The empty-hash default matters on the first run, when :data
# does not exist yet and store[:data] would return nil.
data = store.transaction { store.fetch(:data, {}) }

# Do something with the data.
data[:foo] = 'bar'

store.transaction do
  # Save the data to the store.
  store[:data] = data

  # Oh wait, let's check something first, and if it's
  # not what we expect, abort the transaction and don't
  # write anything to the store.
  # (is_this_thing_on? stands in for a check of your own.)
  store.abort unless is_this_thing_on?

  # Another option is to commit early, which also returns
  # from the transaction, but writes what you have done
  # so far. In this case, store[:data] would be written
  # but store[:last_run] would not.
  # (lets_commit_it is likewise a placeholder.)
  store.commit if lets_commit_it

  # Save the current time so next time
  # you know when this was last run.
  store[:last_run] = Time.now
end
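The difference between abort and commit is easy to mix up, so here is a small self-contained sketch that shows what actually ends up in the file. The key names are arbitrary.

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'abort_commit.pstore'))

# abort: nothing from this transaction reaches the file.
store.transaction do
  store[:draft] = 'never saved'
  store.abort
end

# commit: changes made before the call are saved,
# and the rest of the block is skipped.
store.transaction do
  store[:saved] = 'written'
  store.commit
  store[:skipped] = 'never reached'
end

# roots returns the top-level keys currently in the store.
keys = store.transaction(true) { store.roots }
p keys
```

Only :saved survives: :draft was discarded by the abort, and the line that sets :skipped never ran because commit leaves the block immediately.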

To use YAML::Store instead of PStore, simply replace the require and the PStore.new line at the top of the code above with:

require 'yaml/store'

store = YAML::Store.new('store.yml')

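After that, the YAML::Store works just like the PStore, with one caveat: recent versions of YAML::Store load the file with YAML.safe_load, which, as far as I can tell, only permits basic types and symbols. A value like the Time stored in store[:last_run] above may then fail to load on the next run, so consider storing it as a string or an integer (for example Time.now.to_i) instead.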
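Both stores enforce the transaction rule from earlier in the same way. The following sketch uses PStore, but YAML::Store inherits the behaviour. It shows what happens when you forget the transaction, and it uses the optional read-only mode you get by passing true. The :counter key is just an example.

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'tx.pstore'))

# Access outside a transaction raises PStore::Error.
outside_error = begin
  store[:counter]
  nil
rescue PStore::Error => e
  e
end

store.transaction { store[:counter] = 1 }

# Passing true opens a read-only transaction;
# writing inside it raises PStore::Error as well.
read_only_error = begin
  store.transaction(true) { store[:counter] = 2 }
  nil
rescue PStore::Error => e
  e
end

counter = store.transaction(true) { store[:counter] }
puts counter
```

Read-only transactions are a cheap safety net for the parts of a script that should only ever look at the data.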

Conclusion

So there you have it, a super-simple database that is built right into Ruby, ready to be used for your next script!
