Persist your data in YAML files instead of a SQL database. Wait, what?

Why YAML Record?

At Miso, we occasionally ran into situations where we wanted to persist simple data to a text (or YAML) file. Examples include landing page email forms, contact forms, feedback forms, about us “team” pages, etc. In these situations, we wanted a way to persist data to YAML and easily view the results in a text file, but also manage the data through forms and controllers like any other data. To achieve this, we created a YAML-backed persistence engine called YAML Record that allows access to YAML-based data using a familiar ActiveRecord API.

If you have a small amount of data to persist that changes infrequently but still needs updating from time to time, you should investigate using YAML Record. In other cases, where you have frequently updated information, several query requirements or a medium/large set of data, a SQL or NoSQL solution would probably be a better fit. Using YAML for persistence doesn’t work in every case, so use this approach wisely.

YamlRecord is a standalone, simple and lightweight way to map your data from a YAML file. It can easily be integrated alongside ActiveRecord, and we try to provide the same APIs. On the other hand, if you’re using DataMapper, I would recommend checking out dm-yaml-adapter, which allows for similar functionality.

How do we use this at Miso?

We have several use cases for YamlRecord. The first one is to persist our team members’ profiles on our about page. Using YAML Record, we can easily add, update or remove any of this information with a RESTful architecture and standard controllers. We even implemented full DragonFly support with YAML Record so we can link each team member to a picture. I’ll probably cover this functionality in my next blog post.

In another case, we use YAML Record for our FAQ. From a relational data perspective, storing these questions requires only a single table without any relationships to other tables or any indexes. For cases like this, a SQL solution and the overhead of creating a migration and a table don’t always make sense. We definitely could use a SQL table, but having one with 10 rows and only 3 columns seems almost silly.

------------------------------
| id | question | answer     |
------------------------------
| 1  | foo?     | blablabla  |
------------------------------
| 2  | bar?     | blablaaaaa |
------------------------------
| 3  | L33t?    | Y3s        |
------------------------------
| ...                        |
------------------------------
| 10 | n00b?    | n0!        |
------------------------------

I would rather have a YAML file that looks like the example below. The main upside of using a YAML file is that Rails loads the entire file into memory once, so your application doesn’t need to go back and forth to read the file each time.

---
- :id: 08cc5f757b26903b9e7b6fcc3a3fbe
  :question: foo?
  :answer: blablabla
- :id: 4f2e73f896d0443c2b66f57cbdb2bb
  :question: bar?
  :answer: blablaaaaa
- :id: 5197b037524a1629d27c69f7aa9891
  :question: L33t?
  :answer: Y3s
- :id: f77c91577af4bf9b1e994e1b5b9ecd
  :question: n00b?
  :answer: n0!
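
To illustrate the "load once, keep in memory" idea, here is a minimal sketch in plain Ruby. It is not how YAML Record or Rails does it internally; `FaqStore` and `find_by_question` are hypothetical names made up for this example, and the file is written to a temporary directory only so the snippet is self-contained.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper (not part of YAML Record): parses the file on first
# access and serves every later call from memory.
class FaqStore
  def initialize(path)
    @path = path
    @records = nil
  end

  # Symbol keys such as :question must be explicitly permitted by safe_load.
  def all
    @records ||= YAML.safe_load(File.read(@path), permitted_classes: [Symbol])
  end

  def find_by_question(question)
    all.find { |record| record[:question] == question }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "faq.yml")
  File.write(path, <<~FAQ)
    ---
    - :id: 08cc5f757b26903b9e7b6fcc3a3fbe
      :question: foo?
      :answer: blablabla
    - :id: f77c91577af4bf9b1e994e1b5b9ecd
      :question: n00b?
      :answer: n0!
  FAQ

  store = FaqStore.new(path)
  puts store.find_by_question("n00b?")[:answer]

  # The file can disappear and the cached records are still available.
  File.delete(path)
  puts store.all.size
end
```

The trade-off of caching like this is that changes made to the file by another process are not seen until the cache is rebuilt.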

Using the same APIs as ActiveRecord

Here’s a quick sneak peek on what you can achieve with YamlRecord. Given a class BlogPost:

class BlogPost < YamlRecord::Base
  # Declare your properties
  properties :title, :body

  # Declare source file path (config/posts.yml)
  source Rails.root.join("config/posts")
end
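
For readers curious how declarations like `properties` and `source` can work, here is a minimal sketch of the class-macro pattern in plain Ruby. This is not YAML Record's actual implementation: `TinyRecord`, `property_names` and the way ".yml" is appended to the path are assumptions made for illustration only.

```ruby
# Hypothetical base class illustrating the class-macro pattern; this is
# not YAML Record's real code.
class TinyRecord
  class << self
    # Remember the declared names and generate a reader/writer for each.
    def properties(*names)
      @property_names = names
      attr_accessor(*names)
    end

    def property_names
      @property_names || []
    end

    # With an argument, store the file path (appending ".yml" is an
    # assumption suggested by the comment in the example above).
    # Without one, return the stored path.
    def source(path = nil)
      @source = "#{path}.yml" if path
      @source
    end
  end

  # Assign only the declared properties; unknown keys are ignored.
  def initialize(attributes = {})
    attributes.each do |name, value|
      public_send("#{name}=", value) if self.class.property_names.include?(name)
    end
  end
end

class BlogPost < TinyRecord
  properties :title, :body
  source "config/posts"
end

post = BlogPost.new(:title => "New version", :body => "This is a test post")
puts post.title
puts BlogPost.source
```

Keeping the declared names on the class makes it possible to filter incoming attributes, which is roughly what a form-backed model needs.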

Here are some of the actions you can perform:

# Create your instance as you would with ActiveRecord
@post = BlogPost.new(:title => "New version", :body => "This is a test post")
@post.save # => true

# Alternatively, build and save in one step
@post2 = BlogPost.create(:title => "Another post", :body => "This is the body")

# You can get your items easily
BlogPost.all # => [@post, @post2, ...]

# Update the record attributes
@post.update_attributes(:title => "Updated title") # => true

# Destroy post
@post2.destroy # => true
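
To make these calls less magical, here is a rough sketch of how `create`, `all`, `update_attributes` and `destroy` could be backed by a YAML file. It is a simplified illustration, not YAML Record's real code: `MiniPost`, its `path` and `write` class methods, the fixed attribute list and the random hex ids are all assumptions for this example.

```ruby
require "yaml"
require "securerandom"
require "tmpdir"

# Hypothetical, simplified model; YAML Record's real implementation differs.
class MiniPost
  ATTRIBUTES = [:id, :title, :body].freeze
  attr_accessor(*ATTRIBUTES)

  class << self
    attr_accessor :path

    # Read every record from the file (an empty list if it does not exist).
    def all
      return [] unless path && File.exist?(path)
      rows = YAML.safe_load(File.read(path), permitted_classes: [Symbol]) || []
      rows.map { |row| new(row) }
    end

    def create(attributes = {})
      new(attributes).tap(&:save)
    end

    # Rewrite the whole file from the given records.
    def write(records)
      File.write(path, records.map(&:to_h).to_yaml)
    end
  end

  def initialize(attributes = {})
    attributes.each { |name, value| public_send("#{name}=", value) }
  end

  def to_h
    ATTRIBUTES.to_h { |name| [name, public_send(name)] }
  end

  # Insert the record, or replace the stored copy that has the same id
  # (the saved record ends up last in the file).
  def save
    self.id ||= SecureRandom.hex(15)
    others = self.class.all.reject { |record| record.id == id }
    self.class.write(others + [self])
    true
  end

  def update_attributes(attributes)
    attributes.each { |name, value| public_send("#{name}=", value) }
    save
  end

  def destroy
    self.class.write(self.class.all.reject { |record| record.id == id })
    true
  end
end

Dir.mktmpdir do |dir|
  MiniPost.path = File.join(dir, "posts.yml")
  post = MiniPost.create(:title => "New version", :body => "This is a test post")
  post.update_attributes(:title => "Updated title")
  puts MiniPost.all.map(&:title).inspect
  post.destroy
  puts MiniPost.all.size
end
```

Rewriting the whole file on every save is only reasonable for the small, rarely changing datasets discussed above; it offers no protection against two processes writing at the same time.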

We tried to provide all the essential APIs that ActiveRecord provides, and more are coming. If you want to know about the available APIs, you can find them in the README or the RDoc.

Feedback?

We would love to hear your feedback on this concept of persisting data to a YAML file in certain simple cases, and on how we can improve this library in the future. The source code is available on GitHub and we welcome all patches and pull requests. Feel free to share your thoughts about it.

Conclusion

We really believe this concept of persisting data in a YAML file makes a lot of sense for us in particular scenarios. Building this was actually a challenge for me as it was my first gem. I want to thank Nathan for his support in building the gem; I really learned a lot in the process. After all, I guess we did it for fun and as an experiment to see how this would work.

We also found these 2 gems in the same vein as YAML Record, which are worth checking out:

14 Comments

  1. Eric says:

    Thanks for the clear writeup on YAML Record.
    A similar approach was taken by ActiveHash/ActiveYaml, which might be of interest to your readers:
    https://github.com/zilkey/active_hash

    1. nesquena says:

      Very cool. That does seem like another excellent attempt at achieving the same goal. Thanks!

  2. José Mota says:

    This might be just what I needed to index my blog posts with Blogrite (github.com/josemota/blogrite). I was looking for something like this, thanks Nico!

  3. Abdoe says:

    just wonder why you guys dont use sqlite?

    1. nicotaing says:

      Actually, at some point I considered using sqlite. I guess building a gem and persisting data in a YAML file was more fun and more challenging.

  4. I hate to be ‘that guy,’ but I was nodding my head in agreement until the “HotGirl” example. It detracts from an otherwise great project. See http://radar.oreilly.com/2011/07/sexual-harassment-at-technical.html

  5. myfreeweb says:

    “HotGirl” is fine unless you call .destroy :D No, really, that’s uncool.

    BTW, disk I/O’s gonna kill that great idea. Unless you own the fastest SSDs on the earth.

  6. Wow… that “HotGirl” example is not cool. It ruins an otherwise helpful article.

  7. I don’t know – the hot girl example isn’t really that wacked. Had you named it ‘Celebrity Popularity’, people would have swallowed it better (pun intended), but in reality, it would have equated to the same exercise of rating physical attractiveness.

    That said, how do you scale the YAML technique? My DB lives in one place but is accessed by 5 web app servers. I could store the file remotely on S3 I guess, and Rails will handle the caching..but it will have to do at least a header request to see if the file has changes on every request. Any better solutions?

    -Nelson

    1. nesquena says:

      Good question. Of course it is common to have multiple app servers that all read/write data. I would say the best ways to solve it are either 1) Host the file on S3 with Rails caching 2) Use an NFS shared volume 3) Persist the YAML data in another store such as Redis (using YAMLRecord adapters).

  8. Johnny Hall says:

    If you were to use sqlite, how would you ensure that some models are persisted to sqlite and some to your principal data store?

    You are also going to have the same issue with regards to scaling with sqlite – if you have more than one web server (or dyno?) then you are going to run into problems when you save or update the file. How do you ensure that every web server has the same copy of the file? The “normal” solution to that would be to persist to some shared location. Which brings its own problems and challenges.

    Excellent work, I’m interested to see if/how you tackle the scaling issue. Or, indeed, if you need to.

    1. nesquena says:

      As I mentioned above the approaches I can see are NFS shared volume, persist to S3 or persist the yaml data in another store such as Redis. I see YAMLRecord as really helpful for quick one-off things on a small app. In our case, we are choosing to scale up some of our YAMLRecord data because it is so convenient.

  9. paulbonner says:

    Nice work. I’ve thought of doing something, but never got around to it. We make fairly heavy use of YAML for things like report definitions, RSS feed definitions, etc., that are essentially read-only once they’ve been deployed on our web servers. But editing them by hand has become increasingly tedious as their content grows. It looks like it would be pretty easy to throw together a nice YAML editor built on top of this.

    1. nesquena says:

      Right, we have all sorts of admin editing tools with standard controllers that allow management of the YAMLRecord data

  10. [...] month we released YAMLRecord, a lightweight way to persist a small dataset into a simple YAML file which is fine if you’ve [...]
