Matching Notes

Class table inheritance in Rails

I enjoyed reading a post by James Coglan on the FutureLearn blog about refactoring a Rails app out of single table inheritance. If you haven't read it then go and read it now because it's a great walkthrough of some of the problems of STI and how FutureLearn managed to migrate away from the pain.

I found the article really interesting because I've recently dealt with a database schema that is almost identical to the one they describe. It was also for an e-learning site and the domain model was pretty much the same - courses comprised of an ordered collection of different types of content.

James writes about how the project originally used a STI approach to store the different content types and how they refactored to split each of the content types into their own table.

Schema before

+--------+  +---------+  
|courses |  |steps    |  
+--------+  +---------+  
| id     |  |id       |  
| title  |  |type     |  
+--------+  |course_id|  
            |position |  
            |title    |  
            |body     |  
            |asset_id |  
            |url      |  
            |copyright|  
            +---------+  

Refactored schema

+--------+  +------------+  +---------+  +---------+  +---------+
| courses|  |steps       |  |articles |  |videos   |  |exercises|
+--------+  +------------|  +---------+  +---------+  +---------+
| id     |  |id          |  |id       |  |id       |  |id       |
| title  |  |course_id   |  |title    |  |title    |  |title    |
+--------+  |content_type|  |body     |  |asset_id |  |body     |
            |content_id  |  |copyright|  |copyright|  |url      |
            |position    |  +---------+  +---------+  +---------+
            +------------+

The refactored schema is far better, the content types tables can be individually maintained without knock-on effects. From a data integrity point of view this means that the correct constraints can be added to each table, something which was problematic before. From an ActiveRecord point of view, adding columns to a table for one of the content types will not increase the surface area of the API for the other content types.

The refactored schema uses a join table and a polymorphic relationship to link courses and content types. However, reflecting this change in the ActiveRecord models results in a wide ranging API change. Every call to an attribute of a content type need to be changed from e.g. step.body to step.content.body.

This pattern of linking tables via a polymorphic join table is called Class Table Inheritance. Unfortunately ActiveRecord doesn't support this pattern out of the box.

In the e-learning application I have been working on, we have a very similar database schema to the refactored one shown above. However, I have had the luxury of being able to use Sequel rather than ActiveRecord and so I've been able to take advantage of using class table inheritance.

Class table inheritance in Sequel

For the past few years I've been using Sequel in all my Rails projects. More often than not I'll use Sequel Model as a straight replacement to ActiveRecord. Yes, the models still get coupled to the database schema in the same way that they do when using ActiveRecord, but I find Sequel gives me much more flexibility when it comes to database work.

So, let's look how you could approach the FutureLearn example using class table inheritance in Sequel.

The schema can be created with the following migration:

Sequel.migration do
  change do
    create_table :courses do
      primary_key :id
      String :title
    end

    create_table :steps do
      primary_key :id
      Integer :course_id,    null: false
      Integer :position,     null: false
      String  :content_type, null: false
    end

    create_table :articles do
      foreign_key :id, :steps
      String :title
      String :body
      String :copyright
    end

    create_table :videos do
      foreign_key :id, :steps
      String  :title
      Integer :asset_id
      String  :copyright
    end

    create_table :exercises do
      foreign_key :id, :steps
      String :title
      String :body
      String :url
    end
  end
end

The only difference to the FutureLearn schema is that the steps.content_id column is no longer needed, this is because we have set up foreign key relationships between the steps table and the article, video and exercise tables. The id of row in the steps table will always be the same as the id of the row in the content table it links to.

Using the class table inheritance plugin that ships with Sequel, our models can be defined as follows:

class Course < Sequel::Model
  one_to_many :steps, order: :position
end

class Step < Sequel::Model
  plugin :class_table_inheritance, key: :content_type
  many_to_one :course
end

class Articles < Step; end
class Videos < Step; end
class Exercises < Step; end

After setting this up and populating some data, we can work with the models in a way that will be familiar:

Step.all # [<#Article>, <#Video>, <#Exercise>, <#Article>, ...]

Notice here that our objects have been correctly typecast for us. We asked for the all the steps and got back a collection of articles, videos and exercises.

Similarly, this works when accessing the steps through a relationship:

course = Course.first
course.steps # [<#Article>, <#Video>, <#Exercise>]

Essentially, the CTI plugin is performing the polymorphic join for us as a single step. There's no need to access intermediary Step objects and then ask them for their content.

The objects are built and created as you would expect:

Video.new(course: course, position: 1, title: 'Learn X')     # Unsaved video object
Video.create(course: course, position: 1, title: 'Learn Y')  # Persisted video object

Note that the course and position are attributes of the content items themselves, there is no need to create Step objects.

And the properties are accessed as normal:

step = course.step.first # <#Video>
step.body                # "body text"
step.position            # 1

Note that position is accessed in the same way as body even though the columns exist on different tables.

There is no need for the extra step in the API chain that was introduced in the ActiveRecord way:

# From the FutureLearn post
step.content.body        # "body"
step.position            # 1

Since Article, Video and Exercise all have a title attribute we could have structured our schema such that title was a column on the steps table rather than a column on each of the individual tables. This would have been fine with the CTI approach, the API would remain step.title and we'd just have one column in the database that was common to all content types. Attributes that are common to all content types go in the steps table and the individual tables only include the columns that differentiate that type.

However, with the ActiveRecord approach, adding the title to the steps table would have resulted in an API inconsistency (step.title vs step.content.body) and so it's likely that you'd make sure all the content type tables included all the common columns.

Wrapping up

The point of this post isn't to show how, in a different reality, if FutureLearn had been built using different tools, they could have made a different change. There's not a whole lot of point in that.

Instead it was to highlight some of the features available in Sequel. People often ask me why I choose Sequel over ActiveRecord and my stock response is that it gives me more flexibility. When I read the FutureLearn post, I thought it was a good, concrete example that I could use to demonstrate Class Table Inheritance, a database pattern that is supported in Sequel but not in ActiveRecord.

If you're starting a new project, I'd encourage you to take a look at using Sequel, it really is a great library.

rails new myapp --skip-active-record