A quick foreword. While this article and examples may be written specifically for Ruby on Rails, the technique should function equally well across different servers.
To set the stage – let’s say we have built a website for this new COVID-friendly world. Teachers can post homework assignments, give tests, etc. One of your stakeholders has come to you and noted they want one central way for their teachers to be able to monitor the state of all the assignments, tests and everything else active on the system.
Obviously, we need a new view of the data that can be access. Say we generate a new model that will encapsulate the idea of this view. Theoretically, the model could look something like this:
{
homework_id: NUMBER,
test_id: NUMBER,
status: STRING,
status_at: DATETIME
}
We see that we need to include the homework and the test ids as reference, because these statuses can be applied to either of them. While this may work at first, it’s apparent the weakness – every time we need to add another item to be tracked, we need to add yet another column to the model to track that ID. The more columns we add, the more empty columns we are going to have per row and the more messy our data is going to be.
To combat this kind of situation, we are going to use a technique that Rails refers to as Polymorphic Associations. Instead of maintaining individual columns for every type of this association, we’re going to instead maintain two columns to represent the entire association. One will indicate the type of the referenced model and the other will indicate its id. (There is obviously an assumption here about single-column ids, but this scheme could support multi-column primary keys with some refactoring).
IMPORTANT NOTE – Although we are specifically discussing Rails, the idea of Polymorphic Associations works equally well in any environment, tied more with your architecture than with your server. Furthermore, the techniques outlined later to work with dynamic permissioning are equally platform independent, relying more on how we organize our search objects than anything else (but more on that later!)
Our new model looks like the below
{
status_for_type: STRING,
status_for_id: NUMBER,
status: STRING,
status_at: DATETIME
}
In Rails, specifically, this can be indicated with an association, as seen below:
class Status < ActiveRecord::Base
belongs_to :status_for, :polymorphic => true
end
So, in this way we’re able to gracefully link our new Status class as belonging to many other classes, i.e. Homework and Test.
class Homework < ActiveRecord::Base
has_many :statuses, as: :status_for
def name
"Homework #{self.id}"
end
end
class Test < ActiveRecord::Base
has_many :statuses, as: :status_for
def name
"Test #{self.id}"
end
end
Now, if we wanted to generate a Dashboard or a Calendar with a list of statuses, we could easily do so by looking up all the matching statuses from the single table, with what the status is for as a simple lookup for each row.
statuses = Status.where(status_at: Date.today)
statuses.each do |status|
puts status.status_at.strftime("%H:%M %p") +
" - " + status.status_for.name
end
Search Permissioning
Let’s make our use case a bit more selective. In a school, there are multiple different types of users. There are teachers, students, non-teaching faculty, etc. It is likely that we won’t want everyone to see every status on their calendar. Sometimes, this might be because they won’t want to know about certain items but other times it might because certain items are private, and should only be shared with those who have access.
First Attempt
The easiest way to build out a system like this is to just check every item against a method that indicates if the user has permission. Say this method is called has_access?(user), where the user indicates who is requesting access.
We could them update our code from above as follows:
statuses = Status.where(status_at: Date.today)
statuses.select! {|status| status.has_access?(user)}
statuses.each do |status|
puts status.status_at.strftime("%H:%M %p") +
" - " + status.status_for.name
end
I’ve put the filtering on a separate line to make each statement represent one portion of the logic.
Now, on line 3, we can see that we are looping through all the returned results to ensure that we are only outputting those that the user has the ability to see. This is completely valid and will work to solve our use case.
It is, however, also extremely wasteful and very nonoptimal. Say we return 1000 statuses that are happening in the school today, but this user can only see 5 of them. We have returned from our database 995 statuses that are unnecessary. That is a lot of extra information.
Furthermore, if we are expecting to have more than one page of results, this makes paging our results significantly more difficult, for (at least) the following reasons. (Feel free to add your own!)
- We cannot correlate our offsets with the page numbers. The offsets need to be aware where the last page ended. (This means to move to page 5, we must already know the offsets for pages 1,2,3 and 4).
- We either need a very large window, to ensure that we can fill our results, or we may need to query the data multiple times in order to actually collect all the items to fill our results
Second Attempt
Another method is to find all the Tests and Homeworks that a user has permission to access and then use those values to verify the statuses we are going to pull (as the access to the status is a function of the polymorphic association).
The code for this attempt could look like the following:
class Test < ActiveRecord::Base
# ...
has_many :test_users
# ...
def self.for_user(user)
all.joins(:test_users).where(test_users: { user_id: user.id })
end
# ...
end
class Homework < ActiveRecord::Base
# ...
has_many :homework_users
# ...
def self.for_user(user)
all.joins(:homework_users).where(test_users: { user_id: user.id })
end
# ...
end
tests = Test.for_user(user)
homeworks = Homework.for_users
statuses = Status.where(status_at: Date.today)
statuses = statuses.where(status_for: tests).or(statuses.where(status_for: homeworks))
statuses.each do |status|
puts status.status_at.strftime("%H:%M %p") +
" - " + status.status_for.name
end
The `# …` in the code above is used to denote random other lines that may have been included previously in the classes, or any other methods that may normally find their ways into a standard Rails class.
Reiterating the idea we stated before, we can see above that we first gather all the homeworks that a user has permission for and then we gather all the tests that the user has permission for (in the above example, permission is represented as the user having an existing relationship to the Homework or Test in a relationship table). We then gather all the statuses that have a status_at of today as well as either matching an item in the homework list OR matching an item in the tests list.
The above method solve both our issues from before. Our offsets will not depend on anything other than the natural order of the response and will not need to window our responses to ensure we have “enough”.
Attempt Two and a Half
Where this method starts to look less appealing is when the list of valid homeworks or tests starts to grow unwieldy. After being in school for several years, a student is going to start to accrue a goodly amount of tests and probably a larger amount of homework. Our query is going to start to get rather large heading over the wire into the database.
Relational databases are well aware of our pain, however, and come bearing the gift of JOINs. (You may have noticed them before and either didn’t know what they were or were confused why I skipped their natural conclusion). I’m not going to spend a lot of time describing JOINs in detail, nor going over their pros and cons, but I will simply note that a JOIN is a way for our database to combine multiple tables so that one query can be issued across all of them.
Because the query is a bit nontrivial, I am going to write it out in SQL.
SELECT statuses.*
FROM
LEFT JOIN homeworks ON (statuses.status_for_type="Homework" AND statuses.status_for_id=homeworks.id)
LEFT JOIN homework_users ON (homework_users.homework_id=homeworks.id)
LEFT JOIN tests on (statuses.status_for_type="Test" AND statuses.status_for_id=tests.id)
LEFT JOIN test_users ON (test_users.test_id=tests.id)
WHERE
statuses.status_at = TODAY()
AND
(test_users.user_id=OUR_USER_ID OR homework_users.user_id=OUR_USER_ID)
Most of the heavy lifting here is how we set up our JOINs (specifically, our LEFT JOINs). We see that we are connecting our statuses to our users through their connection with either a Homework or a Test. The two paths we are targeting are “status -> homework -> homework_user” and “status -> test -> test_user”. One we collect all of that information, it’s just a simple matter of whittling down statuses that occurred today and statuses that match our user as either a homework_user or a test_user.
The reason for LEFT JOIN is because of that big “OR” on line 10. A homework status will not be connected to a test and a test status will not be connected to a homework, but to satisfy our needs, we’ll need both of them. A LEFT JOIN means that we will collect ALL of the “left” side of the join, whether or not there is a matching item on the “right” side (but we will not do the same in the opposite direction). It’s best understood using Venn Diagrams, which w3schools does an amazing job showing in their tutorial on the subject.
And with that, we have a single query that gather only the relevant statuses for our user, no matter how many items are added into the system.
Mission Accomplished…
… but not entirely. There is still one big problem with the technique. In order to permission properly, our query needs to not only be aware of all the different types of objects a status represents but also how they permission themselves. For our example, we used a rather simple permissioning scheme that allowed us to solve this in a simple (albeit long) SQL query, but there’s no assurance that all permissions will be as simple.
Say, for example, we didn’t have the test_users table, but instead were expected to permission based on the role of the user, the classes they may have taken or taught, and the year they were involved with those classes. All of a sudden, our JOIN table becomes significantly more complex, pulling more and more tables into the mix.
But even that doesn’t address the following problem: What if we wanted to add statuses to class projects? There’s no reason why we shouldn’t have that status to show up with all the other statuses. But, for that to happen, every area we pull our statuses would have to be aware of the existence of this new polymorphic relationship and rebuild themselves accordingly, when that shouldn’t be their responsibility. It should be the responsibility of the Project Status to permission itself!
In my next post on the subject, we will explore how to solve this dilemma and create a status solution that will seamlessly scales to any number of “status_for” relationships.