The $30,000 Gem: Part 1
Opening your database to the world is a scary thought! But that’s exactly what we wanted to do by implementing a GraphQL endpoint. We felt stuck with the classic REST-ish JSON API, and there were a multitude of problems that we were looking to get rid of. Not being able to reuse endpoints because of under- or over-fetching, having to worry about security at every field, and dealing with inconsistencies were becoming a real headache. GraphQL seemed like a promising tool to solve that. It would allow front-end engineers to execute arbitrary queries to fetch any data they needed. Moreover, GraphQL would allow us to concentrate any and all internal data fetching calls in a single endpoint, meaning that engineers wouldn’t have to worry about doing the right ACL check for every individual endpoint. Everybody would be able to query the GraphQL endpoint and rely on it always returning safe data.
However, as the internal discussion around GraphQL started to take off, it was clear that everybody was anxious about how we would make sure that the endpoint was secure. It became apparent that it would be necessary to somehow restrict access to certain objects and attributes based on the requesting user’s relationship with those objects. For example, if you created a report, you’re allowed to see all of the details. The program that you reported it to should certainly see the details as well. But other users should not be given the same privileges. In this example, there are three users who each have a different relationship with the report. Based on their relationship, they are allowed to view different attributes for different reasons.
Having learned from a healthy number of security issues -- hence “the $30,000 Gem” -- we went to work on designing a layer between GraphQL and our database. This layer would be responsible for making sure that querying GraphQL would always return safe data -- and never something you would not be allowed to access. This is what this post is about. We hope to walk you through the security layer that we’ve implemented, the core principles that helped shape this layer, and what we hope to do with it next. Furthermore, we hope that this opens the door for more discussions with and suggestions from the open-source community on how we can further improve this functionality.
Lacking the creativity to come up with a funny name, we settled for a boring one: Protected Attribute.
Hiding entire rows is an easy problem. By using the correct predicate for a where-statement, we can easily hide an entire row from users who are not allowed to interact with it. It is far more challenging to provide a “filtered” version of the same object, where you are given partial access: some fields are viewable, and others are not. For example, in HackerOne we have the concept of reports. A report can be private, or it can be publicly disclosed. Participants, meaning the reporter and the various team members, are allowed to see both the title and the vulnerability. Other users, however, are not allowed to see these attributes. To complicate the issue even more, a report could, at some point during its lifetime, be publicly disclosed. At that point, anybody would be allowed to see the title and vulnerability.
In this example, we’re continuously looking at the same object, but from the perspective of different users. For every type of relationship (reporter, member, other), there is a different set of attributes that the user is allowed to access. Also, for the publicly disclosed case, the privilege to access certain attributes depends on the state that the object itself is in. If the report is publicly disclosed, you are given access to more attributes than when it is not publicly disclosed. Not to mention that this is a clean-room example. The actual report model in HackerOne has many more properties, making the situation even more complicated.
To cope with all of this, we needed a framework that would allow us to determine the relationship between the current or acting user, and the object with attributes that they were trying to access. Given this information, we would then be able to define what attributes the user is allowed to view based on their role. Essentially this allows us to present different versions of the same object to different users, hiding information where necessary. Ideally, this framework would be easy to read and extend. It should be hard to make mistakes, and it should be easy to explore and review. Put concretely: the question “How do we determine if a user is allowed to see the vulnerability of a report?” should be answerable by anybody within seconds.
Alright, enough armchair-developing! Let’s see what this thing looks like in practice. Here is an example of how Protected Attribute would be able to protect our report model:
class ReportProtector
  include Protector

  allow :id, has_property(:public) | has_role(:participant)
  allow :title, has_property(:public) | has_role(:participant)
  allow :vulnerability, has_role(:participant) | (has_property(:public) & has_property(:full_disclosed))
  allow :assigned_to, has_role(:team_member)
  allow :some_unreleased_feature, has_feature(:here_to_win)
end
Let’s walk through this example step-by-step, and see what’s going on.
allow :id, has_property(:public) | has_role(:participant)
allow :title, has_property(:public) | has_role(:participant)
Right off the bat, we start defining what attributes a report has, and who is allowed to read those attributes. For both the id and the title the same rules apply: you are allowed to view this attribute if you are a participant, or if the report is publicly disclosed. This is expressed by combining the two predicates with an OR operator.
allow :vulnerability, has_role(:participant) | (has_property(:public) & has_property(:full_disclosed))
Much like id and title, the actual vulnerability is always visible to participants. For everybody else, however, it is not enough for the report to be publicly disclosed: it must be fully publicly disclosed. You see, a report can be made public in several ways. Fully public means that all the information is published, while partially public means that only select information is made public. This is expressed by combining the two properties with an AND operator.
allow :assigned_to, has_role(:team_member)
This attribute is private information that is strictly meant for team members. No other roles, including the reporter, are allowed to view this information. As you’ll note, this means that the user making the request actually has multiple roles: they hold both the participant role and the team member role.
allow :some_unreleased_feature, has_feature(:here_to_win)
For the last attribute, you’ll note that we’re not using roles at all to determine the visibility. Here we have an unreleased feature that is in development, and by using a feature toggle, we can switch it on or off for particular users. Similarly to roles, when the feature is not available for the current user, the field will not be accessible.
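To make the OR and AND combinations above more concrete, here is a minimal sketch of how composable predicates like these could work. It is not the gem’s actual implementation: the Predicate class, the context hash, and its keys (role, properties, features) are assumptions made purely for illustration.

```ruby
# Hypothetical sketch: Predicate and the context hash are illustrative
# assumptions, not taken from the Protected Attribute gem.
class Predicate
  def initialize(&check)
    @check = check
  end

  # context is a plain hash, e.g. { role: :participant, properties: [], features: [] }
  def call(context)
    @check.call(context)
  end

  # OR: passes when either side passes.
  def |(other)
    Predicate.new { |context| call(context) || other.call(context) }
  end

  # AND: passes only when both sides pass.
  def &(other)
    Predicate.new { |context| call(context) && other.call(context) }
  end
end

def has_role(role)
  Predicate.new { |context| context[:role] == role }
end

def has_property(property)
  Predicate.new { |context| context[:properties].include?(property) }
end

def has_feature(feature)
  Predicate.new { |context| context[:features].include?(feature) }
end

# The vulnerability rule from the example, evaluated for a non-participant
# looking at a report that is public but not fully disclosed.
vulnerability_rule = has_role(:participant) |
                     (has_property(:public) & has_property(:full_disclosed))
outsider = { role: :other, properties: [:public], features: [] }
vulnerability_rule.call(outsider) # false: public alone is not enough
```

Because every rule is an ordinary object, the same predicate can be reused across attributes and evaluated in isolation, which helps when reviewing who can see what.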
Okay, now that we know how to define a protected version of a model, it’s time to figure out how to use it. Let’s get started!
def report_protector_factory(current_user, report)
  features = current_user.enabled_features.pluck :name

  role = if current_user == report.reporter
    :reporter
  elsif current_user.in? report.team.members
    :member
  else
    :other
  end

  properties = [
    (:public if report.public?),
    (:full_disclosed if report.full_disclosed?),
  ].compact

  ReportProtector.new \
    report,
    features: features,
    properties: properties,
    role: role
end
Now, this is fairly essential. Here’s where we determine what the role of the user is. Similarly to has_feature, this is how Protected Attribute knows what to do when the has_role function is used. As you can see, even for this fairly simple example we already have three distinct roles. Depending on your use case, your objects may be even more complex. Also note that every user is only permitted to have one distinct role. There is currently no support for having a user with multiple roles.
properties = [
  (:public if report.public?),
  (:full_disclosed if report.full_disclosed?),
].compact
Aside from roles and features, there is one more type of check that can be done. Here we build a list of properties that describe the state that the object is in. This list is used by the has_property method.
ReportProtector.new \
  report,
  features: features,
  properties: properties,
  role: role
Now that we have all the information we need, it’s time to simply instantiate the protected version of the report. From here on out, Protected Attribute will protect the instance. When calling an attribute that you are not allowed to access, Protected Attribute will return an empty value. Conversely, if you are allowed to view that attribute, the call will be proxied to the protected object (report), and you’ll be able to see the actual value.
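The proxying behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the gem’s code: MiniProtector, its block-based allow, the Report struct, and the choice of nil as the “empty value” are all hypothetical.

```ruby
# Hypothetical sketch: MiniProtector is illustrative and not part of the gem.
class MiniProtector
  # Register a rule for an attribute. The block receives the context hash
  # and returns true when the attribute may be read.
  def self.allow(attribute, &rule)
    define_method(attribute) do
      # Proxy to the wrapped object only when the rule passes;
      # otherwise return nil (assumed here to be the "empty value").
      rule.call(@context) ? @target.public_send(attribute) : nil
    end
  end

  def initialize(target, role:, properties: [], features: [])
    @target = target
    @context = { role: role, properties: properties, features: features }
  end
end

# Stand-in for the real report model.
Report = Struct.new(:title, :assigned_to)

class MiniReportProtector < MiniProtector
  allow(:title) do |c|
    [:reporter, :member].include?(c[:role]) || c[:properties].include?(:public)
  end
  allow(:assigned_to) { |c| c[:role] == :member }
end

report = Report.new("SQL injection in search", "alice")

member = MiniReportProtector.new(report, role: :member)
member.title       # proxied to the report
member.assigned_to # proxied to the report

other = MiniReportProtector.new(report, role: :other)
other.title        # nil, because the rule does not pass
```

One consequence of this design is that attributes without an allow rule are never defined on the protector at all, so the sketch denies access by default.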
And this is pretty much how to use the Protected Attribute gem! You can specify what conditions should be met before somebody is allowed to access a certain attribute. Be it their role, the state of the object, or even something external, such as a feature toggle.
Request For Comments
This concludes the guided tour of Protected Attribute. We hope it was a valuable and informative experience. While we tried our best to be confident, to be completely frank, we’re not sure how useful our solution actually is. So far, Protected Attribute performs great in our GraphQL endpoint, and our coworkers find it easy to understand and extend. But there’s always room for improvement. That's why we’d love to hear suggestions from you! Is there a use-case we’ve overlooked, or is there something else we can improve upon? We’d also love to compare Protected Attribute against solutions that you build. Perhaps there are ideas that can be borrowed between multiple solutions. In short, we hope you give Protected Attribute a try real soon and tell us what you think about it on GitHub.
The Grand Plan
Alright. Up until this point, we’ve focused solely on direct object references. This is where an object is presented to a user, but not all of its attributes. There’s another side to this story that we haven’t touched upon yet.
Imagine that, instead of trying to access a single report, we want to show you a collection of reports. Moreover, you are allowed to filter and sort this collection. Now, it could very well be the case that this collection contains one or more reports that you are not allowed to see in full. For example, you’re not allowed to see the title of some reports. However, if you’re still able to sort on the title, you might be able to infer the title from the report’s position in the list. If the list is sorted alphabetically, and the report is listed after reports starting with ‘C’, but before reports starting with ‘E’, you could assume the report title starts with the character ‘D’. Given enough time, you would technically be able to infer the entire title this way.
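The leak can be reproduced with a toy example. The data, the ListedReport struct, and the title_visible flag below are made up for illustration; the point is only that sorting happens on the real value, while hiding happens afterwards.

```ruby
# Illustrative data only: ListedReport and title_visible are hypothetical.
ListedReport = Struct.new(:id, :title, :title_visible)

reports = [
  ListedReport.new(3, "Exposed admin panel", true),
  ListedReport.new(2, "Denial of service via upload", false), # hidden from the viewer
  ListedReport.new(1, "Cross-site scripting in profile", true),
]

# The back-end sorts on the real column, whether or not the viewer may read it.
sorted = reports.sort_by(&:title)

# Only afterwards are the titles the viewer may not read blanked out.
visible = sorted.map do |r|
  { id: r.id, title: r.title_visible ? r.title : nil }
end

# The viewer locates the hidden row and reads its neighbours.
# (This sketch assumes the hidden row has a visible neighbour on both sides.)
index = visible.index { |row| row[:title].nil? }
lower = visible[index - 1][:title]
upper = visible[index + 1][:title]

# The hidden title must sort between lower and upper, which narrows down
# its first character without the title ever being returned.
puts "Hidden title sorts between #{lower.inspect} and #{upper.inspect}"
```

By repeating this against a collection where the attacker controls some of the visible titles, the bounds can be tightened one character at a time.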
The issue here is that our database back-end (e.g. PostgreSQL) doesn’t share the notion of protected objects that our application has. It has no idea what attributes (read: columns) the current user is allowed to see, and so cannot take this into account. This problem is something that we hope to tackle in an upcoming sequel to this blog post. We’d love to share more details with you, so please, stay tuned for Part 2!