Hypermedia APIs: hypermedia is the state.

In my previous post in this guidelines series I discussed many reasons why versioning should not be introduced into your API, despite the existence of convenient tricks to hide some of the side effects for a time.  Many leading tech organizations argue the opposite, which might be reasonable when the likelihood of any particular API aging long enough to get past v1 is extremely low.  However, these organizations aren’t in the business of creating reliable, flexible, and enduring APIs for consumers in the long term.  In this post I’ll discuss perhaps the most mispronounced and ill-conceived acronym in an industry obsessed with acronyms: HATEOAS.

The terrible and often thrown about acronym stands for “Hypermedia as the engine of application state” and is possibly the most frustrating part of Roy Fielding’s entire dissertation.  Once understood the concept is simple, the problem is for better or more likely worse this short sentence represents the entire discussion of hypermedia in the paper.  As one of the primary fundamental tenets to a RESTful application, Roy spent precious little time in his dissertation to expand upon this rather arcane phrase.  To be fair to Dr. Fielding, he is quoted at saying he did want to actually graduate, his dissertation is foundational to many movements within technology, and an in depth discussion of HATEOAS in his paper would have been a large additional undertaking.  With that in mind, I’ll go through it in a quick manner.

The concept boils down to a very simple principle, the state transitions in the application should be invoked by hypermedia driven links.  You can throw the endless discussions on ‘nounifying’ verbs, and shoehorning data models into a CRUD paradigm to match the 4 commonly used HTTP methods.  Your resources are your resources, and their representations are whatever is required by the domain models, and that’s good because as previously discussed you should spend a lot of attention on your resource representations.  By clearly separating stateful transitions from your representations you maintain a stateless interaction and greatly reduce the complexity of your representation design.  This still leaves us with the need to present the current actions available to any particular resource to a consumer of the API.

The good news is, we’ve already set the stage with all the requirements to utilize hypermedia to control the actions of the resources through the vocabulary definition in the profile.  Regardless of the hypermedia format you are going to use, there are two components to the hypermedia you need to present for resource actions, these are the link, the rel name.  The link is URL for the consumer to follow to submit the next request.  This link may be templated in the certain cases but should generally be provided to the consumer fully composed.  This URL is provided by the service and is not constructed by a consumer, however it can be augmented with query parameters like filtering, sorting, sparse field sets, and more.  The rel name is a word in the vocabulary which corresponds to an action a particular resource can take.

The final piece of the puzzle is how do we take the vocabulary of resource and action representations and turn it into an engine of application state.  Up until this point the actions described in the vocabulary have consisted solely of a name, the rest of the definition is the type of transition and the messages or templated messages to send.  In general terms, you have safe, unsafe, and idempotent actions which can be performed on resources.  Through the use of a protocol binding profile we can get a good mapping of the profile semantics to HTTP request types.

With all of these pieces in place, we have all of the components necessary to start our engines!  The hypermedia in a representation is included dynamically in the message as the state of the resource requires to give the user choices in interacting with that resource.  Suppose you had a collection of person resources, your home document guides your hypermedia client to the root of the person resources, which is a collection of all person resources.  In this case it would be helpful to include such links as self, profile, and next to provide both documentation context to the client if necessary through profile, and an easy way for the client to know how to interact with the collection immediately.  Perhaps the first person in the collection was of interest, and the client navigates to the link for the individual person resource.  Suppose the person had a ‘doing’ property which could be sitting, standing, walking or running, and the current value of this person resource is standing.  The server would be able to include hypermedia controls with names ‘sit’, ‘walk’, and ‘run’.  The exciting part is that the client can navigate entirely based on the vocabulary presented to it, as the message and protocol bindings are provided, the client is merely responsible for composing the message as described from the parts available to it in links provided by the service itself.  The client composes a run message, and submits it using the appropriately bound http method to the URL provided in the link and the resource state transition is handled without the client ever having to consider anything about the protocol itself.

With the power of these orchestrated interactions using the right clients you can quickly interact with a hypermedia API with very little previous knowledge of the service or it’s domain.  With this type of interaction model, you don’t try to squeeze behavior into a 4 verb vocabulary, you write the vocabulary and messages which makes sense for your task, and use a protocol binding to directly map them to interactions.  Now we have a more in depth understanding of the mechanism which makes the URL pattern irrelevant to the consumer.

Leave a Reply