Hypermedia APIs: Swagger is not user friendly

As a developer who has designed and implemented APIs for the past five years, I have found that integrating external services is a key component of nearly every product I have built.

As you sit in your design meeting, scrum, or stand-up, the moment a new integration is mentioned you can see the pause spread through the room and the trepidation pass through each colleague’s mind.  It is painfully obvious everyone is sharing variations on the same thoughts and questions.  How good is the documentation for this service?  How long will it take to wade through the idiosyncrasies and bugs to a stable implementation?  Without knowing the specifics, everyone in the room is instantly aware of the landmines waiting for them.

These common concerns are entirely justified; the quality range of services you may have to integrate presents a near limitless combination of difficulties.  An entirely undocumented service isn’t even the worst case, as thorough but untrustworthy documentation can be far worse than discovery by trial and error.

Unfortunately, when implementing our own services, we often overlook or deprioritize ease of use for the end consumer.  It’s an easy trap to fall into with deadlines and deliverables looming, which is precisely why it is so important to use designs and tools that make it simple.  Through the specification wars of the last five years, the CRUD-REST industry has settled on the Swagger specification (OpenAPI Specification – OAS) as the standard for API design.  While this represents a real improvement over snowflake services, a vocabulary-driven hypermedia approach gives us all the beneficial properties of OAS along with the long-term benefits of flexibility, adaptability, and a lighter burden on getting the initial design perfect.

There are two primary problems with the solution provided by OAS: it tightly couples clients to the service through URLs, and it requires orchestrating client changes in step with service changes.

The first problem is easier to understand: by hardcoding the resource hierarchy to a URL and a specific representation, you now require tight and explicit versioning for clients to safely consume the service.  Any developer familiar with SOAP web services should notice the similarity; OAS is essentially the WSDL for a SOAP-like service without an envelope, using curly braces and three extra HTTP methods.  The same arguments against the tight interface binding of SOAP services are becoming increasingly relevant when discussing the cons of OAS services.  The ramifications are felt immediately, but similar tooling has silenced detractors enough to satisfy the majority into adopting this specification.

The second problem is more nuanced, but far more frustrating to contend with because it is not immediately felt.  The designs of SOAP and OAS lend themselves well to situations where the same group or company controls both the service and the client.  If you distribute an SDK to wrap your service calls, ship your own mobile applications, or support web applications under your control, then the negative effects of the style aren’t felt until you need to perform the first major upgrade to those clients.  In this situation you can manage the negatives to a degree.  The difficulty is entirely unnecessary, but resisting the temptation to wait and deal with the problem when it comes up is hard.  You certainly know the process will be difficult and will consume resources and time, but the time and resources you are committing to change management sit in the future, while your current deadlines are fast approaching.

The worst effects of this ill-advised tradeoff are felt when you do not control any portion of your API’s consumers.  This applies in cases as small as an internal microservices architecture or as large as your company’s external APIs, and it hits your bottom line directly.  If you deploy microservices which are tightly coupled to URLs and representations, you will need to manage the service dependency trees to fully deploy changes.  Assuming no change in one service breaks another, you have invited the complexity of a massive organization like Netflix to solve a relatively small problem.  If a change does break another service, you have lost a large portion of the benefits of a microservices architecture by tightly coupling two or more services which should be independent.  The benefits of the architectural style to the development team are obvious, but you may lose more time and resources managing the DevOps than you gain from development.  If your public-facing APIs change frequently, forcing your consumers to modify their clients to meet your needs, then you shouldn’t be surprised to see some of those clients explore or exit to your competitors.  Breaking changes force a slow release pattern, and they require your consumers, as well as your internal team, to manage multiple versions of your API.  As the difficulty of maintaining an integration with your service increases, the likelihood of your clients looking for alternative providers climbs from a real chance to a near certainty.

The obvious question you are probably asking is: how is hypermedia any different?  If my clients bind to a domain vocabulary, haven’t I just moved the binding point with the same result?

The answer is no.  When transitioning from a CRUD API to a hypermedia API, you have moved from statically binding consumers to services to binding them dynamically.  Hypermedia APIs, by their nature, should be discovered at each use.  Hypermedia consumer clients should only ever have the root URL of the service statically bound.  The vocabularies can and should change over time to reflect changes in the understanding of the domain, or actual changes to the domain itself.  However, it is now possible to gracefully support clients as they migrate themselves, at their own pace, to newer portions of the vocabulary.  The client is no longer responsible for managing which version, or effective version, of your service it is interacting with on a per-call basis; the service handles this for the consumer.  Architecturally it may be necessary, or simply easier for deployment, to run multiple effective versions of a service to support this graceful transition; the key takeaway is that the consumer is completely unaware of these URL changes.  The consumer simply discovers, caches, and composes resource representations with metadata through links as it interacts with the service.  Any changes propagate to all clients by the end of the maximum caching period set by the service.  Any interactions with now malformed or expired resource representations, or with moved resources, can be managed by ETag headers and HTTP 3xx response codes.  Clients are bound to the vocabulary, which means they are simply looking for resources and link rel-names they know, while caching information to reduce extra calls to the service for resource and service metadata.

This is a slightly more complex integration model, but the development of libraries to manage the increased complexity can release consumers from even more of their burdens, allowing them to focus on their true goal, whether that is creating a UI or consuming the service for some other useful purpose.
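
To make the idea concrete, here is a minimal sketch of what such a client routine might look like in Python.  The root URL, the ‘seat’ rel name, and the link structure are all hypothetical assumptions for illustration; the point is that the root URL is the only thing the client ever hardcodes.

import requests

ROOT_URL = "https://api.example-theater.com/"   # the only URL the client ever hardcodes

def link_for(document, rel):
    """Resolve a link by its rel name; the exact shape depends on the negotiated media type."""
    # Assumes a simple {"links": [{"rel": ..., "href": ...}]} structure for illustration.
    return next(l["href"] for l in document.get("links", []) if l.get("rel") == rel)

home = requests.get(ROOT_URL).json()                    # discover everything from the root
seats = requests.get(link_for(home, "seat")).json()     # hypothetical rel from the vocabulary
print([l.get("rel") for l in seats.get("links", [])])   # the affordances offered right now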

Hypermedia vs CRUD: An exaggerated comparison of API design strategies

As I have been ramping up my evangelizing of hypermedia APIs through various channels, I have noticed a common argument thrown out against hypermedia.  It has taken a few forms, but the crux of the argument is something like ‘hypermedia APIs just move the hard-coded binding from URLs to link names, but they don’t actually solve any problems’.  In most of these discussions, the flexibility and maintainability benefits of hypermedia APIs have already been brushed aside as unimportant and irrelevant.  I clearly have some things to say about those points, but since the people I’ve had this recurring conversation with had little interest in those properties, I’ve decided instead to address the direct and immediate benefits of hypermedia over CRUD APIs: greatly enhanced usability and proper hiding of service implementation details.  To accomplish this, I have put together a portion of an API description in the CRUD pattern which is intentionally not optimized, to exaggerate the point I’m making: CRUD APIs are less usable and require the consumer to know too much about the internal implementation details of a service to consume it.  I will then describe the usability and the proper hiding of implementation details of a hypermedia API built using my hypermedia API design guidelines.

Ridiculous requirements with a CRUD API

A movie theater has created a CRUD API to manage their lighting system in order to provide the optimal viewing experience for all patrons at the lowest cost to the theater.  The following API is used to manage the lighting on an individual seat basis, providing the best lighting conditions for each screen at the lowest cost.

/screen/{screen_id}/view/{view_id}/type/{type_id}/seat/{seat_id}/lightsource/{source_id}/natural/{mirror_id}/status
/screen/{screen_id}/view/{view_id}/type/{type_id}/seat/{seat_id}/lightsource/{source_id}/artificial/{light_id}/status
/screen/{screen_id}/orientation
/weather/current
/weather/current/sun
/calendar/{day_id}/day/light-concentration-index?longitude={longitude},latitude={latitude}
/electricity/sources
/electricity/{source_id}/cost
/electricity/current_distribution
/usersuppliedcalculation
/usersuppliedcalculation/operations
/usersuppliedcalculation/types
/usersuppliedcalculation/{user-supplied-calculation_id}/calculate
/usersuppliedcalculation/{user-supplied-calculation_id}/result/{result_id}

Documentation

Screens have views from which you can see them; each view has a type, and those types have seats.  Each seat has a source of light, which is natural or artificial.  In order to best optimize the viewing experience for the members of the audience, over the course of the day the lighting requirements will change from seat to seat as the cost of electricity and the availability of sunlight fluctuate.  It is also extremely important to keep track of the natural lighting conditions in order to balance the cost of providing artificial light from utility power against power supplied by the on-site solar installation.

During normal operating conditions at no point should the cost savings of providing natural light reduce the optimality of viewing by more than 5%.  Any optimality below 80% should be immediately disregarded, unless the cost savings exceeds 90%.  Views from a balcony will increase the optimality by 10% for natural light, as it requires fewer redirections on mirrors.  The orientation of the screen will decrease the optimality of natural light by 50% at 0 degrees from north, and at 180 degrees from north there will be no reduction.  There is an exponential growth curve of the decreased optimality from true south to true north.  However, the position of the sun relative to the screen will offset some or all of this decreased optimality in logarithmic fashion as the relative position of the sun approaches 180 degrees.  This factor will be then used in conjunction with sun elevation which will completely offset the orientation degradation at greater than 70% of maximum annual elevation and reduce the factor linearly as the offset approaches 50%, at which point the relative orientation factor is entirely eliminated.

The service will internally validate any request supplied and reject any status changes which do not adhere to these constraints.  The switching mechanism has a finite life, and it is critical to reduce the number of attempted switches.  To protect the switching mechanism, the service is metered to 1 attempted switch per hour per seat.  Additionally, only 10 switches may be attempted within a 60-second rolling window.  If a seat is in the wrong state, it will cause extra wear on the switching mechanism at a damage rate of 10 switches per hour for as long as it remains in the wrong state.

Resource Representations

weather/current
+++haze-rate – a measure of the transparency of the air.
+++overcast-percentage – a measure of the overcast percentage.
weather/current/sun
+++elevation – elevation of the angle of the top of the sun above the horizon.
+++orientation – the degrees from true north of 0 to the center of the sun.
light-concentration-index
+++value – the index value, relative to peak sun intensity on this day, for the given longitude and latitude.
electricity/sources
+++source_id – id.
+++name – name.
cost
+++source_id – source_id for this type of electricity.
+++value – cost of the electricity per unit.
+++unit – the unit of count.
current_distribution
+++sources – source name and percentage value pairs for current electricity use, e.g. solar: 50%, grid: 50%.

many other objects

In order to facilitate the optimization of these resources we have supplied a system for you to create calculations to simplify the process.  The format for the calculation parameter is as follows: (? parameter_name : operation_id : parameter_name )?+ .  Additionally, any open parenthesis is required to be closed, or it will fail validation.  Names of all parameters must be unique.

usersuppliedcalculation
+++type – the numeric type to be used in this calculation, e.g. integer, float32, float64, etc.
+++parameters – the name of the parameters used in this calculation.
+++calculation – the calculation formula to be used in this calculation.
calculate
+++parameter_values – the name value pairs for the parameters to be used to create this calculation result.  A calculation result_id can be used as a value in the format result_id={result_id}.

Most likely use case:

The consumer will obtain this document from some out-of-band process and begin to read it, first creating objects to contain the data represented within the service as described in the documentation (both the portion shown and the portions not shown in this example).  Afterward, the user will begin to exercise their client by retrieving data.  After noting the physical ramifications of an incorrectly switched light source, they will go about creating local logic to perform the prescribed calculations and accurately manage the lighting.

Depending upon how thoroughly the reader went through the entire document to fully understand the system before writing any code, the user will most likely stumble upon the helper endpoint which allows calculations to be defined and run by the service.  This may or may not result in a rewrite of some functionality to leverage the service-provided calculations and reduce traffic.

The clients are very tightly coupled to the server’s representation of the resources in the current hierarchy.  Any effort to simplify the service by changing this URI pattern will result in a broken client which needs to be completely rechecked against the new version of the documentation, which may have changed many of the hierarchies and relationships between resources.  Additionally, any versioning, especially in the case of a breaking change in the above hierarchies, will require complete regression testing of the consumer code to verify the service changes do not cause undue wear on the switching mechanisms.

This service requires the consumer to be extremely well versed in the internal workings of the switching mechanisms in order to avoid extremely undesirable outcomes.  It also requires the consumer to duplicate logic from the service creator in order to prevent bad things from happening.  The consumer is forced to take on responsibility the service designer has decided to ignore.
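
To make that burden concrete, here is a rough sketch of the guard logic every consumer of this CRUD API would have to reimplement locally just to respect the documented switching limits.  The host name and helper names are hypothetical; the rate constants come straight from the exaggerated documentation above.

import time
import requests

BASE = "https://api.example-theater.com"  # hypothetical host for the example
last_switch_per_seat = {}                 # seat_id -> timestamp of last attempt
recent_switches = []                      # timestamps within the rolling window

def try_switch(screen_id, view_id, type_id, seat_id, source_id, light_id, status):
    """Client-side duplication of the service's documented switching limits:
    at most 1 attempt per seat per hour, and 10 attempts per 60-second window."""
    now = time.time()
    if now - last_switch_per_seat.get(seat_id, 0) < 3600:
        return False  # would violate the 1-switch-per-hour-per-seat limit
    recent = [t for t in recent_switches if now - t < 60]
    if len(recent) >= 10:
        return False  # would violate the 10-switches-per-minute limit
    url = (f"{BASE}/screen/{screen_id}/view/{view_id}/type/{type_id}"
           f"/seat/{seat_id}/lightsource/{source_id}/artificial/{light_id}/status")
    response = requests.put(url, json={"status": status})
    last_switch_per_seat[seat_id] = now
    recent_switches[:] = recent + [now]
    return response.ok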

Ridiculous requirements with Hypermedia

A movie theater has created a semantically driven hypermedia API to manage their lighting system in order to provide the optimal viewing experience for all patrons at the lowest cost to the theater.  The following profile is used to orchestrate the service, managing the lighting on an individual seat basis to provide the best lighting conditions for each screen at the lowest cost.

screen
+++properties …
+++affordances …
+++relationships …
+++goals …
++++++optimize-screen-lighting
view
+++properties …
+++affordances …
+++relationships …
+++goals …
seat
+++properties …
+++affordances …
++++++optimize-lighting – calculates the optimal lighting source and power source for a given time
+++++++++optimize-at – the date and time at which the service should perform the optimization
+++relationships …
+++goals …
lightsource
+++properties …
++++++type – natural or artificial
+++affordances …
+++relationships …
+++goals …
weather
+++properties …
+++affordances …
+++relationships …
+++goals …
calendar
+++properties …
+++affordances …
+++relationships …
+++goals …
electricity
+++properties …
+++affordances …
+++relationships …
+++goals …
usersuppliedcalculation
+++properties …
+++affordances …
+++relationships …
+++goals …

Documentation

There is none.  The service’s semantic profile contains the human-readable descriptions of the resources, their properties, and their affordances, all of which are freely discovered by requesting the root resource “/” of the API.  These documents serve as both the service’s bounded domain and its human-readable documentation.  The profile is defined by domain semantics, not technical terminology; the profile semantics are then separately bound to protocol-specific definitions to facilitate the implementation, but these bindings are opaque to the human consumer.

Most likely use case:

The consumer will be given the root URL of the service; with a hypermedia-aware client, the user will browse the available resources from the root “/”, where the home document is served.  Among the contents of this document will be links to the profile, which contextualize the resources returned in the document and begin to populate the local cache-controlled copy of the profile to reduce redundant and unnecessary calls for metadata.  The user can see all of the resources available to them, where all resources are likely root resources, as the hypermedia service has been appropriately flattened and complex representations are composed by link relations.

The user notes the goal ‘optimize-screen-lighting’, decides it sounds like a requirement of the current effort, and navigates to the link provided by the home document for the root of the ‘screen’ resources.

The collection of ‘screens’ is returned, and the service provides hypermedia metadata about the current affordances and relationships of each ‘screen’, which are rendered as links and forms with helpful descriptions to contextualize the information.  The user notices the screen has the goal noted earlier and navigates to the link within the goal, and the service then curates the experience, guiding the user through all interaction necessary to accomplish the goal.  However, very little interaction is necessary, as the goal supplies the user with a link to the seats resource collection, already filtered to seats related to this screen.  Each seat in this collection will have a related link named optimize-lighting.

The client will have already cached the description and documentation, which cautions the caller against following this link more than once per hour, and also provides the appropriate message structure to send to perform the optimize-lighting action.

The user will then be able to write this simple procedure within their more advanced client, which simply follows the semantic links of interest from the root through optimizing every seat.  This client can skip any discovery step for which it holds a valid cached value, and if there is a question of validity, or the cache period has expired, the client can validate the ETag of the response through a HEAD call.  In this way the client utilizes the HTTP caching tiers between itself and the service to improve apparent performance, preventing excess load on the application and giving better response times to the client.  The client can then simply follow the final steps of the hypermedia discovery process in order to achieve its goal, in a much more streamlined, efficient, and dynamic manner.
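
Here is a rough sketch of that procedure, assuming the rel names from the profile above (‘screen’, ‘optimize-screen-lighting’, ‘seat’, ‘optimize-lighting’), a hypothetical root URL, and a simple links structure; the HTTP method and message body for the optimize action are also assumptions for illustration.

import requests

ROOT = "https://api.example-theater.com/"   # hypothetical root; the only bound URL
cache = {}                                  # href -> {"etag": ..., "body": ...}

def get(href):
    """Fetch a resource, revalidating any cached copy with a HEAD call and its ETag."""
    cached = cache.get(href)
    if cached and cached["etag"]:
        if requests.head(href).headers.get("ETag") == cached["etag"]:
            return cached["body"]           # cached copy is still current
    response = requests.get(href)
    cache[href] = {"etag": response.headers.get("ETag"), "body": response.json()}
    return cache[href]["body"]

def link(document, rel):
    """Resolve a link by rel name; the exact shape depends on the negotiated format."""
    return next(l["href"] for l in document.get("links", []) if l.get("rel") == rel)

home = get(ROOT)
screens = get(link(home, "screen"))
for screen in screens.get("items", []):
    goal = get(link(screen, "optimize-screen-lighting"))   # goal from the profile
    seats = get(link(goal, "seat"))                        # pre-filtered by the service
    for seat in seats.get("items", []):
        requests.post(link(seat, "optimize-lighting"),
                      json={"optimize-at": "2016-06-01T18:00:00Z"})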

At no point in this process is the user required to know anything about the technical requirements or implementation details of the service.  By leveraging semantic hypermedia, use of the service has become as intuitive as possible within the bounded domain of the profile definition, which should be written to maximize human readability and interoperability.  Once the process is understood by a human, a machine can easily follow the same steps.  Ideally this process would be bound to user interface constructs to create generic clients whose interface bindings respond dynamically to stateful hypermedia messages.

Disclaimers, conclusions, and more words

As hinted at the very beginning of this novella of a blog entry, I do understand this example goes almost over the top in exaggerating the negatives of the CRUD pattern.  I am not writing this with the intent of removing the CRUD pattern from any particular toolbox.  I do hope to demonstrate to more API designers, developers, and most importantly consumers that there really is a better way to do this API thing.  The tooling for the CRUD pattern is fantastic, so good in fact that even consumers are happy to take on many burdens of the service provider in order to gain rapid prototyping and code generation.

I’ll pose some questions though: what if we could remove the need for code generation to stand up a rapid prototype entirely?  What if all we needed to do was create the domain profile, and we could have a mock service running immediately?  Would that be enough to start a trend of demanding more usable APIs?

I hope so, because I believe it is time we stop writing snowflake services on the internet and justifying them by claiming our use case is somehow special.  Somehow we have convinced ourselves we don’t need to make our services easy to use.  In the hysteria of the speed-to-market rush, we all seem to have forgotten that we need to build good products before anything can go to market.

Perhaps I’ve been spending so much time looking at the oasis of the future that this example felt ridiculous to me.  I showed it to a colleague today, prefacing it as a ridiculous example of a CRUD API; his first reaction was: “I’m never using this service, but why do you say it’s ridiculous?”  I think we can do better; I think we should all do better.

API Evangelist and Storyteller: Checking in.

Throughout most of my career as a software engineer, one constant trend I have seen is the increasing need to provide context to technical solutions and to adapt business requirements to technical limitations.  Sometimes the tool was not mature enough to accomplish what was wanted, and a bargain was struck on what was feasible in the short term using the tools on hand.  Other times the tools existed, but the solution was prohibitively costly in time, money, or both.  There were countless other causes as well, but the crux of the issue was that business was bending to the will of technology, because computers don’t work the way humans do.  It was becoming my job to explain, convince, cajole, and bargain toward an adequate solution to a requirement instead of working toward the right solution.

History has shown that tools providing even a small portion of the right solution have enjoyed vast success by allowing the technical problem to be circumvented in favor of the business requirement.  WordPress, the very software this blog runs on, is a testament to the wild success a human tool can achieve when released into the wild.  Salesforce.com has enjoyed massive growth and penetration, despite being largely disliked by the technical community, because it satisfies real business needs in a more human way.  As a developer who held a Salesforce certification as a requirement of employment for a consulting company, I know how painful the experience can be for developers, but I also saw how well loved the platform was by users who leveraged its power with great success.

As we take the next step in APIs, it is absolutely vital that we don’t lose sight of the human perspective.  We must internalize the lessons of past success and push toward a human-centric API space.  We are poised to fundamentally alter the course of human history.  Soon we will be able to build distributed systems upon other distributed systems, laying the groundwork for functional composability on an unfathomable scale.  All of our success hinges on how many people can participate in building these new systems, and to involve the most people we need to make our APIs as human as possible.

Every realized dream started as a crazy vision beyond the realm of possibility that slowly materialized.  My vision is to humanize the API space, open the floodgates to the population, and allow everyone to participate.  This is where I am pushing, and with my guidelines for hypermedia web APIs, this is just the beginning.  The story I am going to tell is a story of inclusion, and I intend to include as many people as possible.


Hypermedia APIs: Don’t version anything.

In my last post I, perhaps controversially, set the constraint that an API should not couple itself, or be documented tightly, to URI paths and patterns.  This stands in stark contrast to many of the popular trends within the API space, but the long-term benefits over the life of the service far outweigh the complexity and cost of implementation.  In this post I would like to discuss an additional constraint which is in part a corollary to the last guideline: do not version anything at all in the service.

I will start by addressing the elephant in the room: this guideline also stands in stark contrast to popular trends in the API space, as well as the established practice of some of the largest Silicon Valley technology companies.  These two guidelines add a great deal of complexity to initial API design and architecture for public APIs.  Most of these leading organizations are driven by concerns with speed to market, and therefore do not allow themselves the appropriate design time before beginning to build applications, or are far more concerned with a larger audience’s familiarity with a particular design strategy than with creating a better API design.  These concerns are certainly important from a business perspective; however, they are often incorrectly presented as technical limitations and guidelines when in fact they are driven almost entirely by business motivations.

If the rush for faster minimum viable products is driven by a business concern, what are the technical benefits associated with a hypermedia web API style?  The often-cited axiom in the hypermedia web API space is WWBD: what would the browser do?  This is particularly apt given HTML is probably the most familiar hypermedia format to any user of the internet, regardless of their awareness of that fact.  Netflix and other continuous-delivery champions are famous for deploying code to production hundreds of times per day, yet as a user of their services you are never aware of any change in the platform.  The only way this can be done is if you are never aware of, or tightly coupled to, any type of versioning within the service interface.  Your browser never knows, or is concerned in any way with, the version of the Netflix software it is querying.  This point perfectly addresses the style of CRUD API which includes a version number within its URI pattern, but it does not cover all the ways to version.

The other shortcut often taken during API design is to apply a version to the MIME type itself, which has been demonstrated to be beneficial in the very near term.  This solution does manage the version disconnect at the service layer, but it leaves un-patched clients unable to fully consume the service until new client versions can be distributed.  Even though this strategy is one we are very familiar with as API service integrators, it is extremely consumer hostile, as all of the service version management responsibility has been dumped on the consumer.  Worse still, this type of versioning is unique to each integration, greatly increasing the difficulty for a service which aggregates functionality from multiple APIs to create some or all of its responses.  This strategy will solve your versioning concerns, but it comes at a hefty cost to the API’s consumers, and if there is competition in the space, this poor experience could result in the loss of a client or consumer.

Both of these reasons, taken in isolation or together, should be enough to convince a reasonable designer of the importance of removing all versioning from their API.  However, there is a more fundamental reason to exclude versioning entirely, and it goes back to the very first guideline: versioning is a solved problem within the HTTP application protocol.  I previously discussed the ETag strategy for cache control, but this is nothing more than a specific form of representation versioning.  If part of the structure or a field of a message representation changes between versions, normal validation processing should handle the change, and the client can update its local representation and model cache from the service itself.  If a historical representation of a resource is required for audit or some other need, the Memento Accept-Datetime header exists to handle requesting a resource and representation as it existed at a certain point in time.
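
A minimal sketch of what this looks like from the consumer’s side, assuming a hypothetical resource URL; If-None-Match and 304 Not Modified are the standard HTTP revalidation mechanics, and Accept-Datetime is the Memento header mentioned above.

import requests

url = "https://api.example.com/user/42"     # hypothetical resource for the example

# First fetch: the ETag is, in effect, the version of this representation.
first = requests.get(url)
etag = first.headers.get("ETag")

# Later fetch: ask the service whether our version is still current.
later = requests.get(url, headers={"If-None-Match": etag})
if later.status_code == 304:
    representation = first.json()           # our cached version is still valid
else:
    representation = later.json()           # the representation changed; adopt the new one
    etag = later.headers.get("ETag")

# Historical state, if the service supports Memento (RFC 7089):
past = requests.get(url, headers={"Accept-Datetime": "Tue, 01 Mar 2016 00:00:00 GMT"})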

Clearly this is not the simplest solution; however, it is a far more useful and standard way to version the resources and messages of an API.  By adhering to the standard way to perform this action, a sophisticated HTTP consumer can always know the status of any data it holds locally and remain unconcerned with the version of the service running.  Furthermore, that sophisticated client can be used to consume other APIs which follow these guidelines with little or no additional integration effort.

Hypermedia APIs: Stop worrying about your URI patterns.

In my last post I discussed the need to carefully construct your resource and message representations in order to increase the flexibility of your design over time.  Great care needs to be taken to avoid the many pitfalls which can befall a design given insufficient attention.  A considerable portion of your time as a designer of a hypermedia web API should be spent designing your representations.  This post may then come as a small relief, as I would like to talk about removing an entire concern from your API designer mind to free up time for that work: the URI pattern.

For the most part, the guidelines previous to this one could largely be followed in a CRUD or hypermedia web API and you could expect improved results regardless.  That trend ends now, and depending on your attachment to the tools or strategies I’m about to reference, this might upset you.  Immediately stop spending any design energy on your URI pattern philosophy, rotation, choice, or strategy; you simply won’t need it.

The documentation and standardization around a hierarchical URI pattern generally wastes an inordinate amount of designer and developer time before some semblance of a workable design reaches the hands of a capable developer.  To help speed up this process of wasting time, many solutions like OAS (Swagger), Apiary.io, RAML, etc. have been created to cut out chunks of the effort needed to prove or disprove a design.  This statement shouldn’t be taken out of context; as previously stated, hypermedia web APIs are not right for every situation, and there are many viable situations where those solutions are the correct tool for the job.  However, scaling them to a large domain, with limited or no control over consumer code, while desiring high utilization and endurance over time, becomes an exceptionally difficult task.  I would argue the longer you want the API to run unmodified, the closer to effectively impossible the task becomes, and the point where this occurs is nowhere near as far in the future as you might believe.

Overloading the URI structure to contain semantic information places tremendous pressure on the designer to leave enough gap for future growth and needs within the hierarchy.  It also places a huge burden on the consumer of the service to know a lot about your service before being able to fully utilize it.  Kin Lane (“the API Evangelist”) on a recent API Academy podcast described this as human-targeted documentation and tooling, meaning that while tooling can help set up some of the boilerplate code, the responsibility for tying things together is left to the human developer.

When utilizing a hypermedia web API which follows the preceding four guidelines, the URI pattern for resources is entirely irrelevant, as the actual URIs are completely opaque to a consumer of the API.  Given the root resource representation name of ‘user’, and assuming a complete home document and vocabulary definitions, the actual URI for the resource is meaningless.

Under the same conditions, the user resource could live under ‘/uusdfskd231232/’ and the hypermedia client would have no difficulty whatsoever discovering the resource, nor surfacing the context to a human consumer.  The same is obviously not true for the human integration developer, who must consult some external documentation to identify the change.  As a consequence, moving the resource to another URI is simply a matter of making some small modifications to the profile, and the move goes entirely unnoticed by the clients.  It is important to note that I do not advocate obscuring the URLs of resources unless your design absolutely calls for it; where possible an intelligible and human-friendly format should be used, but it should not become an obstacle of any note.

By utilizing the representations painstakingly created following the previous guideline, your service doesn’t need to worry about creating the perfect hierarchy structure for your API; it just doesn’t.

Hypermedia APIs: Flatten those resources!

In my previous post, I discussed the need to present a Home document.  Through its presentation your API design is able to be flexible and adaptive over time, while still being easy for clients to consume and discover functionality.  In this post, I want to discuss the design and handling of the resource representations themselves.

In traditional object-oriented design, the principles of encapsulation encourage a class designer to hierarchically wrap atomic representations in increasingly complex representations, composing a complete model which is self-sufficient.  In the OO world, this enhances the flexibility of the implementation as it adheres to the information-hiding principle.  A class implementer is free to make any changes to the encapsulated code; any execution of the code is bound to the interface provided by the object and is not impacted by internal design changes.  The following JSON serialization might represent a good OO model for a User class; it has been truncated for brevity.

{"user":{
  "address":{
    "street":"742 Evergreen Terrace",
    "city":"Springfield",
    "state":"?",
    "zip":"doh"
  },
  "userProfile":{
    "name":{
      "first":"Homer",
      "last":"Simpson"
    },
    "accountCredentials":{
      "userName":"homers",
      "email":"homers@example.com"
    }
  }
}}

When designing the representations for resources and messages, a designer needs to abandon this long-standing practice and focus on creating the least complex but semantically complete representation possible.  The value of the atomic resource and message design is the enhanced flexibility the API designer gains in binding related resources via relationships, which allows mutability of the interface over time.  This guideline can clearly be taken to an unreasonable extreme, therefore it is important to keep the caveat ‘semantically complete’ in mind, as there will be occasions when a representation is not entirely flat.  When successfully completed, a resource model should look, to the OO designer’s eye, remarkably flat and in desperate need of refactoring.  For example, the above OO model could be represented as three separate resources with the following representations.

{
  "user": {
    "userName": "homers",
    "email": "homers@example.com"
  }
}
{
  "userProfile": {
    "name": {
      "first": "Homer",
      "last": "Simpson"
    }
  }
}
{
  "address": {
    "street": "742 Evergreen Terrace",
    "city": "Springfield",
    "state": "?",
    "zip": "doh"
  }
}

If this were the end of the process, the resulting implementation of our API design could be a disastrous and unworkable torrent of traffic for even the most minute of tasks at scale.  In order to aggregate the same collection of information in a resource representation, the design must compose the related resources through embedding, transclusion, or link relation.  Through these means, and fine-grained use of cache control, a hypermedia web API can have surprisingly low overhead while still retaining the flexibility and robust nature we seek.
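
As an illustration, a composed ‘user’ representation might look something like the sketch below.  The link structure and the ‘userProfile’ and ‘address’ rel names are hypothetical and depend on the negotiated hypermedia format; the point is that the flat resources above are stitched back together by links rather than by nesting.

{
  "user": {
    "userName": "homers",
    "email": "homers@example.com",
    "links": [
      {"rel": "userProfile", "href": "..."},
      {"rel": "address", "href": "..."}
    ]
  }
}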

At this point in the discussion of these 11 guidelines for designing a hypermedia web API, the reader may have noticed the heavily referential nature towards previous guidelines.  Each guideline addresses a particular constraint, or set of constraints, which a designer might be inclined to bypass.  However, each step is a crucial building block on the path towards a fully functional hypermedia web API.  In order to achieve the benefits of a flexible, robust, and enduring API we must focus on designing the API right the first time, and then allowing it to grow as our understanding of resources and affordances changes.

Hypermedia APIs: Present a Home document.

In my last post, I discussed the need to document resources through vocabularies.  By leveraging vocabularies, you can provide separate documents which describe the API’s resource representations and behaviors, but you aren’t quite yet capable of presenting a uniform interface.  In this post I’ll go through the final piece we need in order to present a uniform interface and facilitate a completely discoverable hypermedia web API: a Home document.

A few names have been floated to describe this resource: root document, directory, index, and home document, to name a few.  The name isn’t terribly important; what matters is presenting the single valid entry point from which all clients MUST discover your API’s resources and functionality.  The Home Document specification is a great example of one method for providing discoverability of all resources and metadata for your service.  It offers many additional capabilities, but the root of the document is a list of root resources and metadata through which a client can begin to interact with and discover your API.
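
As a rough sketch, loosely modeled on the JSON Home draft, a home document for the theater example from the earlier posts might look something like this; the rel names and URIs are hypothetical.

{
  "resources": {
    "screen": {"href": "/screens", "hints": {"allow": ["GET"]}},
    "seat": {"href": "/seats"},
    "weather": {"href": "/weather/current"},
    "profile": {"href": "/profiles/theater-lighting"}
  }
}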

Another very important role of the home document is to provide a starting point exposing the multiple dimensions of negotiation the service supports.  These dimensions include the media type, goals, and even vocabularies, in addition to the more standard protocol negotiation elements.

By utilizing the home document, a hypermedia web API can present the uniform interface it needs to provide robust flexibility and discoverability for consumers of all types.

Hypermedia APIs: Document resources through Vocabularies

In my previous post, I discussed the need to adhere closely to the HTTP specifications.  In a similar vein, in this post I would like to discuss the need to standardize your resources and actions through vocabulary definitions.

Much of the work done while designing CRUD APIs is spent building the URI resource hierarchies and determining how to reveal behavioral capabilities to consumers through non-standard designs.  There are many well-known and supported frameworks and technologies supporting the CRUD API design pattern, including OAS (formerly Swagger), RAML, Apiary.io, and more.  However, these techniques all result in interfaces which are tightly coupled to their clients, are very brittle, and become increasingly difficult to maintain over time.  Hypermedia APIs can alleviate many of the symptoms, but simply adding hypermedia to a brittle foundation (like Spring-HATEOAS) will not resolve these issues.  The problem with all of these approaches is the static nature of the API definition: the contracts they publish or describe are not expected to change over time.  If the contract is to change in any way, some external mechanism for handling the change must be employed, and coordination or intervention is expected to resolve any disparities resulting from the change.  In other words, the API provider makes changes and all the consumers’ clients break.

The way to get around the fragility and brittle nature of the static binding is simple: don’t guarantee a static contract.  Instead your design should aim to publish, or subscribe to, dynamic contracts which are malleable, robust, and capable of supporting constant interrogation.  It is much less effort overall to define domain vocabularies which encapsulate the resource and message representations and behaviors, obviating the need for strict static contracts.  There are a variety of ways to produce these vocabularies, including Schema.org and Application-Level Profile Semantics (ALPS), among others.  The suggested approach will vary based on the use case; however, Schema.org definitions are tightly coupled to the HTTP transport, making ALPS the better generic decision for transport-agnostic vocabulary definitions.
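
As a rough sketch of what such a vocabulary can look like, here is a fragment of an ALPS document in JSON for the theater lighting example; the descriptor ids are hypothetical and the structure is a simplified reading of the ALPS draft.

{
  "alps": {
    "version": "1.0",
    "doc": {"value": "Vocabulary for theater lighting optimization."},
    "descriptor": [
      {"id": "seat", "type": "semantic", "doc": {"value": "A single seat in a screen's viewing area."}},
      {"id": "lightsource", "type": "semantic", "doc": {"value": "A natural or artificial light source for a seat."}},
      {"id": "optimize-lighting", "type": "unsafe",
       "doc": {"value": "Request optimization of a seat's lighting at a given time."},
       "descriptor": [{"id": "optimize-at", "type": "semantic"}]}
    ]
  }
}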

There are a few oft-cited, perceived drawbacks to this approach, and they go hand in hand with the standard arguments against hypermedia APIs.  The first is that the added complexity of creating vocabularies makes the design process more difficult overall, not less.  The second is that the use of these published and discoverable semantics makes consuming the service a more complex and costly process.

The simple answer to the first perception is to understand that vocabulary definitions are not static.  While it is good practice to attempt to account for the future, the dynamic nature of the vocabulary definitions provides the opportunity to fix design mistakes and make other changes.  All too often, voices from the semantic web community clamoring for the creation of perfect vocabularies are confused with proponents of hypermedia-driven API design, who simply advocate for designs that promote lower coupling.  More importantly, the ability to contribute value to vocabulary design is not limited to the highly technical members of your team.  Members of your team with domain expertise can play a much more direct role in the design of vocabularies, allowing more technical members to focus their effort on more technical tasks.  Quite simply, this perception is a fallacy; the design process should be no more complicated when defining a vocabulary than when bartering over the URI structure of a CRUD API.

Discoverable semantics, late-binding definitions, or follow-your-nose APIs; regardless of how you want to refer to the concept, there is truth to the notion that a hypermedia API is more complex to consume.  Despite this admission, the sky is not actually falling on the hypermedia concept.  In fact, this small complexity is one of the tradeoffs which allows the API and clients to be more loosely coupled.  A vocabulary-defined hypermedia API enhances the API’s design flexibility while increasing the clients’ longevity by strongly urging the client design to discover where the consumer can go next from the API’s resources themselves.  Our first guideline comes back into play to solve another difficult problem: the service does not need to be discovered on every single request.  All that needs to be done is to build a client which responds appropriately to correctly formatted cache-control headers from the HTTP specification, and we greatly reduce the overhead of discovery.  Additionally, by leveraging different HTTP status codes we can broadcast to consumers a temporary or permanent move of a resource.  With these statuses the server is able to gently broadcast deprecation of resources, providing a buffer period to allow clients and consumers time to adjust to the evolved API.  This greatly reduces the stress and pressure on a designer to create a perfect API in the initial version by providing non-breaking ways to make changes over time.
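
A minimal sketch of how a client can honor those signals, assuming a hypothetical resource URL: a 301 or 308 is a permanent move the client should remember, while a 307 is a temporary one it should follow but not remember.

import requests

def fetch(href, link_cache):
    """Follow a link, remembering permanent moves so future calls go straight to the new URI."""
    href = link_cache.get(href, href)                # use the updated location if we know one
    response = requests.get(href, allow_redirects=False)
    if response.status_code in (301, 308):           # permanent move: remember the new home
        link_cache[href] = response.headers["Location"]
        return fetch(response.headers["Location"], link_cache)
    if response.status_code == 307:                  # temporary move: follow, but don't remember
        return requests.get(response.headers["Location"])
    return response

link_cache = {}
resource = fetch("https://api.example.com/screens", link_cache)  # hypothetical URL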

Both of these common misconceptions about hypermedia APIs result from a fundamental misunderstanding of the benefits and difficulties of hypermedia.  The burdens are overblown and the perceived risk is amplified; a well-designed API should see a net reduction in complexity through its initial release and beyond.

By documenting your resources through vocabulary definitions, your design greatly reduces its fragility in the face of change, and your consumers can expect a much longer lifetime from their investments in clients.

Hypermedia APIs: Embrace the HTTP standards

In my previous post I established some guideline constraints to follow when designing a hypermedia API.  Heavy emphasis on constraints, as the complexity we intentionally introduce now will greatly reduce the complexity of larger scaling and performance issues in time.  In this post I will discuss the first constraint: ‘The interface MUST embrace the underlying HTTP protocol’.

If you haven’t spent time on the IANA or IETF websites perusing the nearly endless supply of standards papers filled with normative and dry text, then you probably consider yourself lucky.  There is an absolute mountain of current and proposed standards which fall under the HTTP umbrella, so you would be forgiven for not immediately rushing there to read everything related to HTTP.  However, as an API designer you are responsible for the behavior of your design, and you really should be familiar with the entirety of the basic HTTP specification, as well as some of the more popular or useful additions.

I’ll step back and ask the obvious question: why is HTTP so important?  A RESTful architecture is not a goal to be sought in its own right, but for the benefits its constraints offer to the overall performance and behavior of the application.  By ensuring adherence to standards, you can expect a large range of clients to support your service immediately, and for an extended service life, without the burden of further technical intervention.  Perhaps most importantly, adhering to the standard functional definitions within the protocol will allow a service to fully leverage the benefits of a RESTful architecture.  Presumably you are designing a service to send messages over HTTP; in that case, it is beneficial to know when the protocol itself has a solution to a particularly difficult asynchronous application problem, like resource locking.

There are many reasons why asynchronous systems require the ability to lock a particular resource, be it exclusive access to physical or hardware resources, simultaneous editing, or even reserving items for a specific user’s purchase.  There are two main approaches to locking: ‘pessimistic’ and ‘optimistic’.  Pessimistic locking requires the service to maintain the state of a resource as owned by a particular entity, but within a RESTful architecture we really don’t want to add statefulness.  This is not an ivory-tower argument at all; the primary desire for statelessness in this case is driven by the desire to allow caching.  If a design were to introduce statefulness on any resource, it would be impossible for any cache to ever retain the resource, as there would be no way to determine its current state.  This leaves the optimistic approach as the only viable option for a RESTful hypermedia API.  Fortunately, the HTTP specification has a standard way to support optimistic locking.

To utilize optimistic locking, an HTTP service provides metadata about a resource representation when responding to a request by including an ETag header with a value unique to the resource’s state at the time of the response.  When a consumer would like to modify the resource, they conduct the appropriate request while supplying the previously returned ETag value in the ‘If-Match’ header field and ‘return=minimal’ in the ‘Prefer’ header field.  If the resource is unmodified, the service can perform the update and return a 204 status code, with a ‘Preference-Applied’ header value of ‘return=minimal’ and an empty body.  However, if the resource has changed, the service can return a 412 Precondition Failed (or 409 Conflict) response, either ignoring the request preference and returning the current representation, or including a ‘Location’ header with the resource URI and the ‘Preference-Applied’ header.  It is important to note there is flexibility for implementation details within the protocol specifications for optimistic locking, and most use cases should be covered.  For applications intended to scale and effectively utilize a RESTful architecture, caching is crucial to the performance of the system, and in the scenario presented above the resource is always cacheable.
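
From the consumer’s side, the exchange looks something like the sketch below; the resource URL and message body are hypothetical, while the headers are the standard ones described above.

import requests

url = "https://api.example.com/seat/12/lightsource"   # hypothetical resource

# Read the resource; the ETag identifies the exact state we are about to modify.
current = requests.get(url)
etag = current.headers["ETag"]

# Attempt the update only if nobody else has changed the resource since we read it.
update = requests.put(
    url,
    json={"type": "natural"},
    headers={"If-Match": etag, "Prefer": "return=minimal"},
)

if update.status_code == 204:
    pass  # update applied; no body returned, as requested via Prefer
elif update.status_code in (409, 412):
    # Someone else changed the resource first: re-fetch, reconcile, and retry.
    current = requests.get(url)
    etag = current.headers["ETag"]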

This entire scenario is certainly more involved than a simple request to a lock sub-URI on a resource; however, a very large portion of a very difficult problem has already been handled through a standard, well-known, and supported process.

Embracing HTTP constructs to solve application concerns in your API allows you, as the designer, to focus on tasks which build value in your product and avoid extraneous boilerplate design.

Development Guidelines for Hypermedia Web APIs

I have a confession: I have written the same application many times, probably just like you.  I’m sick of it.  I just don’t get a feeling of accomplishment when I implement core API functionality for the tenth time.  It just isn’t fun anymore.

The same discussions, the same concerns, and the same technology always result in the same application.  I work as an API designer and developer, so I can see how there is so much overlap between projects.  Yet, in retrospect, I’m forced to admit I’ve spent a lot of time doing work which wasn’t very novel and could have been better spent adding value for my employers and clients.

The tools and techniques to focus on domain value didn’t exist, and we all work with the tools we are given, but now there is a way to look past much of that previous work and focus on getting products done.

Hypermedia Web APIs.

Just to be clear, hypermedia APIs are not a silver-bullet Swiss Army knife that does everything you need, but they come pretty close.  This post is the start of a series in which I discuss hypermedia web APIs, how to leverage their power, and how to minimize the perceived or real pain created by the added complexity.

Guidelines for Designing Hypermedia Web APIs

The following design constraints will provide a strong foundation for your API design.

  1. The interface MUST embrace the underlying HTTP protocol.
  2. The interface MUST document resources and resource capabilities through vocabulary definitions.
  3. The interface MUST present a home document to publish resources and documentation.
  4. The interface MUST define all resources atomically, and MUST flatten resource representations.
  5. The interface MUST NOT couple or document tightly to specific URI paths and patterns.
  6. The interface MUST NOT include any versioning in its representations.
  7. The interface MUST expose applicable resource capabilities through hypermedia controls.
  8. The interface MUST respond to consumer declared goals if the goal is understood.
  9. The interface MUST be decoupled from hypermedia format, the format MUST be negotiable.
  10. The interface MUST promote flexible design, it MUST NOT present breaking changes.
  11. The interface MUST extensively leverage content negotiation.

Got all that?  No?  Well, like everything else these days, that is just the tip of the iceberg.  The key is to leverage existing standards and functionality to get this done over HTTP.

Over the next few posts I will delve a little deeper into each point, explain why we are bucking some industry trends, and begin a high level review of how to use this guide to design a hypermedia web API.

Much thanks goes out to Mike Amundsen, Steve Klabnik, Ruben Verborgh, and Kin Lane for providing extremely valuable blogs, talks, links, and dissertations to understand this stuff.