Development Guidelines for Hypermedia Web APIs

I have a confession: I have written the same application many times, probably just like you. I’m sick of it. I just don’t get a feeling of accomplishment when I implement core API functionality for the tenth time. It just isn’t fun anymore.

The same discussions, the same concerns, and the same technology always result in the same application. I work as an API designer and developer, so I can see how much overlap there is between projects. Yet, in retrospect, I’m forced to admit I’ve spent a lot of time doing work which wasn’t very novel and could have been better spent adding value to my employers and clients.

The tools and techniques to focus on domain value didn’t exist, and we all work with the tools we are given, but now there is a way to look past much of that previous work and focus on getting products done.

Hypermedia Web APIs.

Just to be clear, hypermedia APIs are not a silver-bullet Swiss Army knife that does everything you need, but they come pretty close. This post is the start of a series in which I discuss hypermedia web APIs: how to leverage their power, and how to minimize the perceived or real pain created by the added complexity.

Guidelines for Designing a Hypermedia Web API

The following design constraints will provide a strong foundation for your API design.

  1. The interface MUST embrace the underlying HTTP protocol.
  2. The interface MUST document resources and resource capabilities through vocabulary definitions.
  3. The interface MUST present a home document to publish resources and documentation.
  4. The interface MUST define all resources atomically, and MUST flatten resource representations.
  5. The interface MUST NOT couple or document tightly to specific URI paths and patterns.
  6. The interface MUST NOT include any versioning in its representations.
  7. The interface MUST expose applicable resource capabilities through hypermedia controls.
  8. The interface MUST respond to consumer declared goals if the goal is understood.
  9. The interface MUST be decoupled from any hypermedia format; the format MUST be negotiable.
  10. The interface MUST promote flexible design; it MUST NOT present breaking changes.
  11. The interface MUST extensively leverage content negotiation.
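
As a taste of guidelines 7 and 9 in practice, here is a minimal sketch of my own (the resource, link relations, and `serialize` helper are invented for illustration, though the media types are real): one format-agnostic resource description, rendered into whichever hypermedia format the client negotiates.

```python
# Illustrative sketch: a single format-agnostic resource plus its hypermedia
# controls, serialized per the negotiated format (guidelines 7 and 9).

def serialize(resource, links, accept):
    """Render the same resource and controls in the negotiated format."""
    if "application/hal+json" in accept:
        return {**resource,
                "_links": {rel: {"href": href} for rel, href in links}}
    if "application/vnd.siren+json" in accept:
        return {"properties": resource,
                "links": [{"rel": [rel], "href": href} for rel, href in links]}
    # Fallback: plain JSON with a generic "links" member.
    return {**resource,
            "links": [{"rel": rel, "href": href} for rel, href in links]}

order = {"status": "open", "total": 42}
controls = [("self", "/orders/17"), ("payment", "/orders/17/payment")]

hal = serialize(order, controls, "application/hal+json")
siren = serialize(order, controls, "application/vnd.siren+json")
```

The resource and its controls are described once; only the serialization step knows about HAL or Siren, so adding a third format never touches the resource design.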

Got that? No? Well, like everything else these days, that is just the tip of the iceberg. The key is to leverage existing standards and functionality to get this done over HTTP.

Over the next few posts I will delve a little deeper into each point, explain why we are bucking some industry trends, and begin a high level review of how to use this guide to design a hypermedia web API.

Many thanks go out to Mike Amundsen, Steve Klabnik, Ruben Verborgh, and Kin Lane for providing extremely valuable blogs, talks, links, and dissertations for understanding this stuff.

14 thoughts on “Development Guidelines for Hypermedia Web APIs”

  1. Hi Michael,

    Interesting initiative!

    A couple of remarks:
    1) The interface MUST embrace the underlying HTTP protocol.

    For hypermedia-driven _Web_ APIs, yes. Probably important to clarify.
    Additionally, “embrace” is maybe too weak a word here.

    5) The interface MUST NOT couple or document tightly to specific URI paths and patterns.

    I would phrase this differently.
    After all, the server _will_ be coupled tightly to URI patterns.

    Instead, I’d say that the client MUST NOT attempt to interpret URIs.

    7) The interface MUST NOT define resource capabilities within representations.

    I don’t understand—should this be “MUST”?

    9) …the format MUST be negotiable.

    It depends; sometimes, there’s only one format available.
    If there are multiple, then negotiation should be the answer.

    11) The interface MUST extensively leverage content negotiation.

    What is the difference with 9 and what is “extensively” in this context?

    Best,

    Ruben

  2. Ruben thanks for the comments.

    1) I think you’re entirely accurate with the _web_ insertion; I’ll add that in to be more specific, as this is the intended meaning. As for the ‘embrace’ word choice, the primary driver for the guidelines is creating a more approachable guide to hypermedia, with as little abrasive or normative language as possible. I’m at a loss for a better word for a header which means ‘utilize existing standard functionality unless it doesn’t exist’.

    5) I am actually on the fence with this one. With the existing standardized formats, I see benefits for URI ‘sym-links’. The reason is that I view hypermedia formats as sister languages, and some of those formats have been opinionated on URI structure within their specifications. I don’t think that was a good idea, but now that it is done and they are used, I am attempting to work out a way forward with existing standards. I am wary that this admission is opening a can of worms, but unless I resort to xkcd 927, I don’t know how to require tight URI binding as you suggest.

    I do however like the additional stipulation of not interpreting the URI, I’ll think a bit about this one.

    7) I think this is confusing because I forgot the first word in ‘resource representation’.

    9) I disagree; I think we should be designing APIs to be format agnostic. If there is any issue with transitioning from one format to another, you don’t have a ‘hypermedia web API’, you have a ‘hypermedia format web API’. A negotiable list of 1 is still negotiable; it just doesn’t have any additional options. However, it doesn’t add the artificial constraint of another message structure or condition to manage.

    11) I’ll get more into this in a later post, but the essence is that format, vocabularies, and therefore goals should be negotiable in addition to the standard HTTP components.

  3. > I’m at a loss for a better word for a header which means ‘utilize existing standard functionality unless it doesn’t exist’.

    “follows” (wherever applicable)

    Because even in the “doesn’t exist” case, it should still follow HTTP,
    i.e., use/create compatible solutions.

    > With the existing standardized formats, I see benefits for URI ‘sym-links’.

    Not sure I follow here; maybe my original statement wasn’t clear.

    What I mean is the following.
    Given a URL like /people/35/friends,
    the client is _not_ allowed to
    bind to the pattern of /people/{id}/{collection}.

    However, the server (which hosts the interface),
    _is_ allowed to bind to that pattern,
    and that is very likely how the server is implemented.
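
    That split can be sketched as follows; the document shape and the `follow` helper are hypothetical, but the rule is the one stated above: the client picks links by relation name and never parses or constructs URIs.

```python
# Illustrative sketch: URIs are opaque to the client, which selects links
# only by their relation name; only the server knows the URI pattern.

def follow(representation, rel):
    """Client side: choose a link by its relation, never by parsing the URI."""
    for link in representation["links"]:
        if link["rel"] == rel:
            return link["href"]
    raise KeyError(rel)

doc = {"links": [{"rel": "friends", "href": "/people/35/friends"}]}
href = follow(doc, "friends")  # allowed: chosen by relation name

# NOT allowed on the client:
# href = f"/people/{person_id}/friends"  # binds to the server's URI pattern
```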

    > 7) I think this is confusing because I forgot the first word in ‘resource representation’.

    Then we disagree.
    The purpose of hypermedia is to describe
    the capabilities of a resource in-band
    through hypermedia controls.

    > 9) I disagree, I think we should be designing APIs to be format agnostic,

    Depending on the definition of “format-agnostic”, yes or no.

    No in the sense that Fielding writes that
    “A REST API should spend almost all of its descriptive effort
    in defining the media type(s) used for representing resources
    and driving application state”
    (http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven)

    Yes in the sense that he continues the sentence with
    “or in defining extended relation names and/or
    hypertext-enabled mark-up for existing standard media types.”

    So if you can describe such relations across media types,
    then that’s a great way to be format-agnostic.
    Also see my https://ruben.verborgh.org/blog/2015/10/06/turtles-all-the-way-down/

    > if there is any issue with transitioning from one format to another, you don’t have a ‘hypermedia web api’ you have a ‘hypermedia format web api’.

    I disagree; where did you find this statement?

    > A negotiable list of 1 is still negotiable, it just doesn’t have any additional options.

    “Hey, let’s negotiate. You either pay me $100 or you pay me $100 ;-)”

    > However, it doesn’t add the artificial constraint of another message structure or condition to manage.

    It’s just a choice of words then I presume;
    I’d call it a representation-independent resource design.

    > 11) I’ll get more into this in a later post, but the essence is that format, vocabularies, and therefor goals should be negotiable in addition to the standard http components.

    So that’s multidimensional content negotiation then?
    I’d still put that under 9.

    You might be interested in https://ruben.verborgh.org/articles/fine-grained-content-negotiation/
    and the related W3C Workgroup that will kick off soon.

    1. > “follows” (wherever applicable) …

      I considered ‘follows’, but there isn’t much imperative behind the word. While I would love to live in a world where everyone went back to the standards body for the appropriate course of action to extend HTTP, that simply won’t happen in the wild. In this case, designing with an eye towards functionality which can be rolled back into an HTTP extension of some sort is obviously the preferred path.

      > Not sure I follow here; maybe my original statement wasn’t clear….

      Your statement was clear, but your assumption that the server binds to one URI pattern specifically is what I disagree with and was speaking to. The idea is a hypermedia-format-agnostic service, and unfortunately some ill-advised decisions were made in existing hypermedia formats which prescribe certain URI patterns within their specifications. So, in your example, the client would be prohibited from binding to a URI pattern; however, the server would bind a resource to one or more URIs, with the appropriate hypermedia format content types supported at each URI.

      > Then we disagree.
      > The purpose of hypermedia is to describe …

      Yes, we do. I won’t beat around the bush: I am not a member of the semantic web driving team. I believe one of hypermedia’s functions can be as you say, but it can also be used (and with HTML, is used) for contextual handling, where a priori knowledge is reduced to a standard set and clients can be built to function much more flexibly for an extended period of time. Hypermedia isn’t just about linking data between domains, and I think the most immediate gains within the industry can be had by creating more generalized clients capable of handling many hypermedia formats.

      > Depending on the definition of “format-agnostic”, yes or no…

      My position rests solely on the latter of the two. Hypermedia formats and their creation are the first focus you mention; creating vocabularies (like HTML), where the links have relationship names, can be used to create more generic and reusable clients.

      > I disagree; where did you find this statement?

      My head 😀 The ‘format’ was a placeholder token for a specific format, like json:api. As you alluded to before, my concern and aim is to build on the work which has been done, to isolate the information required for each hypermedia format and translate the data, links, and metadata into the appropriate representation.

      >“Hey, let’s negotiate. You either pay me $100 or you pay me $100 ;-)”

      Hey! You added another option! [{“pay”:100},{“pay”:100}]!!

      Joking aside, it’s a very common business concern to want extensibility while working with one particular technology or piece of functionality right now. The reason this is important is to prevent the shortcut-taking which binds all API design to the individual quirks and constraints of a particular format, which would make the second, third, etc. formats far more difficult to include as well.

      > It’s just a choice of words then I presume;
      > I’d call it a representation-independent resource design.

      Sure, but it isn’t just that; it’s the stuff mentioned above which subtly creeps in through inattention or deadline pressure.

      >So that’s multidimensional content negotiation then?

      Precisely, and that particular blog is one of the reasons your name appeared as inspiration for the guidelines. The primary reason for breaking 11 out from 9 is essentially the audience: the need to extensively provide details for an audience comfortable with RMM 2 concepts, while addressing the concerns and benefits for RMM 4/5 requirements.

  4. > some ill advised decisions were made in existing hypermedia formats which prescribe certain URI patterns within their specification.

    That’s definitely a mistake indeed.

    > So in your example the client would be prohibited from binding to a uri pattern, however the server would bind a resource to one or more URIs

    Yes.

    I don’t think it really matters,
    but I just noticed that some people get confused
    when we say “URIs are opaque”,
    because they only are to clients;
    servers that mint the URI are allowed to do more.

    > Then we disagree.
    > The purpose of hypermedia is to describe …
    >
    > Yes we do. I won’t beat around the bush so to speak, I am not a member of the semantic web driving team.

    I am, but SemWeb is just one possible implementation.

    What it really is about is the core of hypermedia,
    i.e., hypermedia as the engine of application state.
    The fact that a server gives a client links and forms
    that afford a certain activity.
    The presence of a certain form
    indicates that a certain functionality is available.

    That is a necessary condition for hypermedia APIs,
    hence my firm disagreement with your 7.

    > I believe one of hypermedia’s functions can be as you say, but also it can (and is with HTML) be used for contextual handling where a priori knowledge is reduced to a standard set

    Yes, but even then, the response indicates capabilities
    through the presence of hypermedia forms.

    The only difference between HTML and hardcore SemWebby forms
    is that the SemWeb stuff has deeper semantics for things.
    But both require the indication of functionalities
    (just not at the same level).

    So in order to defend 7, you’d need to convince me that
    the inclusion of forms in hypermedia is not necessary
    (and I will never agree to that).
    Because as soon as you have forms, you define resource capabilities.

  5. > I am, but SemWeb is just one possible implementation. …

    I think you are misunderstanding my intent, and it is likely my fault due to poor wording. Perhaps there is a way to be more clear and concise. #7 is not about preventing the binding of hypermedia to the data at response time; it is about not defining the resource affordances through the resource representation. A resource’s affordances should be defined in the vocabularies, not the representation. The service would then translate the available affordances, based on the response, into a hypermedia format in order to conform to the HATEOAS constraint.

    Like I said, your explanations of your ‘disagreement’ fall firmly in line with my intent, so I should spend some time trying to make my concise tagline headings clearer.

    I’ll leave the rest unaddressed as it is entirely dependent on the belief that #7 precludes hypermedia as part of the service response. It is a guideline in organization of resource and affordance definition, not an assault on hypermedia’s necessity.

    OT: I have read pretty much everything hypermedia-related on your site, so I’m well aware of your semweb fanhood; that doesn’t mean I don’t think many of your ideas, taken out of that context, are fantastic. I do, however, disagree with the semweb cause, as the primary benefits in the near and mid term go to data aggregators, not the citizens of the web, while potentially harming those same citizens. Ultimately, it would be great, but I don’t think our infrastructure and architecture are mature enough or distributed enough to benefit users yet.

  6. > it is about not defining the resource affordances through the resource representation. A resources affordances should be defined in the vocabularies, not the representation.

    So the misunderstanding is in the word “defining” then.
    We seem to agree that representations “list” the affordances.

    What you seem to require is that the “explanation” of the affordances is external.

    Then my criticism boils down to three points:

    1) I’d suggest an explicit point about having the list of affordances in the representation.
    This is an essential point of hypermedia APIs, which is not touched upon.

    2) There seems to be an overlap between 2 and 7.

    3) I don’t see why the in-band explanation of affordances would need to be forbidden.
    It’s okay that you don’t require it, but why forbid it? I.e., why the strong “MUST NOT”?
    Most human-readable forms on the Web come with an explanation,
    so why deprive machines—who need that even more—from such explanations?

    > so I’m well aware of your semweb fanhood

    It’s only a means to an end (and we haven’t reached that end yet).

    > the primary benefits in the near and mid term are to data aggregators and not the citizens of the web

    I hear you loud and clear; hence the focus of my work on small consumers.
    https://twitter.com/RubenVerborgh/status/779653968448352256 😉

  7. > What you seem to require is that the “explanation” of the affordances is external.

    Yes; if you click through to the explanation post for #2, it goes into some detail on this. The service should subscribe to, or present, its profile(s) to consumers as an external contract it has chosen to support. I go into some reasons why in the post, so I won’t rewrite them here; however, one primary reason I didn’t go into, compared to the json-ld / hydra approach you are a fan of, is bloat. In the traditional XML vs JSON argument, a lot of the ‘JSON is more efficient’ argument goes away in the face of gzip. There is no analogous solution for trimming json-ld’s added bloat. If your client requested it, you could provide this data, but doing so is a) negotiated, and b) does not violate the principles I have stated, because it would be a convenience, not the source of truth.

    > 1) …
    I think this is a fallacious argument, and it is not inherent in the nature of hypermedia; see the HTML specification. A priori knowledge is not taboo; it is just supposed to be minimized to a standard and/or discoverable set, to allow increased longevity and interoperability. See my point above about requesting a json-ld context as part of a response vs requiring every response to contain a very large set of additional information which a client is capable of caching.

    2) There is a lot of overlap throughout; it is by nature a means to break up a topic which is inherently monolithic into more digestible chunks.

    3) This goes back to the define vs list argument. The definition, as in software, is the ‘truth’; the rest is a convenience which can be negotiated or requested, but most definitely should NOT be required. This is no deprivation: the whole service still needs to be discovered, so representations and link information are required to be presented; they just aren’t serialized unnecessarily with every request. This has massive ramifications in the mobile and microservices spaces, where extra bloat can be extremely costly in a variety of ways.
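
    The caching argument can be sketched as follows; all names here (the profile URL, the field shapes, `fetch_profile`) are hypothetical stand-ins for an HTTP GET honoring Cache-Control on the profile document.

```python
# Illustrative sketch: the client fetches and caches the external profile
# (vocabulary/affordance definitions) once, while each response stays lean.

PROFILE_CACHE = {}
FETCH_COUNT = {"n": 0}

def fetch_profile(url):
    # Stand-in for an HTTP GET of the (large) profile document.
    FETCH_COUNT["n"] += 1
    return {"fields": {"total": {"type": "number", "doc": "order total"}}}

def get_profile(url):
    # Cache-Control-style reuse: fetch the definition document only once.
    if url not in PROFILE_CACHE:
        PROFILE_CACHE[url] = fetch_profile(url)
    return PROFILE_CACHE[url]

def interpret(response):
    # Lean per-request body + cached definitions = full picture for the client.
    profile = get_profile(response["profile"])
    return {k: (v, profile["fields"][k]["doc"])
            for k, v in response["data"].items()}

r1 = interpret({"profile": "/profiles/order", "data": {"total": 42}})
r2 = interpret({"profile": "/profiles/order", "data": {"total": 7}})
```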

    > It’s only a means to an end (and we haven’t reached that end yet). …

    I entirely agree with the sentiment, but it’s not the right tool for the job now. My argument is entirely architectural: the backbone of our network is still not distributed enough for the semantic web to be more beneficial than harmful to users. The concept is sound; however, the foundational architecture, given present bandwidth and computational capacities, precludes my support for the movement. If we had distributed DNS (blockchain?) with global mesh routing, where the effort to build semantic bridges wouldn’t result in such massive privacy and security concerns, I would likely be far more supportive of the idea, as the network would be far more democratized. Until then, I’m very much against policies which give data aggregators of all types an easier time profiling web traffic. This blog post goes into some more detail: https://www.linkedin.com/pulse/i-pledge-do-harm-your-data-michael-hibay

  8. > however one primary reason I didn’t go into when compared to the json-ld / hydra approach you are a fan of is bloat.

    It is interesting that the human Web
    is full of such bloat, where we call it “usability” 🙂

    > If your client requested, you could provide this data, but doing so is a) negotiated

    That’s interesting, but not explicitly mentioned.

    > and b) does not violate the principles I have stated because it would be a convenience not the source of truth.

    It directly violates the “MUST NOT” of 7;
    7 does not make any exception,
    neither for convenience nor for negotiation.

    Making it a “SHOULD NOT” allows exceptions.

    > > 1) I’d suggest an explicit point about having the list of affordances in the representation.
    > > This is an essential point of hypermedia APIs, which is not touched upon.

    > I think this is a fallacious argument, and it is not inherent in the nature of hypermedia

    Alright, we’ll need to agree to disagree then.
    If the hypermedia doesn’t contain affordances,
    then it’s not a hypermedia-driven API
    for every source I know about the topic.

    Which is fine, you can define other things,
    but the blog post brings them as general guidelines
    while they fundamentally disagree with others.
    And that’s very confusing…
    Unless you find other sources for that statement
    (which I’d be very happy to hear about),
    please don’t redefine “hypermedia APIs” in general
    but rather introduce a new variant of it.

  9. > It is interesting that the human Web
    > is full of such bloat, where we call it “usability” 🙂

    In the human web, that information is contextualized because the client (us pesky Homo sapiens) doesn’t have the cache potential of our digital clients. We need the information presented in this way for it to be a coherent picture; machines do not. Assembling from a cached (and Cache-Control-moderated) local copy of the resource definition and its potential affordances is an extremely simple task for a machine client.

    > That’s interesting, but not explicitly mentioned.

    I know; we started this great discussion before I was able to write out the longer explanations of each point. If those had all been out before, perhaps a lot of this confusion would have been cleared up… however, I am doing this in my free time as I can, so… you know… life 😀

    > It directly violates the “MUST NOT” of 7;
    > 7 does not make any exception,
    > neither for convenience nor for negotiation.

    It does not. Again, I haven’t gotten to that point yet, but it’s part of the extensive content negotiation point. To address your repeated concern specifically, there is no reason something like ‘Accept: application/vnd.collection+json, application/ld+json’ (or some similar form) could not be included as a list of supported media types, as embedding json-ld inside other media formats is perfectly legal. The ‘how’ of negotiating the embedded information would obviously need to be worked out. The ‘instance’ of the resource representation isn’t the definition of the resource representation, and as such it doesn’t violate the statement in the slightest.

    The profile is the source of the resource representation’s definition. In terms of an ALPS-defined vocabulary, it will represent all semantic components of the resource, and it will also provide all of the potential affordances and all of the goals of the vocabulary.

    You seem to have an issue with the way I am using the word ‘define’, which is the computer-science textbook sense of the word. I define the domain of all possible affordances through vocabularies; the hypermedia component then applies those affordances to the representation, as required by business logic and other stateful information, at request time. These applied, or listed, affordances are most definitely required, and are bound as dictated by the negotiated hypermedia format’s particular response structure. This is the same point I made previously, where I thought this confusion was resolved with the define vs list distinction.

    I am not creating a new concept of ‘hypermedia’; I’m simply challenging and disagreeing with your apparent assumption that the definition of all possible affordances must accompany all instances of that representation. Not all resources at the same URI will have the same affordances at all times. You can’t build a helpful hypermedia client without a realistic and dependable means of knowing the entire domain of affordances you MAY receive; again, see the HTML specification, which defines all tags and their handling without that information being serialized with each specific instance of a resource or its data.
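
    One way to picture the define-vs-list split described here (the vocabulary entries, states, and helper are invented for illustration): the vocabulary defines the full domain of affordances, and each response lists only the subset that applies to the resource’s current state.

```python
# Illustrative sketch of "define vs list": the external vocabulary defines
# every affordance a resource MAY expose; business logic selects which ones
# are listed in a given response.

VOCABULARY = {  # external definition: the full domain of affordances
    "pay":    {"method": "POST",   "doc": "submit payment"},
    "cancel": {"method": "DELETE", "doc": "cancel the order"},
    "track":  {"method": "GET",    "doc": "track shipment"},
}

def applicable(order_state):
    # State-dependent selection: the subset listed in this response.
    by_state = {"open": ["pay", "cancel"],
                "shipped": ["track"],
                "delivered": []}
    return {name: VOCABULARY[name] for name in by_state[order_state]}

open_controls = applicable("open")
shipped_controls = applicable("shipped")
```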

  10. I’ll make it short and simple 🙂

    > your apparent assumption that the definition of all possible affordances must accompany all instances of that representation.

    I don’t have that assumption.

    What I’m saying can be summarized as:
    – The essence of hypermedia APIs is that their responses contain hypermedia controls (in whatever form, and with or without definition)
    – none of your 11 guidelines mandate this

  11. >– none of your 11 guidelines mandate this

    /giphy doh.

    I apologize if my responses seemed pedantic; I seem to have missed the forest for the trees.

    Well, the guidelines don’t specifically mention it; you’re right. In the explanation for number 2 I do explicitly mention the actions, but I think you’re right that my assuming this understanding is probably not the best thing, as this is intended to be an educational document. Thanks for the feedback!

  12. Why must HTTP be “embraced”? The concept of hypermedia is protocol independent. HTTP does, of course, make for a great implementation of hypermedia. But what about CoAP, as another example? Will this series use JSON-LD?

    1. You very correctly point out that it is far bigger than just HTTP. This post was written mostly targeting developers using HTTP, and predominantly still only HTTP/1.1. The reason for embracing HTTP in this case is to remain consistent with the uniform interface, and operationally to get the desired behavior and performance from your services. From a development perspective, it is important to understand the underlying protocol’s native capabilities, to prevent costly, substandard re-implementations of HTTP features. Some of the most common examples of this would be `200 OK {“error”:”*error message here*”}`, custom media-type caching, enabling pessimistic locking, etc.
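
      The first anti-pattern above can be sketched like this; the handler names, store shape, and `(status, body)` return convention are illustrative, not from any framework.

```python
# Illustrative sketch: tunneling errors through 200 OK vs. letting HTTP
# status codes carry the semantics.

def get_order_bad(orders, order_id):
    # Anti-pattern: the error rides inside a 200 OK body, so caches,
    # proxies, and generic HTTP tooling all see a "successful" response.
    if order_id not in orders:
        return 200, {"error": "order not found"}
    return 200, orders[order_id]

def get_order_good(orders, order_id):
    # HTTP-native: intermediaries and clients can react to the status
    # code without parsing the body at all.
    if order_id not in orders:
        return 404, {"detail": "order not found"}
    return 200, orders[order_id]

orders = {"17": {"status": "open"}}
```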

      A simple question, “What do you need to create a generic client for hypermedia APIs?”, led to three core principles directing this effort:
      1. The generic client must enable rich interactions with custom domain specific languages.
      2. The generic client must be protocol agnostic.
      3. If at all possible the client must be media-type agnostic, and strongly prefer metadata be decoupled from resource data.

      CoAP is a great example of a fit-for-purpose protocol which can benefit greatly from the decoupling hypermedia provides. Though it is by nature less verbose in its metadata capabilities, using CoRAL will make up for many of the differences.

      I was happy to be a part of a series of discussions on hypermedia approaches with some folks working on Thing Description at the W3C, on CoAP/CoRAL/OCF with the IRTF, and on JSON Hyper-Schema, and what is clear is that there is an emerging understanding of the types of metadata required for decoupled systems, and that certain protocols are better at conveying that metadata than others. This leads to trade-offs, from both a process and a technical perspective, in the degree of freedom to create and use custom domain vocabularies for hypermedia controls.

      As for JSON-LD, it’s a great serialization of RDF and of semantic relationships between resources (what I frequently call composite relationships, or composites); however, it isn’t exactly a great fit for APIs, as it always brings up the “actions” debate (more accurately, affordances or potential actions). The team working on Hydra, however, is trying to address this at a high level while ironing out their generic-client work. I am eagerly watching their progress, and we frequently share ideas on our approaches, but I don’t plan to address either of them in this series more than I already have. I’m trying to focus on getting the spec behind this post, an initial framework, and the foundation of a generic client out the door first 😀
