What should the HTTP Status Code of a Degraded Health Check Be?





I have a health check endpoint at /status that returns the following status codes and response bodies:

  • Healthy - 200 OK
  • Degraded - ?
  • Unhealthy - 503 Service Unavailable

What should the HTTP status code be for a degraded response? A 'degraded' result is used for checks that succeeded but are slow or unstable. What HTTP status code makes the most sense?










  • I don't think your question makes sense. You need to decide what an HTTP GET of /status should do.
    – Basile Starynkevitch, Dec 31 '18 at 8:39

  • What do you believe your choices to be? If it's working, we use 200 and return additional information if necessary. Really, it's up to you.
    – Retired Ninja, Dec 31 '18 at 9:02

  • @MuhammadRehanSaeed Return a custom code within the 2xx Success range that is not already taken by the known/common codes, similar to some of the unofficial codes not supported by any standard. For example, 218 This is fine (Apache Web Server).
    – Nkosi, Jan 2 at 17:47

  • @MuhammadRehanSaeed I also found this: tools.ietf.org/html/draft-inadarei-api-health-check-00
    – Nkosi, Jan 2 at 17:57

  • @MuhammadRehanSaeed Hoping you check the more recent version. They also suggested: "In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response." Here the warn status is "healthy, with some concerns", which I believe aligns closely with your model.
    – Nkosi, Jan 3 at 8:45




















http http-status-codes http-status-code-503 health-monitoring kubernetes-health-check






asked Dec 31 '18 at 8:38 by Muhammad Rehan Saeed (edited Dec 31 '18 at 9:02)








3 Answers
The most suitable HTTP status code for a "Degraded" status response from a health endpoint is nothing other than 200 OK.



I say this because I can't find any better code in the official Hypertext Transfer Protocol (HTTP) Status Code Registry maintained by IANA, pointed to by [RFC7231] HTTP/1.1: Semantics and Content. Unofficial codes should be avoided, because they only make your API more difficult to understand.
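As a rough illustration of that recommendation, the sketch below keeps "Degraded" on 200 OK and carries the distinction in the response body instead. The `Health` enum, the `health_response` helper, and the `status`/`detail` field names are hypothetical, chosen only for this example.

```python
import json
from enum import Enum
from typing import Tuple


class Health(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"


# Hypothetical mapping: only a real failure leaves the 2xx range,
# so no unofficial status code is needed for "Degraded".
HTTP_STATUS = {
    Health.HEALTHY: 200,
    Health.DEGRADED: 200,
    Health.UNHEALTHY: 503,
}


def health_response(health: Health, detail: str = "") -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for a health result."""
    body = {"status": health.value}
    if detail:
        # The body, not the status code, tells clients what is degraded.
        body["detail"] = detail
    return HTTP_STATUS[health], json.dumps(body)
```

A client that only looks at the status code still sees a working service, while a monitoring tool can read the body to tell "Healthy" from "Degraded".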



You should design your APIs so that they become easy to use. Resource names, HTTP verbs, status codes, etc. should be more or less self-explanatory, so that people who already know "the REST language" can immediately understand how to use your API without having to decipher vague names or unusual status codes. Which brings me to the next part of my answer...



Other comments on your design



The most natural way to interpret a 5xx response to any request is that the operation in question failed.



So a 503 Service Unavailable response to a GET /status request means that the status checking operation itself failed. Such a response would only be useful if we can be certain that /status is a health endpoint, as pointed out in the API Health Check draft referred to in Nkosi's answer:




A health endpoint is only meaningful in the context of the component
it indicates the health of. It has no other meaning or purpose. As
such, its health is a conduit to the health of the component.
Clients SHOULD assume that the HTTP response code returned by the
health endpoint is applicable to the entire component (e.g. a larger
API or a microservice).




But with a URL path of just /status, it is not completely obvious that this really is a health endpoint. From looking at the URL, we only know that it returns information about the status of something, but we can't really be sure what that "something" is.



Since you're also telling us that it is in fact a health endpoint, I suggest renaming it to /health. I would also suggest placing it under some base path, e.g. /things/health, to make it clearer which component's health it reports.



If, on the other hand, /status were actually a resource of its own, i.e. something that represents the status of some other component/thing (as its name currently suggests), then 200 OK is the only reasonable status for successful invocations, even if the thing whose status it reports is "Unhealthy". In that case, a 5xx would mean that no status could be obtained, and details in the response payload would be assumed to relate to a failure in the /status service itself.



So be careful with how you name things and what status codes you use!






    Consider returning a custom code within the 2xx Success range that is not already taken by the known/common status codes, similar to some of the unofficial codes not supported by any standard.

    For example, 218 This is fine (Apache Web Server):




    Used as a catch-all error condition for allowing response bodies to flow through Apache when ProxyErrorOverride is enabled. When ProxyErrorOverride is enabled in Apache, response bodies that contain a status code of 4xx or 5xx are automatically discarded by Apache in favor of a generic response or a custom response specified by the ErrorDocument directive




    After doing some research, I came across a draft, Health Check Response Format for HTTP APIs (draft-inadarei-api-health-check-03), which makes a similar suggestion:




    In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response.




    Here the "warn" status in the draft means "healthy, with some concerns", which I believe aligns closely with your desired model.



    While not definitive, I believe it provides some ideas to help with the eventual design.
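A minimal sketch of a response in the spirit of that draft might look like the following. Only the "warn" rule (2xx-3xx) is quoted above; the `pass`/`fail` status names, the `output` field, and the `application/health+json` media type are my recollection of the draft and should be checked against the current text. The function name is hypothetical.

```python
import json
from typing import Dict, Tuple

# Assumption: "warn" stays in the 2xx range as quoted above; the
# "pass" and "fail" rows are recalled from the draft, not quoted here.
DRAFT_HTTP_STATUS = {"pass": 200, "warn": 200, "fail": 503}


def draft_style_response(
    status: str, output: str = ""
) -> Tuple[int, Dict[str, str], str]:
    """Return (status code, headers, JSON body) for a draft-style check."""
    if status not in DRAFT_HTTP_STATUS:
        raise ValueError(f"unknown health status: {status!r}")
    body = {"status": status}
    if output:
        # Extra detail travels in an optional field of the response.
        body["output"] = output
    headers = {"Content-Type": "application/health+json"}
    return DRAFT_HTTP_STATUS[status], headers, json.dumps(body)
```

With this shape, a degraded service would answer 200 with `"status": "warn"` and a short explanation in `output`.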






    • I contacted the author of the draft over Twitter (see twitter.com/RehanSaeedUK/status/1081121474667253760?s=20). His response was basically to refer to the HTTP RFC (which isn't much help) and to avoid unofficial status codes. While not a complete answer, your input is valuable, so thank you!
      – Muhammad Rehan Saeed, Jan 10 at 8:50



















    I would be wary of splitting hairs like this on a healthcheck on the upstream server side. The service providing the healthcheck should be lightly (and concurrently) testing all its upstream dependencies based on its own set of policies or rules - request timeouts, connection failures and so on. In reality the healthcheck either works or it doesn't and the application shouldn't really need to be keeping track of the results of the healthcheck (other than capturing metrics about what happened). IMHO a stateful healthcheck is a recipe for disaster.



    I typically use the following interface for application healthchecks:



    204 - No Content: everything is working within tolerances

    500 - Something failed, with details in the response about what went wrong
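A minimal sketch of that two-outcome interface, assuming each dependency check is a plain callable that raises on failure. The `run_health_check` name, the `failures` field, and the default timeout are hypothetical choices for this example.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple


def run_health_check(
    checks: Dict[str, Callable[[], None]], timeout_s: float = 2.0
) -> Tuple[int, Optional[str]]:
    """Run every dependency check concurrently; return (status code, body)."""
    failures: Dict[str, str] = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    try:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        for name, future in futures.items():
            try:
                # A check passes if it returns without raising in time.
                future.result(timeout=timeout_s)
            except Exception as exc:  # includes the timeout error
                failures[name] = f"{type(exc).__name__}: {exc}"
    finally:
        # Do not block the response on a check that is still hanging.
        pool.shutdown(wait=False)
    if failures:
        return 500, json.dumps({"failures": failures})
    return 204, None
```

The check keeps no state between calls: each request runs the checks afresh and reports only what happened this time.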



    Where it gets tricky depends on your architecture. You may have a VIP or reverse proxy that interprets this response and decides whether a given node is healthy, in which case it will either route the request to a healthy node or return 503 Service Unavailable. This decision is going to be made on some policy basis: x healthcheck requests failed over a y time period across z upstream services.
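One way to picture such a policy is a sliding window of recent failures per node. This is only a sketch of the "x failures over y seconds" part for a single node, not how any particular proxy implements it; the class name and parameters are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class NodeHealthPolicy:
    """Mark a node unhealthy after max_failures failed checks in window_s."""

    def __init__(
        self,
        max_failures: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self._clock = clock
        self._failures: Deque[float] = deque()

    def record(self, status_code: int) -> None:
        # Only 5xx healthcheck responses count as failures here.
        if status_code >= 500:
            self._failures.append(self._clock())

    def is_healthy(self) -> bool:
        # Forget failures that have aged out of the window.
        cutoff = self._clock() - self.window_s
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) < self.max_failures
```

The proxy, not the service, owns this state, which keeps the healthcheck endpoint itself stateless.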



    If you use a mesh then everyone can feed data back to the service registry to keep the health state up to date and it can be based on actual service calls rather than a healthcheck.



    The client is well placed to make decisions based on the health of the services it depends on, because it can keep track of the various responses from each service. Circuit breakers are an excellent way to handle that, and they can do it continuously on actual requests rather than just on the healthcheck. Circuit breaker libraries (such as resilience4j) will do this for you at the cost of setting up some policies about how many failed/slow requests constitute a bad service. Service registries such as Netflix Eureka can help with discovery and ongoing monitoring.
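To make the circuit breaker idea concrete, here is a deliberately small hand-rolled sketch. It is not the resilience4j API (that library is for Java), it counts only failed calls rather than slow ones, and the class names, threshold, and cool-down are assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit is open; call skipped")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Because the breaker wraps real requests, the client learns about a bad dependency from actual traffic instead of waiting for the next healthcheck.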






          Answer recommending 200 OK: answered Jan 8 at 17:49 by mbj

























              2














              Consider returning a custom code within the 2xx Success range that is not already taken within the known/common status codes. Similar to some of the unofficial codes not supported by any standard.



              For example 218 This is fine (Apache Web Server)




              Used as a catch-all error condition for allowing response bodies to flow through Apache when ProxyErrorOverride is enabled. When ProxyErrorOverride is enabled in Apache, response bodies that contain a status code of 4xx or 5xx are automatically discarded by Apache in favor of a generic response or a custom response specified by the ErrorDocument directive




              After doing some research I came across a draft



              Health Check Response Format for HTTP APIs: draft-inadarei-api-health-check-03



              Where they also made similar suggestions




              In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response.




              where the warn status in the draft is healthy, with some concerns, which I believe aligns closely to your desired model.



              While not definitive, I believe it provides some ideas to help with the eventual design.














              edited Jan 3 at 16:34

























              answered Jan 3 at 9:01









              Nkosi









              • 1





                I contacted the author of the draft over Twitter (See twitter.com/RehanSaeedUK/status/1081121474667253760?s=20). His response was basically to refer to the HTTP RFC (which isn't much help) and avoid unofficial status codes. While not a complete answer, your input is valuable, so thank you!

                – Muhammad Rehan Saeed
                Jan 10 at 8:50

























              2














              I would be wary of splitting hairs like this in a healthcheck on the upstream server side. The service providing the healthcheck should be lightly (and concurrently) testing all its upstream dependencies based on its own set of policies or rules: request timeouts, connection failures and so on. In reality the healthcheck either works or it doesn't, and the application shouldn't really need to keep track of the results of the healthcheck (other than capturing metrics about what happened). IMHO a stateful healthcheck is a recipe for disaster.



              I typically use the following interface for application healthchecks:



              204 - No Content, everything is working within tolerances



              500 - Something failed, and here's some details in the response about what went wrong
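As a rough sketch of that interface, the function below runs every dependency probe concurrently under one shared deadline and collapses the outcome into 204 or 500. The function name, the `probes` argument and the one-second default are assumptions made for the example, not part of any framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_health_check(probes, timeout_s=1.0):
    """probes maps a dependency name to a zero-argument callable that
    raises on failure.

    Returns (204, {}) when every probe succeeds before the deadline,
    otherwise (500, failures) with one message per failed dependency.
    """
    if not probes:
        return 204, {}
    failures = {}
    pool = ThreadPoolExecutor(max_workers=len(probes))
    try:
        # Start all probes at once so a slow one does not delay the others.
        futures = {name: pool.submit(probe) for name, probe in probes.items()}
        deadline = time.monotonic() + timeout_s
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                future.result(timeout=remaining)
            except FutureTimeout:
                failures[name] = "timed out"
            except Exception as exc:
                failures[name] = f"{type(exc).__name__}: {exc}"
    finally:
        # Do not block the response on probes that are still hanging.
        pool.shutdown(wait=False)
    return (204, {}) if not failures else (500, failures)
```

The check keeps no state between calls: each request probes the dependencies afresh and reports only what it saw.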



              Where it gets tricky depends on your architecture. You may have a VIP or reverse proxy that interprets this response and decides whether a given node is healthy, in which case it will either route the request to a healthy node or return 503 Service Unavailable. That decision is going to be made on some policy basis: x healthcheck requests failed over a y time period across z upstream services.
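One way to picture such a policy is a sliding window of recent failures per node. This is only a sketch: the class name and thresholds are made up, and it tracks a single node rather than aggregating across several upstream services.

```python
from collections import deque


class FailureWindow:
    """Marks a node unhealthy once max_failures healthcheck failures
    fall inside the last window_s seconds."""

    def __init__(self, max_failures, window_s):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures = deque()  # timestamps of recent failures

    def record(self, ok, now):
        # Only failures are stored; successes just let old failures age out.
        if not ok:
            self._failures.append(now)

    def is_healthy(self, now):
        # Drop failures that are older than the window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) < self.max_failures
```

The proxy would call `record` after each healthcheck and consult `is_healthy` before routing, so a single slow response does not take the node out of rotation.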



              If you use a mesh then everyone can feed data back to the service registry to keep the health state up to date and it can be based on actual service calls rather than a healthcheck.



              The client is perfectly placed to make a decision based on the health of the services it depends on, as it can keep track of the various responses from each service. Circuit breakers are an excellent way to handle that and can do it continuously on actual requests rather than just on the healthcheck. Circuit breaker libraries (such as Resilience4j) will do this for you at the cost of setting up some policies about how many failed or slow requests constitute a bad service. Service registries such as Netflix Eureka can help with discovery and ongoing monitoring.
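To show the idea without tying it to a particular library, here is a deliberately small circuit breaker. It is not the Resilience4j API: the class, its parameters and the consecutive-failure policy are assumptions for illustration, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: fall through and allow a trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._consecutive_failures += 1
            failed_trial = self._opened_at is not None
            if failed_trial or self._consecutive_failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._consecutive_failures = 0
        self._opened_at = None
        return result
```

Because the breaker wraps real requests, the client learns about a failing service from actual traffic and stops calling it for a while, with no separate degraded status needed from the healthcheck.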
















                  answered Jan 7 at 5:53









                  stringy05






























