What should the HTTP Status Code of a Degraded Health Check Be?
I have a health check endpoint at /status that returns the following status codes and response bodies:
- Healthy - 200 OK
- Degraded - ?
- Unhealthy - 503 Service Unavailable
What should the HTTP status code be for a degraded response? A 'degraded' check is used for checks that did succeed but are slow or unstable. What HTTP status code makes the most sense?
http http-status-codes http-status-code-503 health-monitoring kubernetes-health-check
edited Dec 31 '18 at 9:02
asked Dec 31 '18 at 8:38 by Muhammad Rehan Saeed
I don't think your question makes sense. You need to decide what HTTP GET of /status should do.
– Basile Starynkevitch
Dec 31 '18 at 8:39
What do you believe your choices to be? If it's working we use 200 and return additional information if necessary. Really, it's up to you.
– Retired Ninja
Dec 31 '18 at 9:02
@MuhammadRehanSaeed return a custom code within the 2xx Success range that is not already taken within the known/common codes, similar to some of the unofficial codes not supported by any standard. For example, 218 This is fine (Apache Web Server).
– Nkosi
Jan 2 at 17:47
@MuhammadRehanSaeed also found this: tools.ietf.org/html/draft-inadarei-api-health-check-00
– Nkosi
Jan 2 at 17:57
@MuhammadRehanSaeed hoping you check the more recent version. They also suggested "In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response." where warn status is "healthy, with some concerns", which I believe aligns closely with your model.
– Nkosi
Jan 3 at 8:45
3 Answers
The most suitable HTTP status code for a "Degraded" status response from a health endpoint is nothing other than 200 OK.
I say this because I can't find any better code in the official Hypertext Transfer Protocol (HTTP) Status Code Registry maintained by IANA, pointed to by [RFC7231] HTTP/1.1: Semantics and Content. Unofficial codes should be avoided, because they only make your API more difficult to understand.
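As a minimal sketch of that recommendation (assuming Python, with a hypothetical Health enum and health_response helper that do not come from the question), Degraded can share 200 OK with Healthy while the response body carries the distinction:

```python
import json
from enum import Enum
from http import HTTPStatus
from typing import Optional, Tuple


class Health(Enum):
    # Hypothetical aggregated states mirroring the question's three levels.
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"


def health_response(state: Health, details: Optional[dict] = None) -> Tuple[int, str]:
    """Map an aggregated health state to (status code, JSON body).

    Degraded checks still succeeded, so they get 200 OK like Healthy ones;
    the difference is reported in the body rather than in an unofficial code.
    """
    if state is Health.UNHEALTHY:
        status = HTTPStatus.SERVICE_UNAVAILABLE  # 503, as in the question
    else:
        status = HTTPStatus.OK  # 200 for both Healthy and Degraded
    body = json.dumps({"status": state.value, "details": details or {}})
    return int(status), body
```

A monitor that only inspects the status code keeps treating a degraded instance as up, while a dashboard can still read "Degraded" (and any details) from the body.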
You should design your APIs so that they are easy to use. Resource names, HTTP verbs, status codes, etc. should be more or less self-explanatory, so that people who already know "the REST language" can immediately understand how to use your API without having to decipher vague names or unusual status codes. Which brings me to the next part of my answer...
Other comments on your design
The most natural way to interpret a 5xx response to any request is that the operation in question failed.
So a 503 Service Unavailable response to a GET /status request means that the status checking operation itself failed. Such a response would only be useful if we can be certain that /status is a health endpoint, as pointed out in the API Health Check draft referred to in Nkosi's answer:
A health endpoint is only meaningful in the context of the component
it indicates the health of. It has no other meaning or purpose. As
such, its health is a conduit to the health of the component.
Clients SHOULD assume that the HTTP response code returned by the
health endpoint is applicable to the entire component (e.g. a larger
API or a microservice).
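To illustrate what "applicable to the entire component" means for a consumer, here is a small sketch of a load-balancer-style decision that looks only at the status code of each node's health endpoint. The function and node names are hypothetical, and treating 2xx-3xx as "available" is my assumption, mirroring the range the draft gives for its warn status:

```python
from typing import Dict, List


def component_available(status_code: int) -> bool:
    # The code alone decides: 2xx-3xx means the whole component is usable,
    # anything else (e.g. 503) means it should be taken out of rotation.
    return 200 <= status_code < 400


def routable_nodes(probe_results: Dict[str, int]) -> List[str]:
    """Return the (hypothetical) node names whose health probe passed."""
    return sorted(
        node for node, code in probe_results.items() if component_available(code)
    )
```

With Degraded mapped to 200, a degraded node stays in rotation, which fits the idea that its checks did succeed.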
But with a URL path of just /status, it is not completely obvious that this really is a health endpoint. From looking at the URL, we only know that it returns information about the status of something, but we can't really be sure what that "something" is.
Since you're also telling us that yes, it is in fact a health endpoint, I must suggest that you change the name to health. I would also suggest placing it under some base path, e.g. /things/health, to make it clearer which component it indicates the health of.
If, on the other hand, /status was actually a resource of its own, i.e. something that represents the status of some other component/thing (like its name currently suggests), then 200 OK is the only reasonable status for successful invocations, even if the thing whose status it indicates is "Unhealthy". In that case, a 5xx would mean that no status could be obtained, and details in the response payload would be assumed to relate to a failure in the /status service itself.
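The contrast with a plain status resource can be sketched as follows (hypothetical names; read_status stands in for whatever looks up the other component's status). A successful lookup always yields 200 OK, whatever value it returns; only a failure of the lookup itself produces a 5xx, here 500 as an arbitrary choice:

```python
import json
from http import HTTPStatus
from typing import Callable, Tuple


def status_resource(read_status: Callable[[], str]) -> Tuple[int, str]:
    """GET /status as an ordinary resource describing some other component."""
    try:
        value = read_status()  # hypothetical lookup of the other thing's status
    except Exception as exc:
        # No status could be obtained: the /status service itself failed.
        return int(HTTPStatus.INTERNAL_SERVER_ERROR), json.dumps({"error": str(exc)})
    # The lookup worked, so this request succeeded even if value is "Unhealthy".
    return int(HTTPStatus.OK), json.dumps({"status": value})
```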
So be careful with how you name things and what status codes you use!
answered Jan 8 at 17:49 by mbj
Consider returning a custom code within the 2xx Success range that is not already taken within the known/common status codes, similar to some of the unofficial codes that are not supported by any standard.
For example, 218 This is fine (Apache Web Server):
Used as a catch-all error condition for allowing response bodies to flow through Apache when ProxyErrorOverride is enabled. When ProxyErrorOverride is enabled in Apache, response bodies that contain a status code of 4xx or 5xx are automatically discarded by Apache in favor of a generic response or a custom response specified by the ErrorDocument directive.
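One practical point to weigh before picking a custom code: RFC 7231, section 6, asks clients to understand the class of any status code and to treat an unrecognized code as the x00 code of its class. The sketch below models such a generic client, using Python's http.HTTPStatus as a stand-in for the set of codes the client recognizes (an assumption; real clients differ). Under that rule, 218 is seen as a plain 200:

```python
from http import HTTPStatus


def effective_status(code: int) -> int:
    """How a generic client that only knows standard codes interprets `code`."""
    try:
        return HTTPStatus(code).value  # recognized: understood as-is
    except ValueError:
        # Unrecognized: fall back to the x00 code of the same class.
        return (code // 100) * 100
```

So the degraded signal only survives for consumers written to look for 218 specifically.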
After doing some research I came across a draft, Health Check Response Format for HTTP APIs: draft-inadarei-api-health-check-03, which makes a similar suggestion:
In case of the “warn” status, endpoints MUST return HTTP status in the 2xx-3xx range, and additional information SHOULD be provided, utilizing optional fields of the response.
where the warn status in the draft means "healthy, with some concerns", which I believe aligns closely with your desired model.
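A rough sketch of a response in the spirit of the draft follows. The application/health+json media type, the pass/warn/fail values and the optional output field are how I recall draft-03, so double-check them against the draft; returning 200 for pass/warn and 503 for fail are my own choices to match the question:

```python
import json
from typing import Dict, Optional, Tuple

# Status values as recalled from draft-inadarei-api-health-check-03.
VALID_STATUSES = ("pass", "warn", "fail")


def draft_health_response(
    status: str, output: Optional[str] = None
) -> Tuple[int, Dict[str, str], str]:
    """Return (HTTP code, headers, body) for a draft-style health response."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown health status: {status!r}")
    # warn (and pass) stay in the 2xx range; fail uses 503 as in the question.
    code = 503 if status == "fail" else 200
    body: Dict[str, str] = {"status": status}
    if output is not None:
        body["output"] = output  # optional field describing the concern
    headers = {"Content-Type": "application/health+json"}
    return code, headers, json.dumps(body)
```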
While not definitive, I believe it provides some ideas to help with the eventual design.
edited Jan 3 at 16:34
answered Jan 3 at 9:01 by Nkosi
I contacted the author of the draft over Twitter (See twitter.com/RehanSaeedUK/status/1081121474667253760?s=20). His response was basically to refer to the HTTP RFC (which isn't much help) and avoid unofficial status codes. While not a complete answer, your input is valuable, so thank you!
– Muhammad Rehan Saeed
Jan 10 at 8:50
I would be wary of splitting hairs like this in a health check on the upstream server side. The service providing the health check should lightly (and concurrently) test all of its upstream dependencies according to its own set of policies or rules: request timeouts, connection failures and so on. In reality the health check either works or it doesn't, and the application shouldn't need to keep track of the results of previous health checks (other than capturing metrics about what happened). IMHO a stateful health check is a recipe for disaster.
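A minimal sketch of such a stateless check, assuming each dependency probe is a hypothetical zero-argument callable that raises on failure. All probes run concurrently in a thread pool, the whole run is bounded by a single timeout, and nothing is remembered between calls.

```python
import concurrent.futures
from typing import Callable, Dict, Optional

# A dependency probe is any zero-argument callable that raises on failure.
Probe = Callable[[], None]


def run_checks(probes: Dict[str, Probe], timeout: float = 2.0) -> Dict[str, Optional[str]]:
    """Run all probes concurrently and map name -> error message (None means OK).

    Nothing is stored between calls: every request to the health endpoint
    re-probes the dependencies, so the health check itself stays stateless.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(probes)))
    try:
        futures = {name: pool.submit(probe) for name, probe in probes.items()}
        # One overall deadline for the whole run, not one per probe.
        _, not_done = concurrent.futures.wait(futures.values(), timeout=timeout)
        results: Dict[str, Optional[str]] = {}
        for name, future in futures.items():
            if future in not_done:
                results[name] = f"timed out after {timeout}s"
            elif future.exception() is not None:
                exc = future.exception()
                results[name] = f"{type(exc).__name__}: {exc}"
            else:
                results[name] = None
        return results
    finally:
        # Don't hold up the response waiting for hung probes to finish.
        pool.shutdown(wait=False)
```

The probe names (for example "db" or "cache") are placeholders for whatever your service actually depends on.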
I typically use the following interface for application health checks:
- 204 No Content: everything is working within tolerances
- 500 Internal Server Error: something failed, and the response body has details about what went wrong
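Mapping per-dependency results (name to error message, or None when the dependency is fine) onto the 204/500 interface above could look like this. The "failed" field in the JSON body is a hypothetical name, not part of any standard.

```python
import json
from typing import Dict, Optional, Tuple


def to_http(results: Dict[str, Optional[str]]) -> Tuple[int, bytes]:
    """Map per-dependency results (name -> error or None) to (status code, body)."""
    failures = {name: err for name, err in results.items() if err is not None}
    if not failures:
        # 204 No Content: everything is within tolerances, so no body is sent.
        return 204, b""
    # 500: something failed; describe what went wrong in the body.
    return 500, json.dumps({"failed": failures}, sort_keys=True).encode("utf-8")
```

Only the failing dependencies are listed, which keeps the error body short when most things are healthy.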
How tricky this gets depends on your architecture. You may have a VIP or reverse proxy that interprets this response and decides whether a given node is healthy, in which case it will either route the request to a healthy node or return 503 Service Unavailable. That decision is made on some policy basis, e.g. x health check requests failed over a y time period across z upstream services.
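The "x failures in a y time period" part of such a proxy-side policy can be sketched as a per-node sliding window. This is a hypothetical illustration (the class name FailureWindowPolicy and its thresholds are made up), not the behavior of any particular load balancer. It treats any non-2xx health check response as a failure and takes an injectable clock so it can be tested.

```python
import collections
import time
from typing import Callable, Deque, Dict, List, Optional


class FailureWindowPolicy:
    """Mark a node unhealthy once it fails `max_failures` checks within `window` seconds."""

    def __init__(self, max_failures: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        # Timestamps of recent failed health checks, per node.
        self._failures: Dict[str, Deque[float]] = collections.defaultdict(collections.deque)

    def record(self, node: str, status_code: int) -> None:
        """Record the outcome of one health check against `node`."""
        if not 200 <= status_code < 300:
            self._failures[node].append(self.clock())

    def is_healthy(self, node: str) -> bool:
        failures = self._failures[node]
        cutoff = self.clock() - self.window
        # Drop failures that have aged out of the sliding window.
        while failures and failures[0] <= cutoff:
            failures.popleft()
        return len(failures) < self.max_failures

    def choose(self, nodes: List[str]) -> Optional[str]:
        """Return the first healthy node, or None (the proxy would then answer 503)."""
        for node in nodes:
            if self.is_healthy(node):
                return node
        return None
```

Because the window slides, a node that stops failing drifts back into rotation on its own once its old failures age out.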
If you use a service mesh, every participant can feed data back to the service registry to keep the health state up to date, and that state can be based on actual service calls rather than on a health check.
The client is well placed to make decisions based on the health of the services it depends on, because it can keep track of the responses it actually gets from them. Circuit breakers are an excellent way to handle this, and they work continuously on real requests rather than just on the health check. Circuit breaker libraries (such as resilience4j) will do this for you, at the cost of setting up policies about how many failed or slow requests constitute a bad service. Service registries such as Netflix Eureka can help with discovery and ongoing monitoring.
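To show the idea behind a client-side circuit breaker, here is a minimal hand-rolled Python sketch; it is not resilience4j's API. It opens after failure_threshold consecutive failures, fails fast while open, lets calls through again on a trial basis after reset_timeout, reopens if a trial call fails, and closes on success. It only counts exceptions (treating slow calls as failures is left out) and is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # trial calls are allowed through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("upstream considered unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The decision is driven by real traffic: the breaker never calls a health endpoint, it simply observes how the client's own requests to the dependency turn out.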
answered Jan 7 at 5:53 by stringy05