How to avoid downtime with Linux?
Software updates on Ubuntu frequently require reboots, which can have side effects such as downtime.
I see that Ubuntu offers Livepatch (https://www.ubuntu.com/livepatch), which allows kernel updates to be applied without a reboot; however, it is a paid service. There is also Ksplice.
Are there Linux distributions/processes where upgrades/patches never require reboots?
(I know that setting up high-availability (HA) servers and treating servers as disposable are best practices, so I'm not asking about keeping a service up, but about the individual servers themselves.)
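On Ubuntu, the "this update needs a reboot" state the question describes is usually signalled by marker files. The sketch below assumes the convention (from the update-notifier-common package, as I understand it) that a package needing a restart creates /var/run/reboot-required and adds its name to /var/run/reboot-required.pkgs. The function name pending_reboot is hypothetical, and run_dir is a parameter only so the check can be pointed at a test directory.

```python
from pathlib import Path
from typing import List, Tuple


def pending_reboot(run_dir: str = "/var/run") -> Tuple[bool, List[str]]:
    """Report whether a reboot has been flagged, and which packages asked for it.

    Assumes Ubuntu's marker files reboot-required and reboot-required.pkgs in run_dir.
    """
    marker = Path(run_dir) / "reboot-required"
    pkgs_file = Path(run_dir) / "reboot-required.pkgs"
    if not marker.exists():
        return False, []
    packages: List[str] = []
    if pkgs_file.exists():
        # One package name per line; the same package may be listed more than once.
        for line in pkgs_file.read_text().splitlines():
            name = line.strip()
            if name and name not in packages:
                packages.append(name)
    return True, packages


if __name__ == "__main__":
    needed, pkgs = pending_reboot()
    if needed:
        print("Reboot required by:", ", ".join(pkgs) or "(unknown packages)")
    else:
        print("No reboot pending")
```

Running this from cron or a monitoring agent gives a per-machine signal that can feed a planned, staggered reboot rather than an ad hoc one.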
ubuntu update linux-kernel upgrade
asked Feb 2 at 16:10 by user75126 (edited Feb 3 at 16:45 by yagmoth555♦)
Would an air-gapped server work as a machine that never needs rebooting? After all, if no one can access it, you never need to reboot it? ;) For example, a monitoring server at a nuclear power plant that simply sounds an alarm if something is wrong. (Yes, I'm aware this would likely be a dedicated system rather than a random server, but I'm using the example to make the point that there are occasions when rebooting for "security updates" may be an entirely unnecessary precaution.)
– djsmiley2k
Feb 3 at 14:40
@djsmiley2k That's one of those cases where a machine that you never reboot still doesn't give you sufficient availability. Instead, you need redundancy.
– kasperd
Feb 3 at 15:13
@kasperd OK, so a cluster of never-rebooted machines?
– djsmiley2k
Feb 3 at 16:25
@djsmiley2k My answer to the question already argues why I consider a cluster of machines that are rebooted one at a time to be more reliable than one which you never reboot.
– kasperd
Feb 3 at 17:36
What makes you think avoiding individual system downtime is preferable?
– warren
Feb 4 at 20:17
2 Answers
To your question, "Are there Linux distributions/processes where upgrades/patches never require reboots?", I'm not aware of any, and I'm highly doubtful that there ever will be any which are truly reboot-free. In addition to Michael Hampton's comment about why live patching is not an out-of-the-box experience anywhere, live patching also doesn't achieve the same result as rebooting.
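One way to see that a live-patched machine is not simply "running the new kernel" is to look at what the kernel itself reports. This sketch assumes a kernel built with the upstream livepatch framework, which exposes each loaded patch as a directory under /sys/kernel/livepatch with an "enabled" attribute. I believe Canonical Livepatch uses this framework; Ksplice, as far as I know, uses its own mechanism and would not appear here. The function name loaded_livepatches is hypothetical.

```python
import platform
from pathlib import Path
from typing import Dict


def loaded_livepatches(sysfs_root: str = "/sys/kernel/livepatch") -> Dict[str, bool]:
    """Map each live patch known to the kernel livepatch framework to its enabled state.

    Assumes the sysfs layout <sysfs_root>/<patch>/enabled containing "1" or "0".
    Patches applied through other mechanisms will not show up.
    """
    root = Path(sysfs_root)
    patches: Dict[str, bool] = {}
    if not root.is_dir():
        return patches  # kernel without livepatch support, or nothing loaded
    for entry in sorted(root.iterdir()):
        enabled_file = entry / "enabled"
        if enabled_file.is_file():
            patches[entry.name] = enabled_file.read_text().strip() == "1"
    return patches


if __name__ == "__main__":
    # The release string still names the kernel that was booted, not the patched state.
    print("running kernel release:", platform.release())
    for name, enabled in loaded_livepatches().items():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

The point of printing both values is that the release string alone cannot distinguish "booted into the fixed kernel" from "old kernel plus patches in memory", which is exactly the gap the anecdote below falls into.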
An anecdote to illustrate this: I recently investigated a problem where one particular utility had started segfaulting on a large number of machines. I tried looking at the shared libraries which it used to see if anything recently upgraded had broken it; ldd said it wasn't an executable (even though when I pulled the same binary down to my laptop, ldd could see the shared library dependencies just fine). I tried stepping through it in gdb; it segfaulted before it even got to the first instruction.
Looking at the timing of the fault, I found that a Ksplice patch had recently been applied. I backed out the patch and the binary stopped segfaulting; when I applied it again, the segfaults returned. Rebooting into an equivalently patched kernel worked fine. It turned out to be a patch for 32-bit support which the Ksplice folks had not applied quite correctly. To their credit, they issued a fixed patch within a few hours, and our fleet was back to working correctly without intervention.
Another example: the Meltdown/Spectre patches were so invasive that the Ubuntu kernel team decided that live patching was impractical and required people to reboot their systems into the fixed kernel before receiving live patches again.
We run a large fleet of physical and virtual servers at work, with a large number of both Ksplice and Canonical Livepatch systems. They've both been far more reliable than a lot of other software, but I would still rather see our services designed with a reboot-friendly architecture than rely on kernel live patching.
answered Feb 5 at 20:37 by Paul Gear (edited Feb 6 at 22:57)
There is an important distinction between making a service highly available and making an individual machine highly available.
In most cases the goal is to make the service highly available, and the availability of individual machines is only a means toward achieving that goal. However, there is a limit to how far toward that goal you can get by improving the availability of individual machines.
Even if you could eliminate all the downtime caused by software updates, individual machines would still not be 100% available. Thus, to make the service more available than any individual machine, you have to design redundancy at a higher level. The last sentence of your question shows that, at least in principle, you know this.
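The arithmetic behind "redundancy at a higher level" can be sketched with the textbook parallel-availability formula: if a service stays up while at least one of n replicas is up, and replicas fail independently, service availability is 1 - (1 - a)^n. Independence is a strong assumption (shared power, network, or a bad patch rolled out everywhere all break it), and the function name is hypothetical.

```python
def service_availability(machine_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes replica failures are independent, which real deployments only approximate.
    """
    if not 0.0 <= machine_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # Probability that every replica is down at the same time is (1 - a) ** n.
    return 1.0 - (1.0 - machine_availability) ** replicas
```

For example, two machines that are each only 99% available (perhaps because they are rebooted regularly) give 1 - 0.01^2 = 99.99%, which beats a single machine kept at 99.9% by avoiding reboots.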
If you do design a service to be more available than individual machines can deliver, there is no longer any pressure to achieve high availability of individual machines. Thus, for highly available services there is no need to avoid reboots. Instead, you can sacrifice some reliability of individual machines and put the savings toward other areas where you can get much larger gains in reliability.
Once the higher-level system is designed to remain reliable when individual hardware components fail, live patching of kernels turns from an advantage into a risk.
It's a risk because there can be subtle differences between the behavior of a machine that was live patched and a machine that was booted with the newest kernel version. This can introduce a latent bug that causes an outage the next time the machine is rebooted. The risk is amplified because rebooting to get a clean slate is seen as a way to mitigate some outages.
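A rough way to spot machines where the next reboot will change what is running is to compare the running kernel release with the newest kernel image installed. This sketch assumes the Debian/Ubuntu layout /boot/vmlinuz-&lt;release&gt; (where &lt;release&gt; matches what uname -r reports), assumes the bootloader defaults to the newest image, and uses a simple numeric-chunk ordering rather than dpkg's real version comparison. All function names are hypothetical.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple


def _release_key(release: str) -> Tuple[Tuple[int, object], ...]:
    # "5.15.0-100-generic" -> ((0, 5), (0, 15), (0, 0), (0, 100), (1, "generic")).
    # Numeric chunks compare as numbers; the 0/1 tag keeps ints and strings from
    # ever being compared directly.
    parts = re.split(r"[.\-+~]", release)
    return tuple((0, int(p)) if p.isdigit() else (1, p) for p in parts)


def newest_installed_kernel(boot_dir: str = "/boot") -> Optional[str]:
    """Return the highest release found as <boot_dir>/vmlinuz-<release>, or None."""
    releases = [p.name[len("vmlinuz-"):] for p in Path(boot_dir).glob("vmlinuz-*")]
    return max(releases, key=_release_key, default=None)


def reboot_changes_kernel(boot_dir: str = "/boot", running: Optional[str] = None) -> bool:
    """True if the newest installed kernel differs from the one currently running."""
    running = running if running is not None else platform.release()
    newest = newest_installed_kernel(boot_dir)
    return newest is not None and newest != running
```

A live-patched machine will typically report True here for as long as it stays up, which is a useful reminder that its "real" kernel has never been tested by a boot.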
One day you could have an outage where you think rebooting the machine might help. But as you reboot, you are hit by the latent bug, which prevents the machine from coming back in the desired state. Live patching is not the only way such a latent bug can arise; it could just as well happen due to something as mundane as a service that was started manually and never configured to start at boot, or one configured to start too early, so that it fails to come up due to unsatisfied dependencies.
For those reasons, a highly available service may actually be easier to achieve with regular reboots of individual machines, at a slow enough rate that you can detect problems and pause the sequence of reboots as soon as problems appear.
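The slow, pausable reboot sequence described above can be sketched as a loop that reboots one host at a time and stops at the first host that does not come back healthy. The reboot and healthy callables are hypothetical hooks you would supply (for example via SSH or an orchestration API); draining traffic before each reboot and choosing a realistic wait_seconds are left to the caller.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_reboot(hosts: Sequence[str],
                   reboot: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   checks: int = 3,
                   wait_seconds: float = 0.0) -> Tuple[List[str], Optional[str]]:
    """Reboot hosts one at a time; stop at the first one that fails its health checks.

    Returns (hosts rebooted successfully, host that failed or None).
    """
    done: List[str] = []
    for host in hosts:
        reboot(host)
        for _ in range(checks):
            time.sleep(wait_seconds)
            if healthy(host):
                done.append(host)
                break
        else:
            # No health check passed: pause the rollout here so a human can investigate.
            return done, host
    return done, None
```

Stopping at the first failure means a latent boot-time bug takes out at most one machine, which the redundancy discussed above is there to absorb.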
I liked your description of the risk ("patched vs. booted with the newest kernel"). However, you didn't answer my question, which I could rephrase as: are there Linux distros which ship with 'livepatch' out of the box?
– user75126
Feb 2 at 18:39
@user75126 I see it as a feature which is more appropriate for client machines than for servers. Moreover, asking which distributions support it sounds like a product-recommendation question. To me, those are two reasons why rephrasing the question like that would make it off-topic for this site.
– kasperd
Feb 2 at 19:20
@user75126 Oracle's Ksplice has a free trial, and a free tier for Ubuntu and Fedora desktops (only, but they don't really enforce this). The problem is that creating the live patches is difficult to automate, and even the parts that can be automated are time-consuming. Creating these patches is a relatively labor-intensive operation, and it's reasonable for companies to charge for that. I looked into what it would take to create the live patches myself, and noped right out of there. I haven't got that kind of time in my day.
– Michael Hampton♦
Feb 2 at 21:55
@user75126 It's really bad practice on this site to change the question title and body in a way that invalidates an existing answer. If you wanted to ask a different question, then ask a different question.
– Greg Schmit
Feb 3 at 2:26
@user75126 Thanks. I read your question, and I didn't think it was really an answer to it. I was merely commenting on why this is a paid service.
– Michael Hampton♦
Feb 4 at 0:28
|
show 2 more comments
There is an important distinction between making a service highly available and making an individual machine highly available.
In most cases the goal is to make the service highly available, and availability of individual machines is only a means toward achieving that goal. However there is a limit in how far towards the goal you can get by improving availability of individual machines.
Even if you could take away all the downtime due to needing to update software the individual machines will still not be 100% available. Thus to increase the availability of the service above the availability of individual machines you have to design redundancy at a higher level. The last sentence of your question shows that at least in principle you know this.
If you do design a service to be more available than individual machines can deliver there is no longer pressure to achieve high availability of individual machines. Thus for highly available services there is no need to avoid reboots. Instead you can sacrifice some reliability of individual machines to make savings which can be put towards other areas where you can get much higher gains in reliability.
Once the high level system is design to be reliable in case of individual hardware components failing the live patching of kernels changes from being an advantage to becoming a risk.
It's a risk because there can be subtle differences between the behavior of a machine which was live patched and a machine which was booted with the newest kernel version. This can introduce a latent bug that can cause an outage next time a machine is rebooted. This risk is amplified by rebooting to get a clean slate being seen as a method to mitigate some outages.
One day you could have an outage where you think rebooting the machine might help. But as you reboot you are hit by the latent bug preventing the machine from coming back in the desired state. Live patching is not the only way such a latent bug can happen, it could as well happen due to something as mundane as a service having been enabled manually and never configured to start during boot, or having been configured to start too early such that it fails to come up due to unsatisfied dependencies.
For those reasons a highly available service may actually be easier to achieve with regular reboots of individual machines at a slow enough rate that you can detect problems and pause the sequence of reboots once problems do happen.
I liked your description of the risk; "patched vs booted with the newest kernel".. However, you didn't answer my question.. which I could rephrase, are there linux distros which ship with 'livepatch' out-of-the-box?
– user75126
Feb 2 at 18:39
@user75126 I see it as a feature which is more appropriate for client machines than for servers. Moreover asking which distributions support it sounds like a product recommendation question. To me that sounds like two reasons why rephrasing the question like that would make it off-topic for this site.
– kasperd
Feb 2 at 19:20
3
@user75126 Oracle's Ksplice has a free trial, and a free tier for Ubuntu and Fedora desktops (only, but they don't really enforce this). The problem is that creating the live patches is difficult to automate, and even the parts that can be automated are also time consuming. Creating these patches is a relatively labor intensive operation, and it's reasonable for companies to charge for that. I looked into what it would take to create the live patches myself, and noped right out of there. I haven't got that kind of time in my day.
– Michael Hampton♦
Feb 2 at 21:55
12
@user75126 It's really bad practice on this site to change the question title and body in a way that invalidates an existing answer. If you wanted to ask a different question, then ask a different question.
– Greg Schmit
Feb 3 at 2:26
2
@user75126 Thanks. I read your question, and I didn't think it was really an answer to it. I was merely commenting on why this is a paid service.
– Michael Hampton♦
Feb 4 at 0:28
|
show 2 more comments
edited Feb 2 at 16:33
answered Feb 2 at 16:23
kasperd