Hot clone a living Linux service
We need to hot clone a Linux service while it is running. This is not because we can't reboot; it is because of our particular scenario. (Yes, I've already read the answers to "Clone a working Linux server", but my case is a little different.)
We have a calculation node, specifically an NLP node that runs some models. When we start the node (as a service, of course), calculation is horribly slow until we have fed it requests several times. We call this warm-up.
Unfortunately, the warm-up takes a long time (our calculation may well finish before the node has warmed up).
So the question is: is there a stable way to hot clone a Linux server so that the clone keeps the node's warmed-up performance, letting us bring new nodes online in a shorter time?
linux clone
Would virtualising the machine and taking a snapshot of the "warmed-up" state be any use?
– TripeHound
Jan 31 at 15:40
13
Do you understand why this warm-up happens? For instance, it might be a side-effect of the file cache. But some answers to cloning machines discard the file cache, because a cache by definition can be reconstructed from the underlying original.
– MSalters
Jan 31 at 17:05
fork() is one way to create more processes on a given machine while saving whatever startup overhead.
– Yet Another User
Jan 31 at 18:45
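The fork() suggestion above can be sketched roughly as follows. This is a minimal illustration, not the asker's service: `warm_state` is a hypothetical stand-in for whatever the process has loaded during warm-up, and the function name is made up for the example. It only helps for extra workers on the same machine, as the comment says.

```python
import os
from typing import Dict, List


def fork_warm_workers(warm_state: Dict[str, int], n_workers: int) -> List[int]:
    """Fork workers from an already warmed-up parent process.

    Each child inherits the parent's memory (copy-on-write), so it sees
    warm_state without repeating the warm-up. For the demo, each child
    just reports how many warmed entries it inherited.
    """
    reports = []
    for _ in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: warm_state is already populated here.
            os.close(read_fd)
            os.write(write_fd, str(len(warm_state)).encode())
            os.close(write_fd)
            os._exit(0)  # leave without running the parent's cleanup
        # Parent: collect the child's report and reap it.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as reader:
            reports.append(int(reader.read().decode()))
        os.waitpid(pid, 0)
    return reports


if __name__ == "__main__":
    # Hypothetical warmed-up state: three lazily loaded items.
    state = {"tokenizer": 1, "embeddings": 2, "model": 3}
    print(fork_warm_workers(state, n_workers=2))
```

A real service would keep the children alive to serve requests instead of exiting immediately; the point is only that the expensive state is built once, before the fork.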
Thanks, folks. @TripeHound, I asked a friend of mine who works at VMware, and he said it looks impossible for them to simply snapshot the "warmed-up" state, or to do it with some kind of mirroring. @MSalters, I'm not 100% sure what happens during warm-up, but it looks like some lazy loading takes place once the service is up and the calculation jobs start being invoked.
– chen steven
Feb 1 at 2:25
2
Unaware of your background setup, but this smells like a situation where your server must never go down. This suggests that your host's kernel could be ancient and that updates have never been applied. Perhaps this is an indicator of a systemic design flaw that needs to be considered.
– Criggie
Feb 1 at 22:53
edited Jan 31 at 23:48 by Peter Mortensen
asked Jan 31 at 10:55 by chen steven
4 Answers
Maybe you can't "hot clone" a whole server (you can, but only if it's a virtual machine), but you can freeze and restore a single process with criu (Checkpoint/Restore in Userspace).
This allows you to save the program's internal state to disk and stop the program, and later to restore the program to that state from the saved files.
To support your desired operation, you can copy the files representing the saved program to another server and restore it there.
criu requires a recent kernel with various features compiled in, so older Linux distributions might not work. You can run criu check on a particular machine to determine whether the prerequisites for criu are present.
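As a rough sketch of the workflow described above, the following Python helpers only build the criu command lines for the check, dump, and restore steps. The PID and the image directory are hypothetical placeholders, and the helper names are made up for this example; copying the image directory to the other server (for instance with rsync or scp) is left out. Whether a given service checkpoints cleanly depends on what it has open, so treat this as a starting point for testing rather than a recipe.

```python
import shlex
from typing import List


def criu_check_cmd() -> List[str]:
    """Command that reports whether the kernel has what criu needs."""
    return ["criu", "check"]


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = True) -> List[str]:
    """Command that checkpoints the process tree rooted at pid.

    With leave_running=True the original keeps serving after the dump,
    which is what a "clone" (rather than a move) would need.
    """
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Command that restores the process tree from a copied image dir."""
    return ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]


if __name__ == "__main__":
    # Hypothetical values: PID 1234 is the warmed-up service,
    # /var/tmp/ckpt is where the image files are written.
    for cmd in (
        criu_check_cmd(),
        criu_dump_cmd(1234, "/var/tmp/ckpt"),
        criu_restore_cmd("/var/tmp/ckpt"),
    ):
        print(shlex.join(cmd))
```

The commands are printed rather than executed because criu normally needs root and a suitable kernel; on a test machine each list could be passed to `subprocess.run(cmd, check=True)`.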
it looks awesome and I'll do some tests on this, thanks bro
– chen steven
Feb 1 at 2:35
From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.
– James_pic
Feb 1 at 14:08
@James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.
– Michael Hampton♦
Feb 1 at 14:23
It may be a bit out of scope for your current environment, but the industry-standard way of doing this is to virtualize your server. Many virtualization hosts (VMware, VirtualBox, etc.) allow "snapshots" that save the state of a server, which can then be cloned into new instances. These new instances will have exactly the same state as the original, down to running processes. Of course, you'll want to make sure that the software you're running still performs correctly in a virtual environment (CUDA/GPU calculation springs to mind).
Virtualization is great, until the software (or its dependencies) requires an update, and does not provide a graceful reload mechanism. A VM snapshot or live migration is running the old code.
– John Mahowald
Jan 31 at 21:30
Running the project on either a "real" machine or a virtualization host is acceptable to me, and there are several ways we can handle the "old" code issue, such as A/B testing or rolling updates. But are you sure a snapshot can fully clone the warmed-up state of my working node?
– chen steven
Feb 1 at 2:38
3
When you "live-migrate" a machine, it needs to be paused. While it is paused, its memory is copied 1:1 to another machine in a cluster, where it is unpaused -- intact. This can take some time depending on how much memory is in use, and how fast the network fabric is. You may be able to use this method if the amount of downtime it invokes is low enough for your needs.
– Spooler
Feb 1 at 3:24
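To get a feel for the pause described in the comment above, here is a back-of-envelope estimate of how long a 1:1 memory copy takes. It assumes the simple stop-and-copy model from the comment; the `efficiency` factor is an assumed fudge for protocol overhead, and the numbers in the demo are illustrative, not measurements. Hypervisors that pre-copy memory while the machine is still running can pause for less time than this.

```python
def estimate_pause_seconds(
    memory_gib: float, link_gbit_per_s: float, efficiency: float = 0.8
) -> float:
    """Estimate the time to copy a VM's memory over the network.

    memory_gib:       memory in use, in GiB
    link_gbit_per_s:  nominal link speed, in Gbit/s
    efficiency:       assumed fraction of the link usable for the copy
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    memory_bits = memory_gib * 1024**3 * 8
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return memory_bits / usable_bits_per_s


if __name__ == "__main__":
    # Illustrative: 64 GiB in use, 10 Gbit/s fabric.
    print(f"{estimate_pause_seconds(64, 10):.1f} s")
```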
@chensteven I've most recently come from a VirtualBox environment. That was some time ago, but from what I remember, a running snapshot contains the exact state of the VM at the time the snapshot was taken, including running processes and the contents of memory. This snapshot can then be cloned to a new VM, giving you two machines in exactly the same state.
– cawwot
Feb 1 at 19:01
The question you mention refers to a link, http://www.linuxfocus.org/English/March2005/article370.shtml, which describes all the ways I could think of to do what you ask.
That those options exist says little about whether they suit what is running on the server.
You have to consider that any files that change during the cloning process could end up inconsistent on the target machine. The post you link to talks about databases, and cloning one like that gives no assurance of data integrity.
It is not exactly clear what you mean by "until we feed it several times".
But if I have understood your question correctly, you have to consider that cloning a system takes time to copy the data and consumes resources.
To set up an "on/off" environment, better called active/backup, the server has to be properly configured in a cluster.
I'm sorry if this is not the answer you expected, but those are the options you have.
It's my fault for confusing you here. By "feed" I mean that after the service starts up, we need to invoke calculation tasks several times to make sure the node is "warmed up" to top performance. So the problem is really about dynamically cloning or scaling out our live jobs: when a large number of requests hits our system, we don't have enough time to set up new calculation nodes to handle them (the warm-up takes too long). You know, it's like waves coming in.
– chen steven
Feb 1 at 2:45
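The "feeding" described in the comment above amounts to something like the loop below. This is only a sketch: `predict` is a hypothetical stand-in for the node's calculation call and `samples` for representative requests; neither comes from the asker's code. Timing each round makes it possible to see when latency stops improving, which is also a first step towards finding out what the warm-up is actually doing.

```python
import time
from typing import Any, Callable, List, Sequence


def warm_up(
    predict: Callable[[Any], Any], samples: Sequence[Any], rounds: int = 3
) -> List[float]:
    """Feed sample requests through predict several times.

    Returns the wall-clock time of each round so the caller can check
    whether later rounds are faster than the first.
    """
    latencies = []
    for _ in range(rounds):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Stand-in "model": upper-casing a string.
    print(warm_up(str.upper, ["first request", "second request"], rounds=3))
```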
There are many potential issues with what you are trying to do, and of course as you know it would be best to take the server offline and clone it while no data is being dynamically stored.
However, what you seek to do is entirely plausible, as I have done it before. If you use dd you can clone the full server at the block level to another drive or another server. It will, however, take some additional setup on the new server, and you probably won't be able to simply turn the old one off and the new one on. For us to understand this, we need to know a few things about your server hardware and software.
Firstly, in order to determine the best data strategy, it would be helpful to know what is updating regularly. Do you have an SQL server that is updating dynamically alongside static content? Alternatively, do you have a team of developers using a version control system like Git to send constant updates to your content? What is updating will determine the best course of action.
If for example, it is only the SQL which is updating regularly, then you can migrate to a new server while that server is live in the following manner:
- Use dd to clone all data to the new server.
- Start setting up the new server. It may take some work, especially if the hardware is different, but it may still be faster than setting up from scratch.
- It may also take some DNS changes, since you can't use the same DNS entries for both servers if you need to work on the second server while the first is still live.
- After the new server is complete and running independently, take a final backup of the SQL server on the original server and import it into the new server.
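For readers unfamiliar with what the dd step in the list above does, the following is a minimal Python stand-in for a block-level copy, not dd itself. The paths in the demo are hypothetical temporary files; pointing it at a real block device would need root, and copying a mounted, live filesystem this way carries the consistency risks mentioned elsewhere in this thread.

```python
import os
import tempfile


def block_copy(src_path: str, dst_path: str, block_size: int = 4 * 1024 * 1024) -> int:
    """Copy src to dst in fixed-size blocks, ignoring file-level structure.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of source reached
            dst.write(block)
            copied += len(block)
    return copied


if __name__ == "__main__":
    # Demo on a throwaway file instead of a real disk.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.img")
    target = os.path.join(workdir, "target.img")
    with open(source, "wb") as handle:
        handle.write(os.urandom(1024 * 1024))
    print(block_copy(source, target, block_size=64 * 1024), "bytes copied")
```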
You may need to take your original server offline temporarily to ensure that you don't miss any data. Alternatively, for zero downtime, you could make the second server live, point the DNS to the new server, and then update any DNS entries manually on the new server. This is more hassle than the few minutes of downtime needed to back up the SQL data and restore it on the new server, but it may be necessary if you cannot afford any downtime.
This of course is only one use case example, and depending on your configuration and several variables, you may need to create your own strategy for the migration based on your specific case.
The other issue is the server hardware configuration. Is the new server 100% identical in hardware to the old server? If so, the setup is easier. If, on the other hand, the hardware configuration is completely different, you may need a different strategy: set up the second server ahead of time, then back up all your data and SQL databases on the first server and migrate them over manually, changing configuration as needed.
Server migration is by no means trivial, and in order to have a successful move, you need to have deep knowledge of servers, or staff on hand who have the same. In any case, it is highly recommended that you immediately take a full backup and store it on a third source, even on your local computer, so that if the worst case scenario happens (both servers crash and die irreparably), you still have another copy of your data to rebuild your servers with.
Hope this helps, and good luck with your server move!
Answer by Michael Hampton♦ (criu): answered Jan 31 at 13:50, edited Jan 31 at 16:01
Answer by cawwot (virtualization): answered Jan 31 at 15:36
Virtualization is great, until the software (or its dependencies) requires an update, and does not provide a graceful reload mechanism. A VM snapshot or live migration is running the old code.
– John Mahowald
Jan 31 at 21:30
It's both acceptive for me to run the project in a "real" machine or virtualization host, and we can take several ways to handle the "old" code stuff, maybe A/B test or rolling update .etc. But are you sure the snapshots can totally clone the warmed-up state of my working node?
– chen steven
Feb 1 at 2:38
When you "live-migrate" a machine, it needs to be paused. While it is paused, its memory is copied 1:1 to another machine in a cluster, where it is unpaused -- intact. This can take some time depending on how much memory is in use, and how fast the network fabric is. You may be able to use this method if the amount of downtime it invokes is low enough for your needs.
– Spooler
Feb 1 at 3:24
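To put rough numbers on that pause, here is a back-of-the-envelope sketch in Python. It assumes the memory is copied exactly once at the full line rate of the link. The 64 GiB and 10 Gbit/s figures are hypothetical, not taken from the question, and a real migration adds protocol overhead, so treat the result as a lower bound.

```python
def transfer_seconds(memory_bytes, link_bits_per_second):
    """Idealized time to copy memory once over a link (no overhead assumed)."""
    return memory_bytes * 8 / link_bits_per_second


# Hypothetical node: 64 GiB of RAM in use, 10 Gbit/s network fabric.
memory_in_use = 64 * 2**30
link_speed = 10e9

print(f"{transfer_seconds(memory_in_use, link_speed):.0f} s")  # roughly 55 seconds
```

If the warm-up takes many minutes, a pause on this order may well be the cheaper option, but only measurement on the actual cluster can confirm that.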
@chensteven I've most recently come from a virtualbox environment. That was some time ago, but from what I remember a running snapshot contains the exact state of the vm at the time the snapshot was taken, including running processes and the contents of the memory. This snapshot can then be cloned to a new vm, giving you two machines in exactly the same state.
– cawwot
Feb 1 at 19:01
The question you mention refers to a link, http://www.linuxfocus.org/English/March2005/article370.shtml, which describes all the ways I could think of to do what you ask.
That those options exist says little about whether they suit what is running on your server.
You have to consider that any files that change during the cloning process could end up inconsistent on the target machine. The post you linked discusses databases, and cloning one like that gives no assurance of data integrity.
It is not exactly clear what you mean by "until we feed it several times".
But if I understood your question correctly, you have to consider that cloning a system itself takes time to copy the data and consumes computing resources.
To run an "on/off" setup, better called an active/backup environment, the server has to be properly configured in a cluster.
I'm sorry if this is not the answer you expected, but those are the options you have.
It's my fault for confusing you. "Feeding" means that after the service starts, we need to invoke the calculation tasks several times to make sure the node is "warmed up" to top performance. So the problem is really about dynamically cloning or scaling out our live jobs: when a large number of requests hits the system, we don't have enough time to set up new calculation nodes to handle them, because the warm-up takes too long. The requests come in waves.
– chen steven
Feb 1 at 2:45
edited Jan 31 at 23:54
Peter Mortensen
answered Jan 31 at 12:27
AtomiX84
There are many potential issues with what you are trying to do, and as you know, it would be best to take the server offline and clone it while no data is being written.
However, what you want to do is entirely feasible; I have done it before. With dd you can clone the full server at the block level to another drive or another server. It will take some additional setup on the new server, though, and you probably won't be able to simply turn the old one off and the new one on. To advise further, we need to know a few things about your server hardware and software.
First, to determine the best data strategy, it would help to know what is updated regularly. Do you have an SQL server that changes constantly alongside static content? Or do you have a team of developers pushing constant updates to your content through a version control system like git? What is being updated determines the best course of action.
If, for example, only the SQL database is updated regularly, then you can migrate to a new server while the original is live as follows:
- Use dd to clone all data to the new server.
- Start setting up the new server. It may take some work, especially if the hardware differs, but it may still be faster than setting up from scratch.
- It may also take some DNS changes, since you can't use the same DNS name for another server if you need to work on the second server while the first is still live.
- Once the new server is complete and running independently, take a final backup of the SQL database on the original server and import it into the new server.
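As a rough illustration of the first step, here is a minimal Python sketch of a block-level copy with a checksum, which is approximately what dd does. It is a stand-in for dd, not dd itself. The function names and the device and image paths in the usage note are hypothetical. Reading a mounted, live filesystem this way can produce an inconsistent image, and a disk copy does not capture anything held in RAM.

```python
import hashlib


def block_copy(src_path, dst_path, block_size=4 * 1024 * 1024):
    """Copy src to dst one block at a time; return the SHA-256 of the data read."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of file or device
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def sha256_of(path, block_size=4 * 1024 * 1024):
    """Re-read a file or device and return its SHA-256, to verify a copy."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()
```

A call might look like `block_copy("/dev/sdX", "/mnt/backup/server.img")` followed by comparing the returned digest with `sha256_of("/mnt/backup/server.img")`. Both paths are placeholders, and reading a raw device normally requires root.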
You may need to take your original server offline temporarily to ensure that you don't miss any data. Alternatively, for zero downtime, you could bring the second server live, point the DNS to it, and then update any DNS entries manually on the new server. This is more hassle than accepting a few minutes of downtime to back up the SQL database and restore it on the new server, but it may be necessary if you cannot afford any downtime.
This is only one example use case; depending on your configuration and several other variables, you may need to devise your own migration strategy for your specific case.
The other issue is the server hardware configuration. Is the new server 100% identical in hardware to the old one? If so, the setup is easier. If, at the other extreme, the hardware configuration is completely different, you may need a different strategy: set up the second server ahead of time, then back up all your data and SQL databases on the first server and migrate them over manually, changing configuration as needed.
Server migration is by no means trivial, and a successful move requires deep knowledge of servers, or staff on hand who have it. In any case, I highly recommend that you immediately take a full backup and store it in a third location, even on your local computer, so that in the worst case (both servers crash and die irreparably) you still have another copy of your data to rebuild from.
Hope this helps, and good luck with your server move!
answered Feb 1 at 18:40
serveraddict
Would virtualising the machine and taking a snapshot of the "warmed-up" state be any use?
– TripeHound
Jan 31 at 15:40
Do you understand why this warm-up happens? For instance, it might be a side-effect of the file cache. But some answers to cloning machines discard the file cache, because a cache by definition can be reconstructed from the underlying original.
– MSalters
Jan 31 at 17:05
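If the file cache does turn out to be the cause, one cheap experiment is to read the model files once before the first request, so that the kernel has the chance to keep them in its page cache. The sketch below assumes the warm-up is I/O-bound, which the question does not confirm. The function name and the file paths in the usage note are hypothetical.

```python
def warm_file_cache(paths, block_size=1024 * 1024):
    """Read every file once, discarding the data; return total bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                block = handle.read(block_size)
                if not block:  # end of file
                    break
                total += len(block)
    return total
```

For example, `warm_file_cache(["/srv/models/a.bin", "/srv/models/b.bin"])` could run from the service's start-up script. If the first real request is still slow afterwards, the warm-up is probably not about the file cache.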
fork() is one way to create more processes on a given machine while saving whatever startup overhead.
– Yet Another User
Jan 31 at 18:45
Thanks, folks. @TripeHound, I asked a friend of mine who works at VMware, and he said it looks impossible for them to simply snapshot the "warmed-up" state, or to mirror it. @MSalters, I'm not 100% sure what happens during the warm-up, but it looks like after the service is up, some lazy loading takes place once calculation jobs are invoked.
– chen steven
Feb 1 at 2:25
I'm unaware of your background setup, but this smells like a situation where your server must never go down. That suggests your host's kernel could be ancient and that updates have never been applied. Perhaps this points to a systemic design flaw that needs to be considered.
– Criggie
Feb 1 at 22:53