Hot clone a living Linux service












14















We need to hot clone a Linux service while it is running. This is not because we can't reboot; it is because of our particular scenario. (I have already read the answer to "Clone a working Linux server", but my case is a little different.)

We have a calculation node, specifically an NLP node that runs several models. When we start the node (as a service, of course), calculations are horribly slow until we have fed it requests several times. We call this warm-up.

Unfortunately, warm-up takes a long time, sometimes so long that our calculation finishes before the node has warmed up.

So the question is: is there a reliable way to hot clone a Linux server so that the clone keeps the warmed-up performance, letting us bring new nodes online in a shorter time?



































  • Would virtualising the machine and taking a snapshot of the "warmed-up" state be any use?

    – TripeHound
    Jan 31 at 15:40






  • Do you understand why this warm-up happens? For instance, it might be a side effect of the file cache. But some answers to cloning machines discard the file cache, because a cache by definition can be reconstructed from the underlying original.

    – MSalters
    Jan 31 at 17:05











  • fork() is one way to create more processes on a given machine while saving whatever startup overhead.

    – Yet Another User
    Jan 31 at 18:45











  • Thanks, folks. @TripeHound, I asked a friend of mine who works at VMware, and he said it looks impossible for them to simply snapshot the "warmed-up" state, or to mirror it. @MSalters, I'm not 100% sure what happens during warm-up, but it looks like some lazy loading kicks in after the service is up, once calculation jobs start coming in.

    – chen steven
    Feb 1 at 2:25








  • Unaware of your background setup, but this smells like a situation where your server must never go down. This suggests that your host's kernel could be ancient and that updates have never been applied. Perhaps this is an indicator of a systemic design flaw that needs to be considered.

    – Criggie
    Feb 1 at 22:53
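If the warm-up turns out to be a file-cache effect, as MSalters suggests, one cheap experiment is to read the model files once before serving traffic, so that the kernel pulls them into its page cache. The sketch below assumes that hypothesis holds, which nothing in the question confirms; `warm_file_cache` and the paths passed to it are hypothetical names for illustration.

```python
import os
import tempfile


def warm_file_cache(paths, chunk_size=1 << 20):
    """Read each file once so the kernel keeps its pages in the page cache.

    Returns the total number of bytes read. Which files to pass is an
    assumption: the question does not say what the service loads lazily.
    """
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


if __name__ == "__main__":
    # Demonstration on a throwaway file instead of a real model file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    print(warm_file_cache([tmp.name]))
    os.unlink(tmp.name)
```

If a freshly started node is just as slow after this, the warm-up is probably happening inside the process (lazy initialization, JIT compilation, allocator growth), and only approaches that preserve process memory will help.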

























linux clone














edited Jan 31 at 23:48 by Peter Mortensen

asked Jan 31 at 10:55 by chen steven























4 Answers


















28














You may not be able to "hot clone" a whole server (that works only if it's a virtual machine), but you can freeze and restore a single process with criu (Checkpoint/Restore in Userspace).

This allows you to save the program's internal state to disk and stop the program, and later to restore the program to that state from the saved files.



To support your desired operation, you can copy the files representing the saved program to another server, and restore it there.



criu requires a recent kernel with various features compiled in, so older Linux distributions might not work. You can run criu check on a particular machine to determine if the prerequisites for criu are present.
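As a rough sketch of the checkpoint-and-restore flow described above, the helpers below only build the criu command lines; they do not run anything. The PID `1234` and the directory `/var/tmp/ckpt` are made-up placeholders, and the helper names are hypothetical. The options used (`--tree`, `--images-dir`, `--leave-running`, `--restore-detached`) are standard criu options, but whether a given service checkpoints cleanly depends on what it has open.

```python
def criu_dump_cmd(pid, images_dir, leave_running=True):
    """Build the argv for checkpointing the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the original service running after the images are written.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir):
    """Build the argv for restoring a checkpoint stored in `images_dir`."""
    return ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]


if __name__ == "__main__":
    # Placeholder PID and directory; run the printed commands as root,
    # e.g. via subprocess.run(cmd, check=True), where `criu check` passes.
    print(" ".join(criu_dump_cmd(1234, "/var/tmp/ckpt")))
    print(" ".join(criu_restore_cmd("/var/tmp/ckpt")))
```

Between the two steps, the image directory would be copied to the target machine, which also needs the same binaries, libraries and data files at the same paths. A service holding open TCP connections needs extra handling (criu has a `--tcp-established` option for that case), so a worker that can be checkpointed while idle is the easier target.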
































  • it looks awesome and I'll do some tests on this, thanks bro

    – chen steven
    Feb 1 at 2:35











  • From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.

    – James_pic
    Feb 1 at 14:08











  • @James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.

    – Michael Hampton
    Feb 1 at 14:23



















12














It may be somewhat out of scope for your current environment, but the industry-standard way of doing this is to virtualize your server. Many virtualization hosts (VMware, VirtualBox, etc.) allow “snapshots” that save the state of a server, which can then be cloned into new instances. These new instances will have exactly the same state as the original, down to the running processes. Of course, you’ll want to make sure that the software you’re running still performs correctly in a virtual environment (CUDA/GPU computation springs to mind).






























  • Virtualization is great, until the software (or its dependencies) requires an update, and does not provide a graceful reload mechanism. A VM snapshot or live migration is running the old code.

    – John Mahowald
    Jan 31 at 21:30











  • Running the project on either a "real" machine or a virtualization host is acceptable to me, and there are several ways to handle the "old code" issue, such as A/B testing or rolling updates. But are you sure a snapshot can fully clone the warmed-up state of my working node?

    – chen steven
    Feb 1 at 2:38






  • When you "live-migrate" a machine, it needs to be paused. While it is paused, its memory is copied 1:1 to another machine in a cluster, where it is unpaused -- intact. This can take some time depending on how much memory is in use, and how fast the network fabric is. You may be able to use this method if the amount of downtime it invokes is low enough for your needs.

    – Spooler
    Feb 1 at 3:24











  • @chensteven I most recently worked in a VirtualBox environment. That was some time ago, but from what I remember, a snapshot of a running VM contains the exact state of the VM at the time it was taken, including running processes and the contents of memory. This snapshot can then be cloned to a new VM, giving you two machines in exactly the same state.

    – cawwot
    Feb 1 at 19:01
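Spooler's point about copy time can be turned into a back-of-the-envelope estimate. The sketch below assumes the simple model from that comment (all memory in use is copied once over the network); the `efficiency` factor of 0.8 is an arbitrary assumption for protocol overhead, not a measured value, and the function name is hypothetical.

```python
def estimate_copy_seconds(memory_gib, link_gbit_per_s, efficiency=0.8):
    """Rough time to copy `memory_gib` of RAM once over a network link.

    `efficiency` is the assumed usable fraction of the nominal link speed.
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits_to_copy = memory_gib * 1024**3 * 8
    usable_bits_per_second = link_gbit_per_s * 1e9 * efficiency
    return bits_to_copy / usable_bits_per_second


if __name__ == "__main__":
    # 64 GiB of memory over a 10 Gbit/s link at 80% efficiency.
    print(round(estimate_copy_seconds(64, 10), 1))
```

Under these assumptions, 64 GiB over a 10 Gbit/s link takes roughly 69 seconds, which is the figure to compare against how long the warm-up itself takes.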



















3














The question you mention refers to a link, http://www.linuxfocus.org/English/March2005/article370.shtml, which describes all the ways I could think of to do what you ask.

The fact that these options exist says little about how they interact with what is running on the server. You have to consider that any file that changes during the cloning process may end up inconsistent on the target machine. The post you link to discusses databases, and cloning one this way gives no assurance of data integrity.

It is not exactly clear what you mean by "until we feed it several times".

But if I understand your question correctly, you have to consider that cloning a system itself takes time to copy the data and consumes resources while it does so.

To run an "on/off", or more properly an active/backup, environment, the server has to be properly configured as part of a cluster.

I'm sorry if this is not the answer you expected, but those are the options you have.
































  • It's my fault for confusing you. "Feed" means that after our service starts, we need to invoke calculation tasks several times to ensure the node is "warmed up" to top performance. So the problem is really about dynamic cloning or scaling out of our live jobs: when a large number of requests hits our system, we don't have enough time to set up new calculation nodes to handle them, because warm-up takes too long. The load comes in waves.

    – chen steven
    Feb 1 at 2:45
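The "feeding" described in this comment can be made explicit as a loop that replays sample requests until a full round is fast enough. This is only a sketch: `predict`, `samples` and `target_seconds` are hypothetical stand-ins for the service's own entry point, representative inputs, and latency goal, none of which the question specifies.

```python
import time


def warm_up(predict, samples, target_seconds, max_rounds=50,
            clock=time.perf_counter):
    """Call `predict` on every sample until one full round is fast enough.

    Returns the number of rounds that were run, or None if `max_rounds`
    passed without a round finishing within `target_seconds`.
    """
    for round_number in range(1, max_rounds + 1):
        start = clock()
        for sample in samples:
            predict(sample)
        if clock() - start <= target_seconds:
            return round_number
    return None


if __name__ == "__main__":
    # Stand-in workload; a real node would call its model here.
    rounds = warm_up(lambda text: text.upper(), ["a", "b"], target_seconds=1.0)
    print(rounds)
```

This does not shorten the warm-up, but a new node that runs it before registering with the load balancer never serves requests while it is still slow, and the returned round count gives a measurable baseline for comparing cloning approaches.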



















1














There are many potential issues with what you are trying to do, and of course as you know it would be best to take the server offline and clone it while no data is being dynamically stored.



However, what you seek to do is entirely plausible, as I have done it before. If you use dd you can clone the full server at the block level to another drive or another server. It will however take some additional setup on the new server, and you probably won't be able to simply turn the other off and the new one on. For us to understand this, we need to know a few things about your server hardware and software.



Firstly, in order to determine the best data strategy, it would be helpful to know what is updating regularly. Do you have an SQL server which is dynamically updating while the rest of the content is static? Alternatively, do you have a team of developers using a version control system like Git to push constant updates to your content? What is updating will determine the best course of action.



If for example, it is only the SQL which is updating regularly, then you can migrate to a new server while that server is live in the following manner:





  • Use dd to clone all data to the new server.

  • Start setting up the new server. This may take some work, especially if the hardware differs, but it may still be faster than setting up from scratch.

  • You may also need DNS changes, since both servers cannot answer to the same DNS name if you need to work on the second server while the first is still live.

  • Once the new server is complete and running independently, take a final backup of the SQL database on the original server and import it into the new server.


You may need to take your original server offline temporarily to ensure that you don't miss any data. Alternatively, for zero downtime, you could bring the second server live, point DNS to it, and then update any DNS entries manually on the new server. This is more hassle than accepting a few minutes of downtime to back up the SQL data and restore it on the new server, but it may be necessary if you truly need zero downtime.
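For readers unfamiliar with dd, the first step in the list above amounts to a block-by-block copy from one device or file to another. The sketch below shows that idea in Python on ordinary files; `copy_blocks` and its paths are hypothetical, and it is an illustration of the technique rather than a replacement for dd.

```python
def copy_blocks(source_path, target_path, block_size=4 * 1024 * 1024):
    """Copy `source_path` to `target_path` in fixed-size blocks, as dd does.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            block = source.read(block_size)
            if not block:
                break
            target.write(block)
            copied += len(block)
    return copied
```

A copy like this captures only what is on disk at the moment each block is read. On a live, mounted filesystem the result can be internally inconsistent, and nothing held in memory (including any warmed-up state) is carried over.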



This of course is only one use case example, and depending on your configuration and several variables, you may need to create your own strategy for the migration based on your specific case.



The other issue is the server hardware configuration. Is the new server 100% identical in hardware to the old one? If so, the setup is easier. If, at the other extreme, the hardware configuration is completely different, you may need a different strategy: set up the second server ahead of time, then back up all your data and SQL databases on the first server and migrate them over manually, changing configuration as needed.



Server migration is by no means trivial, and in order to have a successful move, you need to have deep knowledge of servers, or staff on hand who have the same. In any case, it is highly recommended that you immediately take a full backup and store it on a third source, even on your local computer, so that if the worst case scenario happens (both servers crash and die irreparably), you still have another copy of your data to rebuild your servers with.



Hope this helps, and good luck with your server move!































































(First answer, on criu: answered Jan 31 at 13:50 by Michael Hampton; edited Jan 31 at 16:01.)













    • it looks awesome and I'll do some tests on this, thanks bro

      – chen steven
      Feb 1 at 2:35











    • From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.

      – James_pic
      Feb 1 at 14:08











    • @James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.

      – Michael Hampton
      Feb 1 at 14:23



















    • it looks awesome and I'll do some tests on this, thanks bro

      – chen steven
      Feb 1 at 2:35











    • From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.

      – James_pic
      Feb 1 at 14:08











    • @James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.

      – Michael Hampton
      Feb 1 at 14:23

















    it looks awesome and I'll do some tests on this, thanks bro

    – chen steven
    Feb 1 at 2:35





    it looks awesome and I'll do some tests on this, thanks bro

    – chen steven
    Feb 1 at 2:35













    From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.

    – James_pic
    Feb 1 at 14:08





    From your experience, how well does this work in practice? Looking at the limitations criu lists (which are pretty much the ones I'd expect - this is a hard problem), I get the feeling this is unlikely to work with applications that weren't designed with this use case in mind.

    – James_pic
    Feb 1 at 14:08













    @James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.

    – Michael Hampton
    Feb 1 at 14:23





    @James_pic It's been perhaps a year since I looked at it seriously, since I don't currently have a use for it. For a daemon that's just accepting connections and doing some computation (e.g. the OP's machine learning job, or a web server) it works pretty well.

    – Michael Hampton
    Feb 1 at 14:23













    12














    It may be a bit out of scope of your current environment, but the industry standard way of doing this is to virtualize your server. Many virtualization hosts (VMware, virtualbox, etc.) allow “snapshots” that save the state of a server, which can then be cloned into new instances. These new instances will have exactly the same state as the original, down to running processes. Of course you’ll want to make sure that the software that you’re running will still perform correctly in a virtual environment (CUDA/ GPU calculation springs to mind).






    share|improve this answer
























    • Virtualization is great, until the software (or its dependencies) requires an update, and does not provide a graceful reload mechanism. A VM snapshot or live migration is running the old code.

      – John Mahowald
      Jan 31 at 21:30











    • It's both acceptive for me to run the project in a "real" machine or virtualization host, and we can take several ways to handle the "old" code stuff, maybe A/B test or rolling update .etc. But are you sure the snapshots can totally clone the warmed-up state of my working node?

      – chen steven
      Feb 1 at 2:38






    • 3





      When you "live-migrate" a machine, it needs to be paused. While it is paused, its memory is copied 1:1 to another machine in a cluster, where it is unpaused -- intact. This can take some time depending on how much memory is in use, and how fast the network fabric is. You may be able to use this method if the amount of downtime it invokes is low enough for your needs.

      – Spooler
      Feb 1 at 3:24











    • @chensteven My most recent experience is with a VirtualBox environment. That was some time ago, but from what I remember, a snapshot of a running VM contains the exact state of the VM at the time the snapshot was taken, including running processes and the contents of memory. This snapshot can then be cloned to a new VM, giving you two machines in exactly the same state.

      – cawwot
      Feb 1 at 19:01
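
Spooler's point that the copy time depends on the memory in use and the speed of the network can be made concrete with a rough estimate. The following is a minimal Python sketch, assuming the whole memory image is copied once at a fixed fraction of the link rate. The function name, the `efficiency` parameter and the example figures are made up for illustration. Many hypervisors copy most of the memory while the machine is still running, so treat the result as an upper bound on the pause, not a prediction.

```python
def estimate_copy_seconds(memory_gib: float, link_gbit_per_s: float,
                          efficiency: float = 1.0) -> float:
    """Rough time in seconds to copy `memory_gib` GiB of RAM over the network.

    `efficiency` is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 means the link is used at its nominal rate.
    """
    if memory_gib < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("memory must be >= 0, link rate > 0, 0 < efficiency <= 1")
    bits_to_copy = memory_gib * 2**30 * 8          # GiB -> bytes -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits_to_copy / usable_bits_per_s


if __name__ == "__main__":
    # Example figures only; substitute the node's real memory footprint.
    for mem in (8, 64, 256):
        print(f"{mem} GiB over 10 Gbit/s: {estimate_copy_seconds(mem, 10):.1f} s")
```

If the estimate is already longer than the warm-up itself, cloning by memory copy is unlikely to help; if it is much shorter, it may be worth testing.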


























    answered Jan 31 at 15:36









    cawwot
























    3














    The question you mention refers to a link, http://www.linuxfocus.org/English/March2005/article370.shtml, which describes all the ways I could think of to do what you ask.

    That those options exist says little about whether they suit what is running on the server. You have to consider that any file that changes during the cloning process could end up inconsistent on the target machine. The post you link to talks about databases, and cloning one like that gives no assurance of data integrity.

    It is not exactly clear what you mean by "until we feed it several times".

    But if I understand your question correctly, you have to consider that cloning a system itself takes time, both to copy the data and to calculate resources.

    To run an "on/off" setup, better called an active/backup environment, the server has to be properly configured in a cluster.

    I'm sorry if this is not the answer you expected, but those are the options you have.
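
The consistency concern above can be checked per file. The following is a minimal Python sketch, not a cloning tool: it hashes a file before and after copying it and reports whether the source stayed stable and the copy matches. The function names and the demo file name are hypothetical, and a per-file check like this says nothing about consistency across several related files, such as a database and its journal.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> bool:
    """Copy `source` to `target`.

    Returns True only if the source hashed the same before and after the
    copy and the target has that same hash.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(source)
    return before == after == sha256_of(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.dat"        # hypothetical file name
        src.write_bytes(b"example payload" * 1024)
        print("consistent copy:", copy_and_verify(src, Path(tmp) / "example.copy"))
```

A file that is being written while it is copied would typically fail this check, which is the situation the answer warns about.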
































    • Sorry for the confusion. By "feed" I mean that after the service starts up, we need to invoke the calculation tasks several times before the node is "warmed up" to top performance. So the problem is really about dynamically cloning or scaling out our live jobs: when a large number of requests hits our system, we don't have enough time to set up new calculation nodes to handle them, because the warm-up takes too long. The requests arrive in waves.

      – chen steven
      Feb 1 at 2:45
























    edited Jan 31 at 23:54









    Peter Mortensen











    answered Jan 31 at 12:27









    AtomiX84
























    1














    There are many potential issues with what you are trying to do, and, as you know, it would be best to take the server offline and clone it while no data is being written.

    However, what you want to do is entirely feasible; I have done it before. If you use dd, you can clone the full server at the block level to another drive or another server. It will, however, take some additional setup on the new server, and you probably won't be able to simply turn the old one off and the new one on. To advise further, we need to know a few things about your server hardware and software.

    First, to determine the best data strategy, it would help to know what is updating regularly. Do you have an SQL server that updates dynamically while the rest of the content is static? Or do you have a team of developers pushing constant updates to your content through a version control system like Git? What is updating determines the best course of action.

    If, for example, only the SQL data is updating regularly, you can migrate to a new server while the original is live as follows:

    • Use dd to clone all data to the new server.

    • Start setting up the new server. It may take some work, especially if the hardware is different, but it may still be faster than setting up from scratch.

    • DNS changes may also be needed, since you can't use the same DNS name for both servers if you need to work on the second one while the first is still live.

    • Once the new server is complete and running independently, take a final backup of the SQL database on the original server and import it into the new server.
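
The first step above relies on dd copying data block by block. The following Python sketch only imitates that idea on ordinary files, roughly what `dd if=SOURCE of=TARGET bs=4M` does, so that the mechanism is visible. It is not a replacement for dd; the function name and the demo file names are hypothetical, and copying a real disk would mean reading a block device with root privileges while nothing is writing to it.

```python
import tempfile
from pathlib import Path


def clone_blocks(source: Path, target: Path, block_size: int = 4 * 1024 * 1024) -> int:
    """Copy `source` to `target` in fixed-size blocks; return bytes copied."""
    copied = 0
    with source.open("rb") as src, target.open("wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:          # an empty read means end of input
                break
            dst.write(block)
            copied += len(block)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "disk.img"             # stand-in for a real device
        image.write_bytes(bytes(range(256)) * 4096)
        clone = Path(tmp) / "clone.img"
        print("bytes copied:", clone_blocks(image, clone, block_size=64 * 1024))
        print("identical:", image.read_bytes() == clone.read_bytes())
```

Because the copy works on raw blocks, it has no idea which files were open or half-written at the time, which is why the final SQL backup and import in the last step are still needed.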


    You may need to take your original server offline temporarily to ensure that you don't miss any data. Alternatively, for zero downtime, you could bring the second server live, point DNS to it, and then update any DNS entries manually on the new server. That is more hassle than accepting a few minutes of downtime to back up the SQL database and restore it on the new server, but it may be necessary if you cannot afford any downtime.

    This is of course only one example use case. Depending on your configuration and several other variables, you may need to devise your own migration strategy for your specific case.

    The other issue is the server hardware configuration. Is the new server 100% identical in hardware to the old one? If so, the setup is easier. If, at the other extreme, the hardware is completely different, you may need a different strategy: set up the second server ahead of time, then back up all your data and SQL databases on the first server and migrate them over manually, changing configuration as needed.

    Server migration is by no means trivial. For a successful move you need deep knowledge of servers, or staff on hand who have it. In any case, I highly recommend that you immediately take a full backup and store it in a third location, even on your local computer, so that if the worst case happens (both servers crash and die irreparably) you still have a copy of your data to rebuild from.

    Hope this helps, and good luck with your server move!
















        answered Feb 1 at 18:40









    serveraddict





























