Remove a file from whole git history



























I know this question has already been asked, but in every answer I found, the situation is slightly different from mine and I don't see how to adapt it.



So here is the problem:



I cloned a repository and added a folder to work in. In this folder, I added .csv files and .py files that use the .csv files.
I tried to push this but realised it was taking too long because 2 of the .csv files are very big. So I ran

git rm files

and then committed. I tried to push again and only then realised that removing a file doesn't remove it from the git history.
So now, since the last completed push, I have 2 commits: one where I added the files, and one where I deleted some of the .csv files.



I would like your help to delete the last 2 commits. Is that feasible?
Thanks

















Tags: git, commit, git-rm
















asked Sep 25 '18 at 17:02 by Wendy
























          3 Answers
































          The first example in the git filter-branch documentation fits your situation well:




          Suppose you want to remove a file (containing confidential information or copyright violation) from all commits:




          git filter-branch --tree-filter 'rm filename' HEAD
          # and see also the variant further in the example description
          git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD


          (See the documentation page for details; I have not copied the whole thing here.)
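To see what the --index-filter variant does before running it on a real project, here is a minimal sketch in a throwaway repository. The file names big.csv and script.py are made up for illustration, and the FILTER_BRANCH_SQUELCH_WARNING variable only silences the warning that newer Git versions print before filter-branch starts.

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'print("hi")' > script.py
git add script.py
git commit -q -m "initial commit"

# One commit adds the (hypothetical) big file, the next one removes it again.
echo 'a,b,c' > big.csv
git add big.csv
git commit -q -m "add data"
git rm -q big.csv
git commit -q -m "remove data"

# Number of commits whose history mentions big.csv before the rewrite.
before=$(git log --oneline -- big.csv | wc -l | tr -d ' ')

# Rewrite every commit reachable from HEAD, dropping big.csv from each index.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --index-filter \
    'git rm --cached --ignore-unmatch big.csv' HEAD >/dev/null 2>&1

after=$(git log --oneline -- big.csv | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching big.csv: before=$before after=$after (total commits: $commits)"
```

The commit count stays the same because filter-branch rewrites each commit rather than dropping it; only the file disappears from every snapshot.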




















            filter-branch, as has been advised, is fine if we are talking about big histories. If only a handful of revisions are involved, you can remove the files by amending the revision where you added the file and then cherry-picking the later revisions, or by using an interactive rebase.



            For example, say I added the file a.txt in master~2 and I don't want it in the history anymore.



            git checkout master~2
            git rm --cached a.txt
            git commit --amend --no-edit
            git cherry-pick master~2..master
            git branch -f master # point master at this revision
            git checkout master



            That should be enough.
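The same steps can be tried in a scratch repository first. This is only a sketch: README and tool.py are invented file names, and the history is built so that a.txt enters at master~2, as in the example above. One caveat worth knowing: if a later commit does nothing except delete the file, cherry-picking it produces an empty commit and Git stops to ask what to do with it.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > README
git add README
git commit -q -m "base"

echo 'big' > a.txt
echo 'print(1)' > tool.py
git add a.txt tool.py
git commit -q -m "add tool and data"      # this will be master~2

echo 'print(2)' > tool.py
git commit -q -am "tweak tool"            # master~1
echo 'print(3)' > tool.py
git commit -q -am "tweak tool again"      # master

# The steps from the answer.
git checkout -q master~2                  # detached HEAD on the commit that added a.txt
git rm -q --cached a.txt                  # drop it from the index, keep it on disk
git commit -q --amend --no-edit           # new commit without a.txt
git cherry-pick master~2..master >/dev/null   # replay the two later commits
git branch -f master                      # point master at this revision
git checkout -q master

remaining=$(git log --oneline master -- a.txt | wc -l | tr -d ' ')
total=$(git rev-list --count master)
echo "commits touching a.txt: $remaining, commits on master: $total"
```

Because of --cached, a.txt is still present in the working directory afterwards as an untracked file, so the data itself is not lost.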





















              ... I would like ... to delete the last 2 commits. Is that feasible?




              You can't quite delete commits, but you can easily tell Git to forget them.



              The way this works is pretty simple, in the end. We start by noting that each commit saves a snapshot, and also stores the hash ID of its parent commit (along with your commit log message and your name as author and so on). This forms a backwards-pointing chain of commits.



              If we let single uppercase letters stand in for commit hash IDs, we can draw this chain:



              ... <-F  <-G  <-H   <--master


              Note that the branch name, master in this case, stores the hash ID of the last commit in the chain. (When something stores the hash ID of a commit, we say that this thing points to the commit, hence the arrows. The name master points to H, H points to G, and so on.)



              The way Git finds these commits is to read the hash ID of H out of master, which locates commit H, then read commit H and show it. Then, having read H, Git has the hash ID of commit G, so Git can read G and show it, and so on.
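That walk can be reproduced by hand. The sketch below builds three empty commits named F, G and H in a scratch repository (the names are just the letters from the drawing, not real hash IDs), reads the hash stored in master, and then follows the parent links until it runs out.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

# Start from the hash ID stored in the branch name...
commit=$(git rev-parse master)
chain=""
while [ -n "$commit" ]; do
    # ...show the commit (here: just its subject line)...
    chain="$chain$(git log -1 --format=%s "$commit") "
    # ...then read the parent hash out of it; %P is empty for the first commit.
    commit=$(git log -1 --format=%P "$commit")
done
echo "$chain"
```

The loop visits H first and F last, which is the backwards direction of the arrows in the drawing.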



              When we make a new commit, Git actually does this by:




              • writing out the snapshot;

              • writing out the author and log message and so on;

              • having the new commit point back to the current commit;

              • and last, but most important, writing the hash ID of the new commit into the branch name.
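The last two steps in the list above can be checked directly. In this sketch (scratch repository, empty commits named after the letters used here), the parent of the new commit is the old tip, and the branch name now stores the new hash ID.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

old_tip=$(git rev-parse master)        # hash ID of H
git commit -q --allow-empty -m "I"
new_tip=$(git rev-parse master)        # master now stores the hash ID of I
parent_of_new=$(git rev-parse 'master^')

# The raw commit object lists the tree (snapshot), parent, author, committer and message.
git cat-file -p master
parent_lines=$(git cat-file -p master | grep -c '^parent ')
```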


              So if we had:



              ...--F--G--H


              and we added --I:



              ...--F--G--H--I


              then Git has changed the name master to store the hash ID of commit I. Eventually we have:



              ...--F--G--H--I--J   <-- master


              If we made several unwanted commits, we can tell Git: Re-set the name master to point to commit H instead of commit J. There are several ways to do that, but the first one to reach for, in this case, is git reset --hard (while we have master checked out, and be sure you don't have anything you are concerned with losing, because git reset --hard tells Git to throw everything out):



              git checkout master
              git reset --hard HEAD~2


              The ~2 suffix tells Git to count back two steps (technically, two first-parent steps, which matters when the chain contains merge commits; here there are none, so it does not matter). If master currently points to J, that has Git count back twice: J to I, then I to H. Git then replaces our work with the contents from commit H and makes the name, master, point to H instead of J:



                           I--J
                          /
              ...--F--G--H <-- master


              Now that J (and, through it, I) is hard to find, both commits appear to be deleted.
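A small experiment shows the "forgotten, not erased" part. This sketch assumes a scratch repository with five empty commits named F through J; after the reset, master shows only three commits, yet J can still be looked up by its hash ID and through the branch's reflog.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in F G H I J; do
    git commit -q --allow-empty -m "$name"
done
j_hash=$(git rev-parse HEAD)

git reset -q --hard HEAD~2

tip=$(git log -1 --format=%s)             # subject of the commit master points to now
visible=$(git rev-list --count master)    # commits reachable from master
still_there=$(git cat-file -t "$j_hash")  # object type, if the object still exists
previous=$(git rev-parse 'master@{1}')    # where master pointed before the reset
echo "tip=$tip visible=$visible J is still a $still_there"
```

This is also the way back if the reset turns out to be a mistake: git reset --hard with the old hash ID puts master back on J, as long as Git has not yet cleaned up the unreferenced commits.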



              The drawback to this is that if we've had our Git tell some other Git: Here, take copies of commits I and J, that other Git has the two commits and will re-introduce them to our own Git even after our Git has forgotten them. But if we have never successfully sent the two commits anywhere else, we're the only one who has them, so if we forget them, they're as good as gone.



              (If we have pushed them, we can have our Git, and their Git, and every other Git that has picked them up since then, all forget them, and then they will be gone. But obviously this gets hard quickly.)
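For the simplest version of the pushed case, where exactly one other Git (the server) has the commits and nobody else has fetched them, a sketch could look like this. The bare repository under a temporary directory stands in for the server, and using --force-with-lease rather than a plain forced push is my choice here, not something the text above prescribes.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)
git init -q --bare "$base/remote.git"     # stand-in for the server
git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in F G H I J; do
    git commit -q --allow-empty -m "$name"
done
git remote add origin "$base/remote.git"
git push -q origin master                 # the other Git now has I and J too

git reset -q --hard HEAD~2                # our Git forgets I and J
git push -q --force-with-lease origin master   # ask the other Git to forget them as well

local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$base/remote.git" rev-parse refs/heads/master)
remote_subject=$(git --git-dir="$base/remote.git" log -1 --format=%s refs/heads/master)
echo "remote master now points to $remote_subject"
```

Any clone that already fetched I and J would still have them and would need the same reset, which is where this gets hard quickly.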
















































First answer (git filter-branch): answered Sep 25 '18 at 17:08 by RomainValeri, edited Sep 25 '18 at 17:15



































Second answer (amend and cherry-pick): answered Sep 25 '18 at 17:52 by eftshift0























                                1















                                ... I would like ... to delete the last 2 commits. Is that feasible?




                                You can't quite delete commits, but you can easily tell Git to forget them.



                                The way this works is pretty simple, in the end. We start by noting that each commit saves a snapshot, and also stores the hash ID of its parent commit (along with your commit log message and your name as author and so on). This forms a backwards-pointing chain of commits.



                                If we let single uppercase letters stand in for commit hash IDs, we can draw this chain:



                                ... <-F  <-G  <-H   <--master


                                Note that the branch name, master in this case, stores the hash ID of the last commit in the chain. (When something stores the hash ID of a commit, we say that this thing points to the commit, hence the arrows. The name master points to H, H points to G, and so on.)



                                The way Git finds these commits is to read the hash ID of H out of master, which locates commit H, then read commit H and show it. Then, having read H, Git has the hash ID of commit G, so Git can read G and show it, and so on.



                                When we make a new commit, Git actually does this by:




                                • writing out the snapshot;

                                • writing out the author and log message and so on;

                                • having the new point back to the current commit;

                                • and last, but most important, writing the hash ID of the new commit into the branch name.


                                So if we had:



                                ...--F--G--H


                                and we added --I:



                                ...--F--G--H--I


                                then Git has changed the name master to store the hash ID of commit I. Eventually we have:



                                ...--F--G--H--I--J   <-- master


                                If we made several unwanted commits, we can tell Git: Re-set the name master to point to commit H instead of commit J. There are several ways to do that, but the first one to reach for, in this case, is git reset --hard (while we have master checked out, and be sure you don't have anything you are concerned with losing, because git reset --hard tells Git to throw everything out):



                                git checkout master
                                git reset --hard HEAD~2


                                The ~2 suffix tells Git to count back two steps—technically, two first parent steps, which matters when we have some merge commits in our chain, but here, we don't so it does not matter. If master currently points to J, that has Git count back twice: J to I, then I to H. Git then replaces our work with the contents from commit H and makes the name, master, point to H instead of J:



                                             I--J
                                /
                                ...--F--G--H <-- master


                                Now that J is hard to find, it appears to be deleted.



                                The drawback to this is that if we've had our Git tell some other Git: Here, take copies of commits I and J, that other Git has the two commits and will re-introduce them to our own Git even after our Git has forgotten them. But if we have never successfully sent the two commits anywhere else, we're the only one who has them, so if we forget them, they're as good as gone.



                                (If we have pushed them, we can have our Git, and their Git, and every other Git that has picked them up since then, all forget them, and then they will be gone. But obviously this gets hard quickly.)






                                share|improve this answer




























                                  1















                                  ... I would like ... to delete the last 2 commits. Is that feasible?




                                  You can't quite delete commits, but you can easily tell Git to forget them.



                                  The way this works is pretty simple, in the end. We start by noting that each commit saves a snapshot, and also stores the hash ID of its parent commit (along with your commit log message and your name as author and so on). This forms a backwards-pointing chain of commits.



                                  If we let single uppercase letters stand in for commit hash IDs, we can draw this chain:



                                  ... <-F  <-G  <-H   <--master


                                  Note that the branch name, master in this case, stores the hash ID of the last commit in the chain. (When something stores the hash ID of a commit, we say that this thing points to the commit, hence the arrows. The name master points to H, H points to G, and so on.)



                                  The way Git finds these commits is to read the hash ID of H out of master, which locates commit H, then read commit H and show it. Then, having read H, Git has the hash ID of commit G, so Git can read G and show it, and so on.



                                  When we make a new commit, Git actually does this by:




                                  • writing out the snapshot;

                                  • writing out the author and log message and so on;

                                  • having the new point back to the current commit;

                                  • and last, but most important, writing the hash ID of the new commit into the branch name.


                                  So if we had:



                                  ...--F--G--H


                                  and we added --I:



                                  ...--F--G--H--I


                                  then Git has changed the name master to store the hash ID of commit I. Eventually we have:



                                  ...--F--G--H--I--J   <-- master


                                  If we made several unwanted commits, we can tell Git: Re-set the name master to point to commit H instead of commit J. There are several ways to do that, but the first one to reach for, in this case, is git reset --hard (while we have master checked out, and be sure you don't have anything you are concerned with losing, because git reset --hard tells Git to throw everything out):



                                  git checkout master
                                  git reset --hard HEAD~2


                                  The ~2 suffix tells Git to count back two steps—technically, two first parent steps, which matters when we have some merge commits in our chain, but here, we don't so it does not matter. If master currently points to J, that has Git count back twice: J to I, then I to H. Git then replaces our work with the contents from commit H and makes the name, master, point to H instead of J:



                                               I--J
                                  /
                                  ...--F--G--H <-- master


                                  Now that J is hard to find, it appears to be deleted.



                                  The drawback to this is that if we've had our Git tell some other Git: Here, take copies of commits I and J, that other Git has the two commits and will re-introduce them to our own Git even after our Git has forgotten them. But if we have never successfully sent the two commits anywhere else, we're the only one who has them, so if we forget them, they're as good as gone.



                                  (If we have pushed them, we can have our Git, and their Git, and every other Git that has picked them up since then, all forget them, and then they will be gone. But obviously this gets hard quickly.)






                                  share|improve this answer


























                                    1












                                    1








                                    1








                                    ... I would like ... to delete the last 2 commits. Is that feasible?




                                    You can't quite delete commits, but you can easily tell Git to forget them.



                                    The way this works is pretty simple, in the end. We start by noting that each commit saves a snapshot, and also stores the hash ID of its parent commit (along with your commit log message and your name as author and so on). This forms a backwards-pointing chain of commits.



                                    If we let single uppercase letters stand in for commit hash IDs, we can draw this chain:



                                    ... <-F  <-G  <-H   <--master


                                    Note that the branch name, master in this case, stores the hash ID of the last commit in the chain. (When something stores the hash ID of a commit, we say that this thing points to the commit, hence the arrows. The name master points to H, H points to G, and so on.)



                                    The way Git finds these commits is to read the hash ID of H out of master, which locates commit H, then read commit H and show it. Then, having read H, Git has the hash ID of commit G, so Git can read G and show it, and so on.



                                    When we make a new commit, Git actually does this by:




                                    • writing out the snapshot;

                                    • writing out the author and log message and so on;

                                    • having the new point back to the current commit;

                                    • and last, but most important, writing the hash ID of the new commit into the branch name.


                                    So if we had:



                                    ...--F--G--H


                                    and we added --I:



                                    ...--F--G--H--I


                                    then Git has changed the name master to store the hash ID of commit I. Eventually we have:



                                    ...--F--G--H--I--J   <-- master


                                    If we made several unwanted commits, we can tell Git: Re-set the name master to point to commit H instead of commit J. There are several ways to do that, but the first one to reach for, in this case, is git reset --hard (while we have master checked out, and be sure you don't have anything you are concerned with losing, because git reset --hard tells Git to throw everything out):



                                    git checkout master
                                    git reset --hard HEAD~2


                                    The ~2 suffix tells Git to count back two steps—technically, two first parent steps, which matters when we have some merge commits in our chain, but here, we don't so it does not matter. If master currently points to J, that has Git count back twice: J to I, then I to H. Git then replaces our work with the contents from commit H and makes the name, master, point to H instead of J:



                                                 I--J
                                    /
                                    ...--F--G--H <-- master


                                    Now that J is hard to find, it appears to be deleted.



                                    The drawback to this is that if we've had our Git tell some other Git: Here, take copies of commits I and J, that other Git has the two commits and will re-introduce them to our own Git even after our Git has forgotten them. But if we have never successfully sent the two commits anywhere else, we're the only one who has them, so if we forget them, they're as good as gone.



                                    (If we have pushed them, we can have our Git, and their Git, and every other Git that has picked them up since then, all forget them, and then they will be gone. But obviously this gets hard quickly.)






                                    share|improve this answer































                                    answered Sep 25 '18 at 17:55









                                    torek





























