How to find old large files and generate metrics for directories containing those files

I am a hardware engineer working in a design department, and we routinely generate directories with large amounts of data (both large files and directories that contain large numbers of small files). This data can sit on disk for quite a while, and I am looking for a metric to identify directories holding large amounts of old data as candidates for deletion.

The metric I have decided on is file size (in MB) * file age (in days).

I have a working solution, but it is a combination of shell scripting and C and is neither maintainable, pretty nor elegant.

I am looking for ideas to improve the script.

The basic idea is to generate raw data on all the files using find:

    find "$Dir" -type f -exec stat -c "%s,%Y,%n" {} \; > rpt3

and then process that file in C to get a file (rpt3b) in the format

    Metric,Age,Size,FileName

where:

Metric is Age*Size
Age is the number of days since the file was modified
Size is the size of the file in MB
FileName is the name of the file.
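
A minimal Python sketch of this per-file step, assuming the age is taken from the modification time (as %Y does in the stat call above) and that 1 MB means 1024 * 1024 bytes. The name file_row and the two constants are illustrative and are not taken from the original C program.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MB = 1024 * 1024  # assumption: "M" means mebibytes


def file_row(path, now=None):
    """Return (metric, age_days, size_mb, path) for a single file."""
    if now is None:
        now = time.time()
    st = os.stat(path)
    age_days = (now - st.st_mtime) / SECONDS_PER_DAY  # fractional days
    size_mb = st.st_size / BYTES_PER_MB
    return (age_days * size_mb, age_days, size_mb, path)
```

Passing now explicitly keeps every file in one run measured against the same reference time.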



I then process this file to sum the metrics for each directory:

    for Directory in $( find /projects/solaris/implementation -maxdepth 4 -type d ) ; do
        Total=`grep "$Directory/" rpt3a | sed -e 's?,.*??' | paste -sd+ - | bc`
        echo "$Total,$Directory" >> rpt3c
    done

So the output is similar to that of du, but it reports the metric rather than the size taken on disk.

I could pull the last step into the C program, but I am looking for a solution that ideally works in one environment (it doesn't have to be C; I am open to learning new languages).

Thanks in advance

Tags: python c perl sh

asked Jan 2 at 11:59 by AlmostAHerb

3 Answers

You could do the whole lot in Perl. Perl has two file-test operators, -M and -s, which return the age of a file in days and its size in bytes, respectively. Age here is the script start time minus the file's modification time. Perl also ships with the File::Find module, which mimics the find command.

    #!perl
    use strict;
    use warnings;

    use File::Find;

    find(\&process, shift); # shift the start directory off @ARGV

    sub process {
        # Lots of use of the magic _ file handle so we don't keep having to call stat()
        print( (-M _) * (-s _), ' ', -M _, ' ', -s _, " $File::Find::name\n")
            if -f $_;
    }

answered Jan 2 at 12:31 by JGNI (edited Jan 4 at 8:22)

• @zdim That should work now. It worked when I tested it last time because my initial directory was .. – JGNI, Jan 4 at 8:24
• Thank you :) (I would also round numbers and sort by metric) – zdim, Jan 4 at 8:32
• @zdim I'll leave that as an exercise for the poster, they may want to do that by options :-) – JGNI, Jan 4 at 9:05
• Thanks a lot for your help. Was enough to get me going – AlmostAHerb, Jan 4 at 14:13
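
The rounding and sorting suggested in the comments could look roughly like the sketch below. It is written in Python rather than Perl, and it assumes the rows are already available as (metric, age, size, name) tuples; the function name sorted_rows and the sample values are made up for illustration.

```python
def sorted_rows(rows, digits=2):
    """Round the numeric fields, then sort by metric, largest first."""
    rounded = [
        (round(metric, digits), round(age, digits), round(size, digits), name)
        for metric, age, size, name in rows
    ]
    return sorted(rounded, key=lambda row: row[0], reverse=True)


# Hypothetical sample rows: (metric, age in days, size in MB, file name)
rows = [
    (12.3456, 2.0, 6.1728, "a/b"),
    (98.7654, 30.0, 3.29218, "a/c"),
]
for row in sorted_rows(rows):
    print(",".join(map(str, row)))
```

The number of digits is a parameter so that it could be driven by a command-line option.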

Use cut instead of sed to extract the column you need from the matching lines. cut -d, -f3 extracts the third comma-separated field.

With this input (a.txt):

    10,2,5,a/b
    20,4,5,a/c
    30,2,15,b/d
    40,4,10,a/d

the command grep a/ a.txt | cut -f3 -d, | paste -sd+ - | bc produces:

    20

and the command grep b/ a.txt | cut -f3 -d, | paste -sd+ - | bc produces:

    15
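
For comparison, the same filter-and-sum pipeline can be sketched in Python using the sample input above. The function name sum_column is an assumption made for this example. Note that needle in line is a plain substring test, whereas grep treats its pattern as a regular expression; for a pattern such as a/ the two agree.

```python
def sum_column(lines, needle, column):
    """Sum one comma-separated column over the lines that contain `needle`.

    `column` is 1-based, like the -f option of cut.
    """
    total = 0.0
    for line in lines:
        if needle not in line:  # substring match, like grep with a literal pattern
            continue
        fields = line.split(",")
        total += float(fields[column - 1])
    return total


lines = [
    "10,2,5,a/b",
    "20,4,5,a/c",
    "30,2,15,b/d",
    "40,4,10,a/d",
]
print(sum_column(lines, "a/", 3))
print(sum_column(lines, "b/", 3))
```

Changing the column argument to 1 would sum the first field instead of the third.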






answered Jan 2 at 12:19 by Jean-Baptiste Yunès

• Thanks - I agree with this – AlmostAHerb, Jan 2 at 13:11

Call 'python script.py startdir ~/somefile.txt'.

You can use this as a starting point:

    import os
    import sys
    import time

    def get_age_in_days(file_stats):
        """Calculate the age in whole days from a file's stat result."""
        return (time.time() - file_stats.st_mtime) // (60*60*24)

    def get_size_in_MB(file_stats):
        """Calculate the file size in megabytes from a file's stat result."""
        return file_stats.st_size / (1024 * 1024)

    def metric(root, f):
        """Uses root and f to create a metric for the file at 'os.path.join(root,f)'"""
        fn = os.path.join(root, f)
        fn_stat = os.stat(fn)
        age = get_age_in_days(fn_stat)
        size = get_size_in_MB(fn_stat)
        metric = age*size

        return [metric, age, size, fn]

    path = None
    fn = None
    if len(sys.argv) == 3:
        path = sys.argv[1]
        fn = sys.argv[2]
    else:
        sys.exit(2)


    with open(fn, "w") as output:
        # walk the directory recursively and report anything with a metric > 1
        for root, dirs, files in os.walk(path):
            total_dict = 0
            for f in files:
                m = metric(root, f)

                # cutoff - only write to file if metric > 1
                if m[0] > 1:
                    total_dict += m[0]
                    output.write(','.join(map(str, m)) + "\n")
            # one summary line per directory, after its files
            output.write(','.join([str(total_dict), "total", "dictionary", root]) + "\n")

    # testing purposes
    # print(open(fn).read())

Example output file (without the cutoff, run on https://pyfiddle.io/):

    0.0,0.0,0.0011606216430664062,./main.py
    0.0,0.0,0.0,./myfiles.txt
    0.0,total,dictionary,./

To find the per-directory totals, look for lines that contain ,total,dictionary, such as 0.0,total,dictionary,./
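
The totals written by the script above cover only the files directly inside each directory. If du-style totals are wanted, where each directory also includes everything beneath it, one possible approach is a bottom-up walk. The sketch below is an assumption-laden variant, not the code from this answer: directory_totals is a hypothetical name, the age is kept as fractional days, no cutoff is applied, and only regular files are counted (mirroring find -type f).

```python
import os
import stat
import time

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MB = 1024 * 1024


def directory_totals(top, now=None):
    """Map each directory under `top` to the summed metric of all files below it."""
    if now is None:
        now = time.time()
    totals = {}
    # Bottom-up walk: every subdirectory is visited before its parent.
    for root, dirs, files in os.walk(top, topdown=False):
        total = 0.0
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks and special files
            age_days = (now - st.st_mtime) / SECONDS_PER_DAY
            total += age_days * st.st_size / BYTES_PER_MB
        for name in dirs:
            # Symlinked directories are not walked, so they contribute 0.
            total += totals.get(os.path.join(root, name), 0.0)
        totals[root] = total
    return totals
```

Sorting the returned dictionary by value in descending order then lists the strongest deletion candidates first.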







answered Jan 2 at 12:28 by Patrick Artner (edited Jan 2 at 12:37)