PHP won't read full file into array, only partial

I have a file with 3,200,000 lines of CSV data (450 columns). The total file size is 6 GB.



I read the file like this:



$data = file('csv.out');


Without fail, it only reads 897,000 lines. I confirmed this with print_r and with echo sizeof($data). I increased my "memory_limit" to a ridiculously high value like 80 GB, but it made no difference.



Now, it DID read in my other large file, which has the same number of lines (3,200,000) but only a few columns, so its total size is 1.1 GB. So it appears to be a total-file-size issue. FYI, 897,000 lines in the $data array comes to around 1.68 GB.



Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads in fine, yet the first file is still truncated at 1.68 GB, so it does not appear to be a simple size issue. If I increase the second file further, to 2.2 GB, then instead of truncating it and continuing the program (as it does with the first file), it dies and core dumps.



Update: I verified that my system is 64-bit by printing integer and float values:



<?php
$large_number = 2147483647;
var_dump($large_number); // int(2147483647)

$large_number = 2147483648;
var_dump($large_number); // float(2147483648)

$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number); // float(50000000000)

$large_number = 9223372036854775807;
var_dump($large_number); // int(9223372036854775807)

$large_number = 9223372036854775808;
var_dump($large_number); // float(9.2233720368548E+18)

$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number); // float(5.0E+19)

print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";

?>


The output from this script is:



int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)



So since it's 64-bit and the memory limit is set really high, why is PHP not reading files larger than 2.15 GB?

php arrays csv coredump

asked Jan 3 at 0:16 by Corepuncher, edited Jan 7 at 23:39

  • 1) Are you sure you need to read the whole file? Can you split your task into small parts and read the file line by line? 2) You could read the file line by line and store the lines in an SplDoublyLinkedList or SplFixedArray instead of a default array to reduce the RAM requirements. – user1597430, Jan 3 at 0:19

  • Rather than reading the entire file into memory, I'd recommend using a file-pointer function like fgetcsv(). – Phil, Jan 3 at 0:21 (see the sketch after these comments)

  • As for your question, unfortunately I can't find any references to specific limits on file(), but you would expect it to be limited by the available memory capacity. – Phil, Jan 3 at 0:24

  • It's an enormous server with lots of memory. I'll have to try your suggestion, though. It just seems like there is a per-file memory limit, since I raised the "memory_limit" to such a huge value and the problem persists. – Corepuncher, Jan 3 at 0:32

  • The reason I'm reading it into memory is that it has to do a huge nested loop: I have to compare each of the 3.2 million lines to each of the other 3.2 million lines. So I'm not sure whether the alternate methods above would be as fast as working in memory? If so, I will try them. Otherwise, I may have to rewrite the whole thing in C :-( – Corepuncher, Jan 3 at 2:55
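
For reference, here is a minimal sketch of the streaming approach suggested in the comments above: read one row at a time with fgetcsv() instead of loading everything with file(). This is an illustration, not code from the question; only the file name csv.out is taken from the question, and the per-row processing is a placeholder.

<?php
// Stream the CSV one row at a time instead of loading it all with file().
$handle = fopen('csv.out', 'r');            // open read-only
if ($handle === false) {
    die("Could not open csv.out\n");
}

$rowCount = 0;
while (($row = fgetcsv($handle)) !== false) {
    // $row is an array of the ~450 column values for one line.
    // Do the per-row work here instead of keeping all 3.2 million rows in RAM.
    $rowCount++;
}
fclose($handle);

echo "Processed $rowCount rows\n";
?>

If every row really does have to stay in memory for the nested-loop comparison, the SplFixedArray idea from the first comment can still help: storing the rows (or only the columns actually compared) in an SplFixedArray generally uses less memory per element than a plain PHP array of full lines.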

2 Answers


Some things that come to mind:

  • If you're using a 32-bit PHP, you cannot read files that are larger than 2 GB.
  • If reading the file takes too long, there could be time-outs.
  • If the file is really huge, then reading it all into memory is going to be problematic. It's usually better to read blocks of data and process those, unless you need random access to all parts of the file (see the sketch below).
  • Another approach (I've used it in the past) is to chop the large file into smaller, more manageable ones (this should work if it's a straightforward log file, for example).
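
To illustrate the block-reading suggestion above (this sketch is not part of the original answer): read the file in fixed-size chunks with fread() and carry any partial trailing line over to the next chunk. The 8 MB block size and the processLine() callback are assumptions made for the example.

<?php
// Hypothetical per-line callback; stands in for whatever work is needed per line.
function processLine(string $line): void {
    // placeholder
}

$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("Could not open csv.out\n");
}

$carry = '';                                            // incomplete line carried between chunks
while (!feof($handle)) {
    $chunk = $carry . fread($handle, 8 * 1024 * 1024);  // read an 8 MB block
    $lines = explode("\n", $chunk);
    $carry = array_pop($lines);                         // the last piece may be a partial line
    foreach ($lines as $line) {
        processLine($line);
    }
}
if ($carry !== '') {
    processLine($carry);                                // final line if the file has no trailing newline
}
fclose($handle);
?>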






answered Jan 3 at 0:42 by NiFF

  • It is 64-bit PHP. And reading the files only takes about 15 seconds. – Corepuncher, Jan 4 at 0:28


I fixed it. All I had to do was change the way I read the files. Why...I do not know.



Old code that only reads 2.15 GB out of 6.0 GB:



$data = file('csv.out'); 


New code that reads the full 6.0 GB:



$data = array();

$i = 1;
$handle = fopen('csv.out', 'r');   // fopen() needs a mode; 'r' opens the file read-only

if ($handle) {
    while (($data[$i] = fgets($handle)) !== false) {
        // process the line read
        $i++;
    }
    fclose($handle);
}


Feel free to shed some light on why. There must be some limitation when using



$var=file();


Interestingly, 2.15 GB is close to the 32-bit limit I read about (2^31 bytes is roughly 2.147 GB).

answered Jan 15 at 3:26 by Corepuncher