PHP won't read full file into array, only partial
I have a file with 3,200,000 lines of csv data (with 450 columns). Total file size is 6 GB.
I read the file like this:
$data = file('csv.out');
Without fail, it only reads 897,000 lines. I confirmed this with print_r and echo sizeof($data). I increased my "memory_limit" to a ridiculously high value like 80 GB, but it made no difference.
Now, it DID read in my other large file, which has the same number of lines (3,200,000) but only a few columns, so its total file size is 1.1 GB. So it appears to be a total file size issue. FYI, 897,000 lines in the $data array comes to around 1.68 GB.
Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads it in fine, yet it still truncates the other file at 1.68 GB. So it does not appear to be a simple size issue. If I increase the size of the second file further, to 2.2 GB, then instead of truncating it and continuing the program (as it does for the first file), it dies and core dumps.
Update: I verified my system is 64 bit by printing integer and float numbers:
<?php
$large_number = 2147483647;
var_dump($large_number); // int(2147483647)
$large_number = 2147483648;
var_dump($large_number); // float(2147483648)
$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number); // float(50000000000)
$large_number = 9223372036854775807;
var_dump($large_number); // int(9223372036854775807)
$large_number = 9223372036854775808;
var_dump($large_number); // float(9.2233720368548E+18)
$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number); // float(5.0E+19)
print "PHP_INT_MAX: " . PHP_INT_MAX . "n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)n";
?>
The output from this script is:
int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)
So, since it's 64-bit and the memory limit is set really high, why is PHP not reading files larger than 2.15 GB?
Tags: php, arrays, csv, coredump
asked Jan 3 at 0:16 by Corepuncher, edited Jan 7 at 23:39
1) Are you sure you need to read the whole file? Can you split your task into small parts and read the file line by line? 2) You may read the file line by line and store the lines in an SplDoublyLinkedList or SplFixedArray instead of a default array to reduce the RAM requirements.
– user1597430
Jan 3 at 0:19
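For illustration, here is a minimal sketch of the second suggestion: streaming the file line by line into an SplFixedArray instead of calling file(). The file name and the 3,200,000 row count are taken from the question and are assumptions about the actual data.
<?php
// Sketch: load lines one at a time into an SplFixedArray, which has less
// per-element overhead than a regular PHP array and never calls file().
// Assumes the file has at most $lineCount lines.
$lineCount = 3200000;                      // expected row count from the question
$rows = new SplFixedArray($lineCount);

$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("could not open csv.out\n");
}

$i = 0;
while (($line = fgets($handle)) !== false) {
    $rows[$i++] = $line;                   // or str_getcsv($line) to split the 450 columns
}
fclose($handle);

echo "$i lines loaded\n";
?>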
Rather than reading the entire file into memory, I'd recommend using a file-pointer function like fgetcsv()
– Phil
Jan 3 at 0:21
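As a sketch of that suggestion (the file name and default comma delimiter are assumptions), fgetcsv() reads and parses one row per call, so memory use stays flat no matter how large the file is:
<?php
// Sketch: stream the CSV row by row instead of loading all 6 GB at once.
$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("could not open csv.out\n");
}

$lineNo = 0;
while (($row = fgetcsv($handle)) !== false) {
    $lineNo++;
    // $row is an array of the ~450 column values for this line;
    // do the per-row work here instead of keeping everything in $data.
}
fclose($handle);

echo "processed $lineNo lines\n";
?>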
As for your question, unfortunately I can't find any references to specific limits on file(), but you would expect it is limited by available memory capacity.
– Phil
Jan 3 at 0:24
It's an enormous server with lots of memory. I'll have to try your suggestion though. Just seems like there is a per-file memory limit, since I raised the "memory_limit" to such a huge value, and the problem persists.
– Corepuncher
Jan 3 at 0:32
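One way to test the "per-file memory limit" theory is to watch PHP's own memory accounting while the load runs. A rough diagnostic sketch (editorial, not from the original discussion; the path is the one used in the question):
<?php
// Sketch: remove PHP's memory cap entirely, then see how many lines file()
// actually returns and how much memory the attempt peaked at.
ini_set('memory_limit', '-1');
$data = file('csv.out');
var_dump(count($data));
printf("peak memory: %.2f GB\n", memory_get_peak_usage(true) / (1024 ** 3));
?>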
The reason I'm reading it into memory is that it has to do a huge nested loop: I have to compare each of the 3.2 million lines to each of the other 3.2 million lines. So I'm not sure whether the alternate methods above would be as fast as working in memory? If so I will try. Otherwise, I may have to rewrite the whole thing in C :-(
– Corepuncher
Jan 3 at 2:55
2 Answers
Some things that come to mind:
- If you're using a 32-bit PHP build, you cannot read files larger than 2 GB.
- If reading the file takes too long, there could be time-outs.
- If the file is really huge, then reading it all into memory is going to be problematic. It's usually better to read blocks of data and process those (a rough sketch follows below), unless you need random access to all parts of the file.
- Another approach (I've used it in the past) is to chop the large file into smaller, more manageable ones (this should work if it's a straightforward log file, for example).
answered Jan 3 at 0:42 by NiFF
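For reference, a rough sketch of the block-reading idea from the third bullet above (the 64 MB chunk size and the file name are arbitrary assumptions):
<?php
// Sketch: read the file in fixed-size chunks and hand complete lines to the
// processing code, so memory use stays bounded regardless of file size.
$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("could not open csv.out\n");
}

$chunkSize = 64 * 1024 * 1024;   // 64 MB per read; tune as needed
$carry = '';                     // partial line left over from the previous chunk

while (!feof($handle)) {
    $chunk = $carry . fread($handle, $chunkSize);
    $lines = explode("\n", $chunk);
    $carry = array_pop($lines);  // the last piece may be an incomplete line
    foreach ($lines as $line) {
        // process one complete line here
    }
}
if ($carry !== '') {
    // process the final line (the file may not end with a newline)
}
fclose($handle);
?>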
It is 64 bit PHP. And reading the files only takes about 15 seconds.
– Corepuncher
Jan 4 at 0:28
I fixed it. All I had to do was change the way I read the files. Why...I do not know.
Old code that only reads 2.15 GB out of 6.0 GB:
$data = file('csv.out');
New code that reads the full 6.0 GB:
$data = array();
$i = 1;
$handle = fopen('csv.out', 'r');   // note: fopen() needs a mode argument
if ($handle) {
    while (($data[$i] = fgets($handle)) !== false) {
        // process the line read
        $i++;
    }
    unset($data[$i]);              // drop the false value assigned at end of file
    fclose($handle);
}
Feel free to shed some light on why. There must be some limitation when using $var = file();. Interestingly, 2.15 GB is close to the 32-bit limit I read about.
answered Jan 15 at 3:26 by Corepuncher
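As an editorial aside (not part of the original answer), SplFileObject offers a similar streaming loop with less bookkeeping, and can split each row into its columns as it goes; a sketch under the same assumptions about the file:
<?php
// Sketch: iterate the file with SplFileObject instead of fopen()/fgets().
$file = new SplFileObject('csv.out', 'r');
$file->setFlags(SplFileObject::READ_CSV);   // each iteration yields parsed column values

$data = array();
foreach ($file as $i => $row) {
    if ($row === array(null) || $row === false) {
        continue;                            // skip the blank line at end of file
    }
    $data[$i] = $row;
}

echo count($data) . " rows read\n";
?>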