Iteratively reading multiple CSVs from different directories into a dataframe, then writing to a new CSV
I have hit a wall. So far I have the following code:
import os

# Define variables for each directory to be used. (Backslashes must be
# escaped in ordinary string literals, and a string literal cannot end
# with a single backslash.)
parent_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\'
orig_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\Original\\'
new_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\New\\'

# Create a list of original data files from orig_data_dir
orig_data = []
for root, dirs, files in os.walk(orig_data_dir):
    for file in files:
        if file.endswith('.csv'):
            orig_data.append(file)
# This populates the file names located in orig_data_dir:
# orig_data = ['Test1.csv', 'Test2.csv', 'Test3.csv']

# Create a list of new data files from new_data_dir
new_data = []
for root, dirs, files in os.walk(new_data_dir):
    for file in files:
        if file.endswith('.csv'):
            new_data.append(file)
# This populates the file names located in new_data_dir:
# new_data = ['Test1_2.csv', 'Test2_2.csv', 'Test3_2.csv']
I have three csv files in each directory. The csv files that end in _2.csv contain new data that I would like to append to the old data, producing a new csv file for each respective pair. Each csv file has the exact same rows. What I am trying to do is the following:
- Read Test1.csv and Test1_2.csv into one dataframe using the lists I created (if there is a better way, I am open to it) (next iteration: Test2.csv and Test2_2.csv, etc.)
- Do some pandas stuff
- Write a new file called Test_Compiled_1.csv (next iteration: Test_Compiled_2.csv, etc.)
- Repeat until each csv pair from the two directories has been combined into a new csv file.
EDIT:
I have 1000s of csv files. With that said, I need to:
- Read the first file pair into the same dataframe (1st iteration: Test1.csv located in orig_data_dir and Test1_2.csv located in new_data_dir), then do pandas stuff
- Write out the populated dataframe to a new file in parent_data_dir
- Repeat for each file pair (2nd iteration: Test2.csv and Test2_2.csv; 1000th iteration: Test1000.csv and Test1000_2.csv)
A sketch of the loop I have in mind follows below. Hope this helps clarify.
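For reference, a minimal sketch of that loop (my own outline, assuming every Test<N>.csv in orig_data_dir has a matching Test<N>_2.csv in new_data_dir; the "pandas stuff" step is a placeholder):

import os
import pandas as pd

compiled = 0
for filename in sorted(orig_data):
    name, ext = os.path.splitext(filename)                     # e.g. ('Test1', '.csv')
    pair_path = os.path.join(new_data_dir, name + '_2' + ext)  # e.g. ...New\Test1_2.csv
    if not os.path.isfile(pair_path):
        continue  # no matching new-data file; skip this one
    df = pd.concat([
        pd.read_csv(os.path.join(orig_data_dir, filename)),
        pd.read_csv(pair_path),
    ], ignore_index=True)
    # ... do pandas stuff on df here ...
    compiled += 1
    out_path = os.path.join(parent_data_dir, 'Test_Compiled_{}.csv'.format(compiled))
    df.to_csv(out_path, index=False)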
python pandas csv
IIUC, why append to a file first and then split them up again and append to another file?
– Zanshin
Nov 20 '18 at 7:43
@Zanshin it's to keep each file pair separate from the others, as each csv file is associated with a location. I also tried to combine all of them into one large dataframe before; however, Jupyter Notebook doesn't like it, since the combined file size is large (GBs).
– Erik Cadaret
Nov 21 '18 at 2:40
When you import into your notebook nothing happens to their location. In the end you want to combine the files, which contain the same string in their filenames, into a dataframe, correct?
– Zanshin
Nov 21 '18 at 3:52
@Zanshin please see the edit above. I hope this clarifies the desired outcome.
– Erik Cadaret
Nov 21 '18 at 16:08
This is too broad; the writing of the end result you can find on this site. Steps 1 and 2 you can do with my answer.
– Zanshin
Nov 21 '18 at 16:12
2 Answers
The best advice is to give the files the same names in each directory, and to keep only useful data in these directories. Here is a solution for differing names:
for filename in os.listdir(orig_data_dir):
    name, ext = os.path.splitext(filename)
    filename_2 = new_data_dir + name + '_2' + ext  # construct new filename from old
    if os.path.isfile(filename_2):
        df_Orig = pd.read_csv(orig_data_dir + filename, index_col=0)
        df_New = pd.read_csv(filename_2, index_col=0)
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
        df_Orig.append(df_New).to_csv(orig_data_dir + filename)
Here I accumulate the result in the Original file. Only one loop is necessary.
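If the originals should stay untouched, a variant (a sketch under the same naming assumption, using pd.concat, the modern replacement for DataFrame.append) writes each combined pair to parent_data_dir as Test_Compiled_<n>.csv instead:

import os
import pandas as pd

n = 0
for filename in sorted(os.listdir(orig_data_dir)):
    name, ext = os.path.splitext(filename)
    filename_2 = os.path.join(new_data_dir, name + '_2' + ext)
    if ext == '.csv' and os.path.isfile(filename_2):
        n += 1  # count only files that actually have a pair
        df_orig = pd.read_csv(os.path.join(orig_data_dir, filename), index_col=0)
        df_new = pd.read_csv(filename_2, index_col=0)
        out = os.path.join(parent_data_dir, 'Test_Compiled_{}.csv'.format(n))
        pd.concat([df_orig, df_new]).to_csv(out)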
Thank you for your efforts. This works for all the steps I was looking for. I added an exception for when some files have no data, and added this to your code.
– Erik Cadaret
Nov 22 '18 at 2:52
I don't think it's the best way. Empty data is not a problem as such, and it's better to ensure elsewhere (at creation) that Original is clean.
– B. M.
Nov 22 '18 at 6:41
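For reference, the exception guard described in the comment above might look like this (a hypothetical sketch; pandas raises pandas.errors.EmptyDataError when read_csv encounters a file with no data):

import pandas as pd

try:
    df_New = pd.read_csv(filename_2, index_col=0)
except pd.errors.EmptyDataError:
    df_New = None  # the file exists but holds no data; skip this pair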
Something like this would help you:
import os
import fnmatch
from itertools import chain

paths = ('/path/to/directory/one/', '/path/to/directory/two/', 'etc.', 'etc.')

file1 = []
file2 = []
for path, dirs, files in chain.from_iterable(os.walk(path) for path in paths):
    for file in files:
        # note: a name like Test1_2.csv matches both patterns below
        if file in fnmatch.filter(files, '*1*.csv'):
            file1.append(file)
        if file in fnmatch.filter(files, '*2*.csv'):
            file2.append(file)
To create your dataframes you would do something like this (pd.read_csv already returns a DataFrame, so wrapping it in pd.DataFrame is redundant):
df_file1 = pd.concat([pd.read_csv(file1[0], sep=';'), pd.read_csv(file1[1], sep=';')], ignore_index=True)
df_file2 = ...  # etc.
Note: the 'sep' in your csv might be different.
EDIT: I've replaced endswith with fnmatch.filter; you can now use any pattern you like for matching the files you need in the different directories.
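For example, with the question's naming convention (a sketch; the directory path is a hypothetical placeholder), patterns can split one listing into original and new files:

import fnmatch
import os

files = os.listdir('/path/to/directory/one/')  # hypothetical path
new_files = fnmatch.filter(files, '*_2.csv')   # e.g. ['Test1_2.csv', ...]
orig_files = [f for f in fnmatch.filter(files, '*.csv') if f not in new_files]
# e.g. ['Test1.csv', ...]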
Thank you for the feedback. This is helpful in understanding how to access the directories iteratively. To clarify, I am going to be iterating over 1,000s of csv files (ones ending without "_2" and ones with "_2", which make up each respective pair). I just used three in each directory as an example. With that said, I am looking for a solution that will iteratively place each pair into a dataframe, write the result to a file, and repeat until there are no pairs left to iterate over. Hope this helps clarify what I am seeking. Thank you for your assistance.
– Erik Cadaret
Nov 21 '18 at 2:30
Then your question is unclear. You should update it. In your example you have the csvs ending in '1.csv', and this solution would work.
– Zanshin
Nov 21 '18 at 3:18
On what will you match the files then? Parts of the filename that are similar? Not what they end in?
– Zanshin
Nov 21 '18 at 3:44
I will be matching entire filenames and distinguishing the original data from the new data by what they end in.
– Erik Cadaret
Nov 21 '18 at 16:09
I've put in an edit earlier; it should help you now.
– Zanshin
Nov 21 '18 at 16:13