Iteratively reading multiple CSVs from different directories into a dataframe, then writing to a new CSV

I have hit a wall. So far I have the following code:



import os

# define variables of each directory to be used
# (backslashes doubled: '\U' is an invalid escape sequence, and a raw
#  string cannot end with a single backslash)
parent_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\'
orig_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\Original\\'
new_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\New\\'

# Create list of original data files from orig_data_dir
orig_data = []
for root, dirs, files in os.walk(orig_data_dir):
    for file in files:
        if file.endswith('.csv'):
            orig_data.append(file)
# It populates the file names located in orig_data_dir
# orig_data = ['Test1.csv', 'Test2.csv', 'Test3.csv']

# Create list of new data files from new_data_dir
new_data = []
for root, dirs, files in os.walk(new_data_dir):
    for file in files:
        if file.endswith('.csv'):
            new_data.append(file)
# It populates the file names located in new_data_dir
# new_data = ['Test1_2.csv', 'Test2_2.csv', 'Test3_2.csv']
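As an aside, the same listings can be built more compactly with pathlib; a minimal sketch, assuming only the top level of each directory matters:

from pathlib import Path

# non-recursive: every *.csv directly inside each directory
orig_data = sorted(p.name for p in Path(orig_data_dir).glob('*.csv'))
new_data = sorted(p.name for p in Path(new_data_dir).glob('*.csv'))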


I have three CSV files in each directory. The CSV files that end with _2.csv contain new data that I would like to append to the old data, producing a new CSV file for each respective pair. Each CSV file has exactly the same rows. What I am trying to do is the following:




  1. Read Test1.csv and Test1_2.csv into one dataframe using the lists I created (if there is a better way, I am open to it) (next iteration = Test2.csv and Test2_2.csv, etc.)

  2. Do some pandas stuff

  3. Write a new file called Test_Compiled_1.csv (next iteration = Test_Compiled_2.csv, etc.)

  4. Repeat until each CSV pair from the two directories has been combined into a new CSV file.


EDIT:
I have thousands of CSV files. With that said, I need to:




  1. read the first file pair into the same dataframe:
    1st iteration: Test1.csv located in orig_data_dir and Test1_2.csv located in new_data_dir


  2. do pandas stuff


  3. write out the populated dataframe to a new file in parent_data_dir


  4. Repeat for each file pair



2nd iteration would be: Test2.csv and Test2_2.csv



1000th iteration would be: Test1000.csv and Test1000_2.csv



Hope this helps clarify.
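For reference, a minimal sketch of the whole loop described above. It derives each partner file from the original filename (Test1.csv -> Test1_2.csv), so the two lists are not strictly needed; do_pandas_stuff is a hypothetical placeholder for step 2, and the output name assumes the Test<N> naming shown above:

import os
import pandas as pd

def do_pandas_stuff(df):
    # hypothetical placeholder for step 2; return the processed frame
    return df

for filename in sorted(os.listdir(orig_data_dir)):
    name, ext = os.path.splitext(filename)
    if ext != '.csv':
        continue
    new_path = os.path.join(new_data_dir, name + '_2' + ext)
    if not os.path.isfile(new_path):
        continue  # no matching new-data file for this original
    # stack the pair row-wise into one dataframe
    df = pd.concat([pd.read_csv(os.path.join(orig_data_dir, filename)),
                    pd.read_csv(new_path)], ignore_index=True)
    df = do_pandas_stuff(df)
    num = name[len('Test'):]  # 'Test42' -> '42'; assumes the Test<N> naming
    df.to_csv(os.path.join(parent_data_dir, 'Test_Compiled_' + num + ext),
              index=False)

Because only one pair is held in memory at a time, this also sidesteps the notebook memory problem mentioned in the comments below.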
python pandas csv

asked Nov 20 '18 at 6:36 by Erik Cadaret, edited Nov 21 '18 at 16:06
  • IIUC, why append to a file first and then split them up again and append to another file?

    – Zanshin
    Nov 20 '18 at 7:43

  • @Zanshin it's to keep each file pair separate from the others, as each CSV file is associated with a location. I also tried to combine all of them into one large dataframe before; however, Jupyter Notebook doesn't like it, since the combined file size is large (GBs).

    – Erik Cadaret
    Nov 21 '18 at 2:40

  • When you import into your notebook, nothing happens to their location. In the end you want to combine the files which contain the same string in their filenames into a dataframe, correct?

    – Zanshin
    Nov 21 '18 at 3:52

  • @Zanshin please see the edit above. I hope this clarifies the desired outcome.

    – Erik Cadaret
    Nov 21 '18 at 16:08

  • This is too broad; how to write the end result you can find on this site. Steps 1 and 2 you can do with my answer.

    – Zanshin
    Nov 21 '18 at 16:12
2 Answers

The best advice is to give the files the same names in each directory, and to keep only useful data in those directories. Here is a solution for the different names:



for filename in os.listdir(orig_data_dir):
    name, ext = os.path.splitext(filename)
    filename_2 = new_data_dir + name + '_2' + ext  # construct new filename from old
    if os.path.isfile(filename_2):
        df_Orig = pd.read_csv(orig_data_dir + filename, index_col=0)
        df_New = pd.read_csv(filename_2, index_col=0)
        df_Orig.append(df_New).to_csv(orig_data_dir + filename)


Here I accumulate the result in the Original file. Only one loop is necessary.

– B. M.
answered Nov 21 '18 at 16:34
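Note that DataFrame.append was later deprecated (and removed in pandas 2.0); a sketch of the same loop with pd.concat, redirected to write a fresh compiled file into parent_data_dir as the question asks (the output naming here is illustrative):

import os
import pandas as pd

for filename in os.listdir(orig_data_dir):
    name, ext = os.path.splitext(filename)
    filename_2 = os.path.join(new_data_dir, name + '_2' + ext)
    if os.path.isfile(filename_2):
        combined = pd.concat([
            pd.read_csv(os.path.join(orig_data_dir, filename), index_col=0),
            pd.read_csv(filename_2, index_col=0),
        ])
        # write each pair to a new file instead of overwriting the original
        combined.to_csv(os.path.join(parent_data_dir, name + '_Compiled' + ext))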
  • Thank you for your efforts. This works for all steps, as I was looking for. I added an exception for when some files have no data, and added this to your code (a sketch of such a guard follows these comments).

    – Erik Cadaret
    Nov 22 '18 at 2:52

  • I don't think it's the best way. Empty data is not as such a problem, and it's better to ensure elsewhere (at creation) that Original is clean.

    – B. M.
    Nov 22 '18 at 6:41
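The exception mentioned in the first comment is not shown; a minimal sketch of such a guard, relying on pandas raising EmptyDataError for files with no data:

import pandas as pd

def read_csv_or_empty(path, **kwargs):
    # hypothetical helper: fall back to an empty frame when the file has no data
    try:
        return pd.read_csv(path, **kwargs)
    except pd.errors.EmptyDataError:
        return pd.DataFrame()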
Something like this would help you:



import os
from itertools import chain
import fnmatch

paths = ('/path/to/directory/one/', '/path/to/directory/two/', 'etc.', 'etc.')

file1 = []
file2 = []

for path, dirs, files in chain.from_iterable(os.walk(path) for path in paths):
    for file in files:
        if file in fnmatch.filter(files, '*1*.csv'):
            file1.append(file)
        if file in fnmatch.filter(files, '*2*.csv'):
            file2.append(file)

To create your dataframes you would do something like this:

df_file1 = pd.concat([pd.read_csv(file1[0], sep=';'), pd.read_csv(file1[1], sep=';')], ignore_index=True)

df_file2 etc.


Note: the 'sep' in your CSVs might be different.



EDIT: I've replaced endswith with fnmatch.filter; you can now use any pattern you like to match the files you need in the different directories.

– Zanshin
answered Nov 20 '18 at 7:58, edited Nov 21 '18 at 11:04
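To scale this to the thousands of pairs from the question's edit, the two lists can be matched by stem instead of indexed by hand; a sketch, assuming file1 holds the original filenames, file2 the _2 filenames, and the directory variables from the question:

import os
import pandas as pd

# map each new-data file back to its original stem: 'Test1_2.csv' -> 'Test1'
new_by_stem = {os.path.splitext(f)[0][:-len('_2')]: f for f in file2}

for orig_name in file1:
    stem = os.path.splitext(orig_name)[0]
    new_name = new_by_stem.get(stem)
    if new_name is None:
        continue  # no matching new-data file for this original
    df_pair = pd.concat([pd.read_csv(os.path.join(orig_data_dir, orig_name), sep=';'),
                         pd.read_csv(os.path.join(new_data_dir, new_name), sep=';')],
                        ignore_index=True)
    # ... pandas stuff here, then write df_pair out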
  • Thank you for the feedback. This is helpful in understanding how to access the directories iteratively. To clarify, I am going to be iterating over thousands of CSV files (ones ending without "_2" and ones with "_2", which make up each respective pair). I just used three in each directory as an example. With that said, I am looking for a solution that will iteratively place each pair into a dataframe, write the result to a file, and repeat until there are no pairs left to iterate over. I hope this clarifies what I am seeking. Thank you for your assistance.

    – Erik Cadaret
    Nov 21 '18 at 2:30

  • Then your question is unclear. You should update it. In your example you have the CSVs ending in '1.csv', and this solution would work.

    – Zanshin
    Nov 21 '18 at 3:18

  • On what will you match the files then? Parts of the filename that are similar? Not what they end on?

    – Zanshin
    Nov 21 '18 at 3:44

  • I will be matching entire filenames and distinguishing the original data from the new data by what they end in.

    – Erik Cadaret
    Nov 21 '18 at 16:09

  • I've put in an edit earlier; it should help you now.

    – Zanshin
    Nov 21 '18 at 16:13