Iteratively reading multiple CSVs from different directories into a dataframe, then writing to a new CSV
I have hit a wall. So far I have the following code:
import os

# Define variables for each directory to be used. (Backslashes must be
# escaped in ordinary string literals, and a string literal cannot end
# with a single backslash.)
parent_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\'
orig_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\Original\\'
new_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\New\\'

# Create a list of original data files from orig_data_dir
orig_data = []
for root, dirs, files in os.walk(orig_data_dir):
    for file in files:
        if file.endswith('.csv'):
            orig_data.append(file)
# This populates the file names located in orig_data_dir:
# orig_data = ['Test1.csv', 'Test2.csv', 'Test3.csv']

# Create a list of new data files from new_data_dir
new_data = []
for root, dirs, files in os.walk(new_data_dir):
    for file in files:
        if file.endswith('.csv'):
            new_data.append(file)
# This populates the file names located in new_data_dir:
# new_data = ['Test1_2.csv', 'Test2_2.csv', 'Test3_2.csv']
I have three csv files in each directory. The csv files that end in _2.csv contain new data that I would like to append to the old data, producing a new csv file for each respective pair. Each csv file has the exact same rows. What I am trying to do is the following:
- Read Test1.csv and Test1_2.csv into one dataframe using the lists I created (if there is a better way, I am open to it) (next iteration: Test2.csv and Test2_2.csv, etc.)
- Do some pandas stuff
- Write a new file called Test_Compiled_1.csv (next iteration: Test_Compiled_2.csv, etc.)
- Repeat until each csv pair from the two directories has been combined into a new csv file.
EDIT:
I have 1000s of csv files. With that said, I need to:
- Read the first file pair into the same dataframe (1st iteration: Test1.csv located in orig_data_dir and Test1_2.csv located in new_data_dir), then do pandas stuff
- Write out the populated dataframe to a new file in parent_data_dir
- Repeat for each file pair (2nd iteration: Test2.csv and Test2_2.csv; 1000th iteration: Test1000.csv and Test1000_2.csv)
A sketch of the loop I have in mind follows below. Hope this helps clarify.
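For reference, a minimal sketch of that loop (my own outline, assuming every Test<N>.csv in orig_data_dir has a matching Test<N>_2.csv in new_data_dir; the "pandas stuff" step is a placeholder):

import os
import pandas as pd

compiled = 0
for filename in sorted(orig_data):
    name, ext = os.path.splitext(filename)                     # e.g. ('Test1', '.csv')
    pair_path = os.path.join(new_data_dir, name + '_2' + ext)  # e.g. ...New\Test1_2.csv
    if not os.path.isfile(pair_path):
        continue  # no matching new-data file; skip this one
    df = pd.concat([
        pd.read_csv(os.path.join(orig_data_dir, filename)),
        pd.read_csv(pair_path),
    ], ignore_index=True)
    # ... do pandas stuff on df here ...
    compiled += 1
    out_path = os.path.join(parent_data_dir, 'Test_Compiled_{}.csv'.format(compiled))
    df.to_csv(out_path, index=False)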
python pandas csv
IIUC, why append to a file first and then split them up again and append to another file?
– Zanshin
Nov 20 '18 at 7:43
@Zanshin it's to keep each file pair separate from the others, as each csv file is associated with a location. I also tried to combine all of them into one large dataframe before; however, Jupyter Notebook doesn't like it, since the combined file size is large (GBs).
– Erik Cadaret
Nov 21 '18 at 2:40
When you import into your notebook nothing happens to their location. In the end you want to combine the files, which contain the same string in their filenames, into a dataframe, correct?
– Zanshin
Nov 21 '18 at 3:52
@Zanshin please see the edit above. I hope this clarifies the desired outcome.
– Erik Cadaret
Nov 21 '18 at 16:08
This is too broad; the writing of the end result you can find on this site. Steps 1 and 2 you can do with my answer.
– Zanshin
Nov 21 '18 at 16:12
2 Answers
The best advice is to give the files the same names in each directory, and to keep only useful data in these directories. Here is a solution for differing names:
for filename in os.listdir(orig_data_dir):
    name, ext = os.path.splitext(filename)
    filename_2 = new_data_dir + name + '_2' + ext  # construct new filename from old
    if os.path.isfile(filename_2):
        df_Orig = pd.read_csv(orig_data_dir + filename, index_col=0)
        df_New = pd.read_csv(filename_2, index_col=0)
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
        df_Orig.append(df_New).to_csv(orig_data_dir + filename)
Here I accumulate the result in the Original file. Only one loop is necessary.
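If the originals should stay untouched, a variant (a sketch under the same naming assumption, using pd.concat, the modern replacement for DataFrame.append) writes each combined pair to parent_data_dir as Test_Compiled_<n>.csv instead:

import os
import pandas as pd

n = 0
for filename in sorted(os.listdir(orig_data_dir)):
    name, ext = os.path.splitext(filename)
    filename_2 = os.path.join(new_data_dir, name + '_2' + ext)
    if ext == '.csv' and os.path.isfile(filename_2):
        n += 1  # count only files that actually have a pair
        df_orig = pd.read_csv(os.path.join(orig_data_dir, filename), index_col=0)
        df_new = pd.read_csv(filename_2, index_col=0)
        out = os.path.join(parent_data_dir, 'Test_Compiled_{}.csv'.format(n))
        pd.concat([df_orig, df_new]).to_csv(out)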
Thank you for your efforts. This works for all the steps I was looking for. I added an exception for when some files have no data, and added this to your code.
– Erik Cadaret
Nov 22 '18 at 2:52
I don't think it's the best way. Empty data is not a problem as such, and it's better to ensure elsewhere (at creation) that Original is clean.
– B. M.
Nov 22 '18 at 6:41
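For reference, the exception guard described in the comment above might look like this (a hypothetical sketch; pandas raises pandas.errors.EmptyDataError when read_csv encounters a file with no data):

import pandas as pd

try:
    df_New = pd.read_csv(filename_2, index_col=0)
except pd.errors.EmptyDataError:
    df_New = None  # the file exists but holds no data; skip this pair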
Something like this would help you:
import os
import fnmatch
from itertools import chain

paths = ('/path/to/directory/one/', '/path/to/directory/two/', 'etc.', 'etc.')

file1 = []
file2 = []
for path, dirs, files in chain.from_iterable(os.walk(path) for path in paths):
    for file in files:
        # note: a name like Test1_2.csv matches both patterns below
        if file in fnmatch.filter(files, '*1*.csv'):
            file1.append(file)
        if file in fnmatch.filter(files, '*2*.csv'):
            file2.append(file)
To create your dataframes you would do something like this (pd.read_csv already returns a DataFrame, so wrapping it in pd.DataFrame is redundant):
df_file1 = pd.concat([pd.read_csv(file1[0], sep=';'), pd.read_csv(file1[1], sep=';')], ignore_index=True)
df_file2 = ...  # etc.
Note: the 'sep' in your csv might be different.
EDIT: I've replaced endswith with fnmatch.filter; you can now use any pattern you like for matching the files you need in the different directories.
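For example, with the question's naming convention (a sketch; the directory path is a hypothetical placeholder), patterns can split one listing into original and new files:

import fnmatch
import os

files = os.listdir('/path/to/directory/one/')  # hypothetical path
new_files = fnmatch.filter(files, '*_2.csv')   # e.g. ['Test1_2.csv', ...]
orig_files = [f for f in fnmatch.filter(files, '*.csv') if f not in new_files]
# e.g. ['Test1.csv', ...]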
Thank you for the feedback. This is helpful in understanding how to access the directories iteratively. To clarify, I am going to be iterating over 1,000s of csv files (ones ending without "_2" and ones with "_2", which make up each respective pair). I just used three in each directory as an example. With that said, I am looking for a solution that will iteratively place each pair into a dataframe, write the result to a file, and repeat until there are no pairs left to iterate over. Hope this helps clarify what I am seeking. Thank you for your assistance.
– Erik Cadaret
Nov 21 '18 at 2:30
Then your question is unclear. You should update it. In your example you have the csvs ending in '1.csv', and this solution would work.
– Zanshin
Nov 21 '18 at 3:18
On what will you match the files then? Parts of the filename that are similar? Not what they end in?
– Zanshin
Nov 21 '18 at 3:44
I will be matching entire filenames and distinguishing the original data from the new data by what they end in.
– Erik Cadaret
Nov 21 '18 at 16:09
I've put in an edit earlier; it should help you now.
– Zanshin
Nov 21 '18 at 16:13