How to match two files from two different folders if part of the file name matches
I have files in two different directories whose names share a common part, and I need to pair them up.
Here is what the files look like:
na_files
/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou
/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou
/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou
And these are the corresponding files in the other directory:
lb_files
/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib
/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib
/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib
I need to match each file in na_files with its counterpart in lb_files before I can proceed.
This is what I tried:
library(data.table)  # provides fread()

# DIR is the folder that contains both directories
na_files = grep("\\.cou$", list.files(DIR, recursive=TRUE, full.names=TRUE), value=TRUE)
lb_files = grep("\\.lib$", list.files(DIR, recursive=TRUE, full.names=TRUE), value=TRUE)
all_patients = NA
for(curr_file_idx in seq_along(na_files)){
  curr_file = na_files[curr_file_idx]
  libsize_file = lb_files[curr_file_idx]  # pairs files by position only
  curr_pa = data.frame(fread(curr_file))
  # split on ".PZT-" and keep the first 5 characters after it
  pa_id = strsplit(curr_file, "[.][P][Z][T][-]")[[1]][[2]]
  pa_id = substr(pa_id, 1, 5)
  libsize = data.frame(fread(libsize_file))
  pa_id2 = strsplit(libsize_file, "[.][P][Z][T][-]")[[1]][[2]]
  pa_id2 = substr(pa_id2, 1, 5)
  if(pa_id != pa_id2){
    print(pa_id)
    print(pa_id2)
    stop("WRONG LB")
  }
}
But this split pattern only works for one of the files (the one containing .PZT-) and throws an error for the rest:
Error in strsplit(curr_file, "[.][P][Z][T][-]")[[1]][[2]] :
subscript out of bounds
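The error arises because strsplit() returns the whole, unsplit name when the separator is absent, so element [[2]] does not exist for the files without .PZT-. A minimal reproduction using the three .cou names from above (directories omitted for brevity):

```r
# The three .cou file names from the question (directory omitted)
na_names <- c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")

# Split each name on the literal ".PZT-" separator, as in the loop
parts <- strsplit(na_names, "[.][P][Z][T][-]")

# Only the third name contains ".PZT-", so only it splits into two pieces;
# for the first two names, parts[[i]][[2]] is "subscript out of bounds"
n_pieces <- lengths(parts)
print(n_pieces)  # [1] 1 1 2
```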
I need a regular expression (or any other pattern) that works for all of the files. Specifically, I need the part right after the first period (after .PZT- in the third file), i.e. M4DF, MSFG and MOHUA for the three files, so that the files from the two directories can be matched on it.
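One possible pattern, sketched under the assumption that every name follows the layout shown above: take what comes after the first period up to the next hyphen, skipping an optional PZT- prefix. extract_id is a hypothetical helper name; perl = TRUE is used so the optional prefix is tried first.

```r
# Hypothetical helper: ID between the first "." and the next "-",
# skipping an optional "PZT-" prefix (as in the MOHUA file)
extract_id <- function(paths) {
  sub("^[^.]*\\.(?:PZT-)?([^.-]+)-.*$", "\\1", basename(paths), perl = TRUE)
}

na_files <- c("/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
lb_files <- c("/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib",
              "/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
              "/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib")

na_ids <- extract_id(na_files)  # gives the IDs M4DF, MSFG, MOHUA
lb_ids <- extract_id(lb_files)

# Check which .cou IDs have a .lib partner
print(na_ids %in% lb_ids)
```

Names that do not fit the pattern come back from sub() unchanged, so it is worth checking the extracted IDs before using them as a matching key.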
r substr
asked Nov 21 '18 at 13:42 by user1017373 (edited Nov 21 '18 at 14:14)
1 Answer
Based on your data, extract the file names for na_files and lb_files:
na_files<-c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou","SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou","SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
lb_files<-c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib","SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib","SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib")
na_files_name<-stringi::stri_extract(na_files,regex = "[^.]+") #extracts file name up to first .
lb_files_name<-stringi::stri_extract(lb_files,regex = "[^.]+")
na_files_name %in% lb_files_name #checks the file names match
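The %in% check only reports whether each name has a partner. To actually pair the files, whatever order list.files() returns them in, one option (a sketch, not part of the original answer) is match() on the same prefix key, computed here with base R sub() rather than stringi. The .lib names are deliberately listed in reverse order to illustrate this.

```r
na_files <- c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
# Same .lib names as above, but in reverse order
lb_files <- rev(c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib",
                  "SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
                  "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib"))

# Same key as the answer: everything before the first "."
na_key <- sub("\\..*$", "", na_files)
lb_key <- sub("\\..*$", "", lb_files)

# For each na file, the position of the lb file with the same key (NA if none)
idx <- match(na_key, lb_key)

pairs <- data.frame(na_file = na_files,
                    lb_file = lb_files[idx],
                    stringsAsFactors = FALSE)
print(pairs)
```

Because match() compares whole keys, SAB-M-3 is not confused with SAB-M-32.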
answered Nov 21 '18 at 14:03 by e.matt
Sorry, I need exactly the part after the period to match the files from the two directories, for example M4DF, MSFG and MOHUA.
– user1017373
Nov 21 '18 at 14:12
This is tricky, as your file name pattern isn't consistent: MOHUA has PZT- preceding it.
– e.matt
Nov 22 '18 at 9:34
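Given that inconsistency, another option is to avoid extracting the ID at all: the two lists shown in the question differ only in their directory and in the trailing .cou versus .AD.lib, so the remaining stems can be compared directly. This is a sketch that assumes every file follows that naming; .cou files without a partner are reported with a warning and left out.

```r
na_files <- c("/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
lb_files <- c("/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib",
              "/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
              "/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib")

# Drop the directory and the type-specific suffix (assumes ".cou" / ".AD.lib")
na_stem <- sub("\\.cou$", "", basename(na_files))
lb_stem <- sub("\\.AD\\.lib$", "", basename(lb_files))

# For each .cou file, the index of the .lib file with the same stem (NA if none)
idx <- match(na_stem, lb_stem)

unmatched <- na_files[is.na(idx)]
if (length(unmatched) > 0) {
  warning("No .lib partner for: ", paste(unmatched, collapse = ", "))
}

# Keep only the files that found a partner
paired <- data.frame(na_file = na_files[!is.na(idx)],
                     lb_file = lb_files[idx[!is.na(idx)]],
                     stringsAsFactors = FALSE)
print(paired)
```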