How to match files from two different folders when part of the file name matches












I have files in two different locations whose names share a matching string, and I need to pair them up.
Here is what the files look like:



    na_files
/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou
/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou
/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou


And here is the list of files in the other directory:



lb_files
/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib
/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib
/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib


I need to match the na_files and lb_files from the two lists before proceeding.
So I tried the following:



library(data.table)  # provides fread()

na_files = grep("cou", list.files(DIR, recursive=T, full.names=T), value=T)
lb_files = grep("lib", list.files(DIR, recursive=T, full.names=T), value=T)
all_patients = NA

for(curr_file_idx in seq_along(na_files)){
  curr_file = na_files[curr_file_idx]
  libsize_file = lb_files[curr_file_idx]
  curr_pa = data.frame(fread(curr_file))
  # split on ".PZT-" and keep the first 5 characters of the second piece
  pa_id = strsplit(curr_file, "[.][P][Z][T][-]")[[1]][[2]]
  pa_id = substr(pa_id, 1, 5)
  libsize = data.frame(fread(libsize_file))
  pa_id2 = strsplit(libsize_file, "[.][P][Z][T][-]")[[1]][[2]]
  pa_id2 = substr(pa_id2, 1, 5)
  if(pa_id != pa_id2){
    print(pa_id)
    print(pa_id2)
    stop("WRONG LB")
  }
}


But this pattern only works for one file and throws an error for the rest:



Error in strsplit(curr_file, "[.][P][Z][T][-]")[[1]][[2]] : 
subscript out of bounds
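
For context, a minimal sketch of why this error appears, using two of the file names listed above: `strsplit()` returns a single piece when the pattern does not occur in the string, so asking for element `[[2]]` fails. The `has_prefix` check is an illustrative addition and not part of the original code.

```r
# File names taken from the listing above.
with_prefix    <- "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou"
without_prefix <- "SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou"

pattern <- "[.][P][Z][T][-]"   # same pattern as in the loop; equivalent to "\\.PZT-"

pieces_with    <- strsplit(with_prefix, pattern)[[1]]
pieces_without <- strsplit(without_prefix, pattern)[[1]]

length(pieces_with)     # 2: the text before and after ".PZT-"
length(pieces_without)  # 1: nothing was split, so [[2]] is out of bounds

# Illustrative guard: test for the pattern before indexing the second piece.
has_prefix <- grepl(pattern, c(with_prefix, without_prefix))
```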


I need a regular expression or pattern that works for all files. Specifically, the part after the period (M4DF, MSFG and MOHUA in the three examples) should match between the files in the two lists.










      r substr














      edited Nov 21 '18 at 14:14







      user1017373

















      asked Nov 21 '18 at 13:42









      user1017373
























          1 Answer














          Based on your data, extract the part of each file name up to the first period for na_files and lb_files, then compare:



          na_files <- c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
                        "SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
                        "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
          lb_files <- c("SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib",
                        "SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
                        "SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib")

          na_files_name <- stringi::stri_extract(na_files, regex = "[^.]+")  # extracts the file name up to the first "."
          lb_files_name <- stringi::stri_extract(lb_files, regex = "[^.]+")

          na_files_name %in% lb_files_name  # checks that the file names match
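
To go from a TRUE/FALSE check to actual pairs, one option is `match()` on the same kind of key. This is only a sketch using the example paths from the question: `file_key` is a hypothetical helper name, the key is the part of the base name before the first period, and `lb_files` is deliberately shuffled here to show that the order of the two lists does not matter.

```r
na_files <- c("/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
# Same files as in the question, listed in reverse order on purpose.
lb_files <- c("/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib",
              "/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
              "/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib")

# Hypothetical helper: base name with everything from the first "." removed.
file_key <- function(paths) sub("[.].*$", "", basename(paths))

# For each na file, the position of its partner in lb_files (NA if none).
idx <- match(file_key(na_files), file_key(lb_files))

pairs <- data.frame(na = na_files, lb = lb_files[idx], stringsAsFactors = FALSE)
```

Rows of `pairs` with `NA` in the `lb` column would indicate an na file without a partner.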
















          answered Nov 21 '18 at 14:03









          e.matt













          • Sorry, I need exactly the part after the period to match between the files from the two directories, for example M4DF, MSFG and MOHUA.

            – user1017373
            Nov 21 '18 at 14:12











          • This is tricky because your file name pattern isn't consistent: MOHUA has PZT- preceding it.

            – e.matt
            Nov 22 '18 at 9:34
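
One possible way around the inconsistent prefix is to make `PZT-` optional in the pattern. The sketch below assumes the wanted token always follows the first period of the base name, is optionally preceded by `PZT-`, and ends at the next hyphen or period. `extract_id` is a hypothetical helper name, and any other prefixes that occur in real data would have to be added to the pattern.

```r
na_files <- c("/Users/AS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.cou",
              "/Users/AS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.cou",
              "/Users/AS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.cou")
lb_files <- c("/Users/DS/SAB-M-13.M4DF-2.T-bR.r1-v1_0-ADDFF.0087.AD.lib",
              "/Users/DS/SAB-M-32.MSFG-2.T-bR.r1-v1_0-ADDFF.3989.AD.lib",
              "/Users/DS/SAB-M-3.PZT-MOHUA-3.T-bR.r1-v1_0-ADDFF.6188.AD.lib")

# Hypothetical helper: capture the token after the first ".", skipping an
# optional "PZT-" prefix. If the pattern does not match, sub() returns the
# input unchanged, so unexpected names are easy to spot.
extract_id <- function(paths) {
  sub("^[^.]*[.](?:PZT-)?([^-.]+).*$", "\\1", basename(paths), perl = TRUE)
}

na_id <- extract_id(na_files)
lb_id <- extract_id(lb_files)

# Position-wise comparison, as in the loop from the question.
mismatch <- na_id != lb_id
if (any(mismatch)) {
  stop("WRONG LB for: ", paste(na_id[mismatch], collapse = ", "))
}
```

Because the extraction is vectorised, the per-file `strsplit()` calls and the fixed `substr(..., 1, 5)` length are no longer needed.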


















