Efficient design of work flow to merge tables together












1















I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.



Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.



I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.



In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.



Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??



Also, should index on the key be created in all of the source tables in this matter?










share|improve this question

























  • What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

    – user159193
    Nov 20 '18 at 21:08











  • I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

    – JNevill
    Nov 20 '18 at 21:09











  • What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

    – Tom
    Nov 20 '18 at 21:12













  • @JNevill I feel like it is more of a design of work flow, than a coding question.

    – user159193
    Nov 20 '18 at 21:13













  • @Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

    – user159193
    Nov 20 '18 at 21:16
















1















I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.



Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.



I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.



In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.



Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??



Also, should index on the key be created in all of the source tables in this matter?










share|improve this question

























  • What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

    – user159193
    Nov 20 '18 at 21:08











  • I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

    – JNevill
    Nov 20 '18 at 21:09











  • What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

    – Tom
    Nov 20 '18 at 21:12













  • @JNevill I feel like it is more of a design of work flow, than a coding question.

    – user159193
    Nov 20 '18 at 21:13













  • @Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

    – user159193
    Nov 20 '18 at 21:16














1












1








1


1






I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.



Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.



I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.



In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.



Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??



Also, should index on the key be created in all of the source tables in this matter?










share|improve this question
















I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.



Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.



I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.



In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.



Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??



Also, should index on the key be created in all of the source tables in this matter?







sql merge sas left-join






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 '18 at 22:19







user159193

















asked Nov 20 '18 at 20:57









user159193user159193

94




94













  • What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

    – user159193
    Nov 20 '18 at 21:08











  • I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

    – JNevill
    Nov 20 '18 at 21:09











  • What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

    – Tom
    Nov 20 '18 at 21:12













  • @JNevill I feel like it is more of a design of work flow, than a coding question.

    – user159193
    Nov 20 '18 at 21:13













  • @Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

    – user159193
    Nov 20 '18 at 21:16



















  • What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

    – user159193
    Nov 20 '18 at 21:08











  • I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

    – JNevill
    Nov 20 '18 at 21:09











  • What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

    – Tom
    Nov 20 '18 at 21:12













  • @JNevill I feel like it is more of a design of work flow, than a coding question.

    – user159193
    Nov 20 '18 at 21:13













  • @Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

    – user159193
    Nov 20 '18 at 21:16

















What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

– user159193
Nov 20 '18 at 21:08





What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.

– user159193
Nov 20 '18 at 21:08













I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

– JNevill
Nov 20 '18 at 21:09





I just found the MERGE documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.

– JNevill
Nov 20 '18 at 21:09













What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

– Tom
Nov 20 '18 at 21:12







What are the keys that you are merging on? If you have a unique key then just merge them all in one step.

– Tom
Nov 20 '18 at 21:12















@JNevill I feel like it is more of a design of work flow, than a coding question.

– user159193
Nov 20 '18 at 21:13







@JNevill I feel like it is more of a design of work flow, than a coding question.

– user159193
Nov 20 '18 at 21:13















@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

– user159193
Nov 20 '18 at 21:16





@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.

– user159193
Nov 20 '18 at 21:16












3 Answers
3






active

oldest

votes


















0














If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.



data want ;
merge a b c d e f ;
by id;
run;


You can put the list of tables into a macro variable.



%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;


You could have the list in a metadata table and generate the code from the metadata table.



proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;


In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.



Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.



%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;





share|improve this answer


























  • about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

    – user159193
    Nov 20 '18 at 22:17











  • A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

    – Richard
    Nov 21 '18 at 16:26



















0














This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.



%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;

%merge(outds,10,Key_Or_Id);


So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!






share|improve this answer































    0














    @Tom answer for merge is best way to combine the source tables initially.



    The issue of an initial situation with some table C having columns, say, z1-z999 and then a subsequent C* with columns say z1-z250 and zed500-zed1729. Where should the original C columns not in C* (those z251-z999) lay in the column order of the combined table ?



    The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0 can be used with data sets in a merge in order to force the implicit construct of the program data vector (PDV) to have a desired column order.



    For example



    data history;
    merge
    a(obs=0)
    b(obs=0)
    c(obs=0)
    d(obs=0)
    e(obs=0)
    history a b c d e;
    ;
    by key;
    run;


    The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e. Any variables in history not in a-e will appear at the far right of the PDV in the order of as seen in the history being updated.



    Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.






    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53401403%2fefficient-design-of-work-flow-to-merge-tables-together%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.



      data want ;
      merge a b c d e f ;
      by id;
      run;


      You can put the list of tables into a macro variable.



      %let table_list=a b c2 d e f;
      data want ;
      merge &table_list;
      by id;
      run;


      You could have the list in a metadata table and generate the code from the metadata table.



      proc sql noprint;
      select memname
      into :table_list separated by ' '
      from my_metadata
      order by order_num
      ;
      quit;


      In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.



      Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.



      %let table_list=a b new_c d e f ;
      %let droplist=c1 c2 c3 c4;
      data new ;
      * Force new list of variables using NEW list of tables in order ;
      if 0 then set &table_list ;
      merge old(drop=&droplist) new_c ;
      by id;
      run;





      share|improve this answer


























      • about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

        – user159193
        Nov 20 '18 at 22:17











      • A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

        – Richard
        Nov 21 '18 at 16:26
















      0














      If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.



      data want ;
      merge a b c d e f ;
      by id;
      run;


      You can put the list of tables into a macro variable.



      %let table_list=a b c2 d e f;
      data want ;
      merge &table_list;
      by id;
      run;


      You could have the list in a metadata table and generate the code from the metadata table.



      proc sql noprint;
      select memname
      into :table_list separated by ' '
      from my_metadata
      order by order_num
      ;
      quit;


      In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.



      Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.



      %let table_list=a b new_c d e f ;
      %let droplist=c1 c2 c3 c4;
      data new ;
      * Force new list of variables using NEW list of tables in order ;
      if 0 then set &table_list ;
      merge old(drop=&droplist) new_c ;
      by id;
      run;





      share|improve this answer


























      • about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

        – user159193
        Nov 20 '18 at 22:17











      • A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

        – Richard
        Nov 21 '18 at 16:26














      0












      0








      0







      If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.



      data want ;
      merge a b c d e f ;
      by id;
      run;


      You can put the list of tables into a macro variable.



      %let table_list=a b c2 d e f;
      data want ;
      merge &table_list;
      by id;
      run;


      You could have the list in a metadata table and generate the code from the metadata table.



      proc sql noprint;
      select memname
      into :table_list separated by ' '
      from my_metadata
      order by order_num
      ;
      quit;


      In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.



      Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.



      %let table_list=a b new_c d e f ;
      %let droplist=c1 c2 c3 c4;
      data new ;
      * Force new list of variables using NEW list of tables in order ;
      if 0 then set &table_list ;
      merge old(drop=&droplist) new_c ;
      by id;
      run;





      share|improve this answer















      If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.



      data want ;
      merge a b c d e f ;
      by id;
      run;


      You can put the list of tables into a macro variable.



      %let table_list=a b c2 d e f;
      data want ;
      merge &table_list;
      by id;
      run;


      You could have the list in a metadata table and generate the code from the metadata table.



      proc sql noprint;
      select memname
      into :table_list separated by ' '
      from my_metadata
      order by order_num
      ;
      quit;


      In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.



      Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.



      %let table_list=a b new_c d e f ;
      %let droplist=c1 c2 c3 c4;
      data new ;
      * Force new list of variables using NEW list of tables in order ;
      if 0 then set &table_list ;
      merge old(drop=&droplist) new_c ;
      by id;
      run;






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Nov 20 '18 at 21:40

























      answered Nov 20 '18 at 21:25









      TomTom

      23.2k2718




      23.2k2718













      • about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

        – user159193
        Nov 20 '18 at 22:17











      • A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

        – Richard
        Nov 21 '18 at 16:26



















      • about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

        – user159193
        Nov 20 '18 at 22:17











      • A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

        – Richard
        Nov 21 '18 at 16:26

















      about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

      – user159193
      Nov 20 '18 at 22:17





      about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?

      – user159193
      Nov 20 '18 at 22:17













      A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

      – Richard
      Nov 21 '18 at 16:26





      A merge (or update) statement actually rebuilds the entire table. The modify statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge, update or SQL alter table. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.

      – Richard
      Nov 21 '18 at 16:26













      0














      This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.



      %macro merging(otds, dsnum, keyvar);
      %do i=1 %to &dsnum;
      data out&i;
      set in&i;
      if ^missing(&keyvar);
      run;
      %end;
      %do i=1 %to &dsnum;
      proc sort data=out&i nodupkey;
      by &keyvar;
      run;
      %end;
      data &otds;
      merge
      %do i = 1 %to &dsnum;
      out&i
      %end;
      ;
      by &keyvar;
      if &keyvar ne .;
      run;
      %mend;

      %merge(outds,10,Key_Or_Id);


      So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!






      share|improve this answer




























        0














        This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.



        %macro merging(otds, dsnum, keyvar);
        %do i=1 %to &dsnum;
        data out&i;
        set in&i;
        if ^missing(&keyvar);
        run;
        %end;
        %do i=1 %to &dsnum;
        proc sort data=out&i nodupkey;
        by &keyvar;
        run;
        %end;
        data &otds;
        merge
        %do i = 1 %to &dsnum;
        out&i
        %end;
        ;
        by &keyvar;
        if &keyvar ne .;
        run;
        %mend;

        %merge(outds,10,Key_Or_Id);


        So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!






        share|improve this answer


























          0












          0








          0







          This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.



          %macro merging(otds, dsnum, keyvar);
          %do i=1 %to &dsnum;
          data out&i;
          set in&i;
          if ^missing(&keyvar);
          run;
          %end;
          %do i=1 %to &dsnum;
          proc sort data=out&i nodupkey;
          by &keyvar;
          run;
          %end;
          data &otds;
          merge
          %do i = 1 %to &dsnum;
          out&i
          %end;
          ;
          by &keyvar;
          if &keyvar ne .;
          run;
          %mend;

          %merge(outds,10,Key_Or_Id);


          So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!






          share|improve this answer













          This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.



          %macro merging(otds, dsnum, keyvar);
          %do i=1 %to &dsnum;
          data out&i;
          set in&i;
          if ^missing(&keyvar);
          run;
          %end;
          %do i=1 %to &dsnum;
          proc sort data=out&i nodupkey;
          by &keyvar;
          run;
          %end;
          data &otds;
          merge
          %do i = 1 %to &dsnum;
          out&i
          %end;
          ;
          by &keyvar;
          if &keyvar ne .;
          run;
          %mend;

          %merge(outds,10,Key_Or_Id);


          So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 20 '18 at 21:31









          RhythmRhythm

          2586




          2586























              0














              @Tom answer for merge is best way to combine the source tables initially.



              The issue of an initial situation with some table C having columns, say, z1-z999 and then a subsequent C* with columns say z1-z250 and zed500-zed1729. Where should the original C columns not in C* (those z251-z999) lay in the column order of the combined table ?



              The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0 can be used with data sets in a merge in order to force the implicit construct of the program data vector (PDV) to have a desired column order.



              For example



              data history;
              merge
              a(obs=0)
              b(obs=0)
              c(obs=0)
              d(obs=0)
              e(obs=0)
              history a b c d e;
              ;
              by key;
              run;


              The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e. Any variables in history not in a-e will appear at the far right of the PDV in the order of as seen in the history being updated.



              Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.






              share|improve this answer




























                0














                @Tom answer for merge is best way to combine the source tables initially.



                The issue of an initial situation with some table C having columns, say, z1-z999 and then a subsequent C* with columns say z1-z250 and zed500-zed1729. Where should the original C columns not in C* (those z251-z999) lay in the column order of the combined table ?



                The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0 can be used with data sets in a merge in order to force the implicit construct of the program data vector (PDV) to have a desired column order.



                For example



                data history;
                merge
                a(obs=0)
                b(obs=0)
                c(obs=0)
                d(obs=0)
                e(obs=0)
                history a b c d e;
                ;
                by key;
                run;


                The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e. Any variables in history not in a-e will appear at the far right of the PDV in the order of as seen in the history being updated.



                Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.






                share|improve this answer


























                  0












                  0








                  0







                  @Tom answer for merge is best way to combine the source tables initially.



                  The issue of an initial situation with some table C having columns, say, z1-z999 and then a subsequent C* with columns say z1-z250 and zed500-zed1729. Where should the original C columns not in C* (those z251-z999) lay in the column order of the combined table ?



                  The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0 can be used with data sets in a merge in order to force the implicit construct of the program data vector (PDV) to have a desired column order.



                  For example



                  data history;
                  merge
                  a(obs=0)
                  b(obs=0)
                  c(obs=0)
                  d(obs=0)
                  e(obs=0)
                  history a b c d e;
                  ;
                  by key;
                  run;


                  The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e. Any variables in history not in a-e will appear at the far right of the PDV in the order of as seen in the history being updated.



                  Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.






                  share|improve this answer













                  @Tom answer for merge is best way to combine the source tables initially.



                  The issue of an initial situation with some table C having columns, say, z1-z999 and then a subsequent C* with columns say z1-z250 and zed500-zed1729. Where should the original C columns not in C* (those z251-z999) lay in the column order of the combined table ?



                  The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0 can be used with data sets in a merge in order to force the implicit construct of the program data vector (PDV) to have a desired column order.



                  For example



                  data history;
                  merge
                  a(obs=0)
                  b(obs=0)
                  c(obs=0)
                  d(obs=0)
                  e(obs=0)
                  history a b c d e;
                  ;
                  by key;
                  run;


                  The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e. Any variables in history not in a-e will appear at the far right of the PDV in the order of as seen in the history being updated.



                  Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 20 '18 at 21:43









                  RichardRichard

                  8,60721227




                  8,60721227






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53401403%2fefficient-design-of-work-flow-to-merge-tables-together%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      MongoDB - Not Authorized To Execute Command

                      How to fix TextFormField cause rebuild widget in Flutter

                      in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith