Efficient design of work flow to merge tables together
I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.
Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.
I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.
In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.
Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??
Also, should index on the key be created in all of the source tables in this matter?
sql merge sas left-join
|
show 4 more comments
I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.
Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.
I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.
In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.
Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??
Also, should index on the key be created in all of the source tables in this matter?
sql merge sas left-join
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
I just found theMERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.
– JNevill
Nov 20 '18 at 21:09
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16
|
show 4 more comments
I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.
Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.
I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.
In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.
Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??
Also, should index on the key be created in all of the source tables in this matter?
sql merge sas left-join
I need to create a table that has about 20k rows, 10k columns. The table will be created by merging a dozens of source tables say A,B,C,..,F, and the columns must be ordered accordingly (columns in A first, then columns in B, etc). Table A sets up the denominator, and has all the observations needed in the final table. All tables have the same unique key for join/merge.
Each source table are independent from one another. And the number of columns/what columns in the source table to be merged cannot be known until the source table is ready. Or some times, a source tables needs to be updated again after it is already merged.
I have been doing like A left join B, left join C, left join D...., and I have the output of each merge saved incrementally.
In a situation when C needs to be updated to C*, and merged back again. Is there a way to efficiently do it, in stead of doing something like merge C*, merge D,...all over again for the tables after C*.
Or sometimes I have A, B, D...F ready. And I merged them according to the order. Then C is ready. I need to merge C to A,B, and repeat the merging for other tables all over again. Any way to rethink the creating of the final output, to avoid the redundancy??
Also, should index on the key be created in all of the source tables in this matter?
sql merge sas left-join
sql merge sas left-join
edited Nov 20 '18 at 22:19
user159193
asked Nov 20 '18 at 20:57
user159193user159193
94
94
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
I just found theMERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.
– JNevill
Nov 20 '18 at 21:09
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16
|
show 4 more comments
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
I just found theMERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.
– JNevill
Nov 20 '18 at 21:09
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
I just found the
MERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.– JNevill
Nov 20 '18 at 21:09
I just found the
MERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.– JNevill
Nov 20 '18 at 21:09
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16
|
show 4 more comments
3 Answers
3
active
oldest
votes
If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.
data want ;
merge a b c d e f ;
by id;
run;
You can put the list of tables into a macro variable.
%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;
You could have the list in a metadata table and generate the code from the metadata table.
proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;
In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.
Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.
%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
Amerge
(orupdate
) statement actually rebuilds the entire table. Themodify
statement can update rows, delete rows or add rows. To add columns to a table you will have to usemerge
,update
or SQLalter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.
– Richard
Nov 21 '18 at 16:26
add a comment |
This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.
%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;
%merge(outds,10,Key_Or_Id);
So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!
add a comment |
@Tom answer for merge is best way to combine the source tables initially.
The issue of an initial situation with some table C
having columns, say, z1-z999
and then a subsequent C*
with columns say z1-z250
and zed500-zed1729
. Where should the original C
columns not in C*
(those z251-z999
) lay in the column order of the combined table ?
The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0
can be used with data sets in a merge
in order to force the implicit construct of the program data vector (PDV) to have a desired column order.
For example
data history;
merge
a(obs=0)
b(obs=0)
c(obs=0)
d(obs=0)
e(obs=0)
history a b c d e;
;
by key;
run;
The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e
. Any variables in history
not in a-e
will appear at the far right of the PDV in the order of as seen in the history
being updated.
Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53401403%2fefficient-design-of-work-flow-to-merge-tables-together%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.
data want ;
merge a b c d e f ;
by id;
run;
You can put the list of tables into a macro variable.
%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;
You could have the list in a metadata table and generate the code from the metadata table.
proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;
In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.
Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.
%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
Amerge
(orupdate
) statement actually rebuilds the entire table. Themodify
statement can update rows, delete rows or add rows. To add columns to a table you will have to usemerge
,update
or SQLalter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.
– Richard
Nov 21 '18 at 16:26
add a comment |
If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.
data want ;
merge a b c d e f ;
by id;
run;
You can put the list of tables into a macro variable.
%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;
You could have the list in a metadata table and generate the code from the metadata table.
proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;
In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.
Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.
%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
Amerge
(orupdate
) statement actually rebuilds the entire table. Themodify
statement can update rows, delete rows or add rows. To add columns to a table you will have to usemerge
,update
or SQLalter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.
– Richard
Nov 21 '18 at 16:26
add a comment |
If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.
data want ;
merge a b c d e f ;
by id;
run;
You can put the list of tables into a macro variable.
%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;
You could have the list in a metadata table and generate the code from the metadata table.
proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;
In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.
Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.
%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;
If the ids are unique (that is you are NOT doing 1-to-N or N-to-M matching) then just merge all of the tables at once.
data want ;
merge a b c d e f ;
by id;
run;
You can put the list of tables into a macro variable.
%let table_list=a b c2 d e f;
data want ;
merge &table_list;
by id;
run;
You could have the list in a metadata table and generate the code from the metadata table.
proc sql noprint;
select memname
into :table_list separated by ' '
from my_metadata
order by order_num
;
quit;
In terms of updating just one of the tables here is a method that should set variable names in the proper order, but without having to re-read all of the data from all of the tables.
Make a list of columns you want to drop from the current "master" table that came from OLD_C in macro variable let's call DROPLIST. To make sure the columns are in order you still need the list of tables, but we don't need to actually re-read all of them, we just need to be able to open them so that SAS can find the variable names in order. Note this list needs to reference the new version of C so that it sees the variables in that version of C.
%let table_list=a b new_c d e f ;
%let droplist=c1 c2 c3 c4;
data new ;
* Force new list of variables using NEW list of tables in order ;
if 0 then set &table_list ;
merge old(drop=&droplist) new_c ;
by id;
run;
edited Nov 20 '18 at 21:40
answered Nov 20 '18 at 21:25
TomTom
23.2k2718
23.2k2718
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
Amerge
(orupdate
) statement actually rebuilds the entire table. Themodify
statement can update rows, delete rows or add rows. To add columns to a table you will have to usemerge
,update
or SQLalter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.
– Richard
Nov 21 '18 at 16:26
add a comment |
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
Amerge
(orupdate
) statement actually rebuilds the entire table. Themodify
statement can update rows, delete rows or add rows. To add columns to a table you will have to usemerge
,update
or SQLalter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.
– Richard
Nov 21 '18 at 16:26
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
about updating just one of the tables, when the old is very huge, compared to new_c in terms of the number of columns, similar in the number of rows. the merge will be efficient? Also should I create an index on id in all the source tables. Would this speed things up more?
– user159193
Nov 20 '18 at 22:17
A
merge
(or update
) statement actually rebuilds the entire table. The modify
statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge
, update
or SQL alter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.– Richard
Nov 21 '18 at 16:26
A
merge
(or update
) statement actually rebuilds the entire table. The modify
statement can update rows, delete rows or add rows. To add columns to a table you will have to use merge
, update
or SQL alter table
. Adding an index to a table that is going to be rebuilt only makes sense if you are querying against the table downstream -- otherwise I would call it wheel spinning, expending cpu and disk resources to construct an index for indexing-sake.– Richard
Nov 21 '18 at 16:26
add a comment |
This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.
%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;
%merge(outds,10,Key_Or_Id);
So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!
add a comment |
This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.
%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;
%merge(outds,10,Key_Or_Id);
So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!
add a comment |
This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.
%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;
%merge(outds,10,Key_Or_Id);
So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!
This is what I do, when I pull data from sql database. I extract data from different tables and create separate sas datasets mostly in a sequential order, so that it is easy to merge them separately through macros. Then I merge them based on common keys.
%macro merging(otds, dsnum, keyvar);
%do i=1 %to &dsnum;
data out&i;
set in&i;
if ^missing(&keyvar);
run;
%end;
%do i=1 %to &dsnum;
proc sort data=out&i nodupkey;
by &keyvar;
run;
%end;
data &otds;
merge
%do i = 1 %to &dsnum;
out&i
%end;
;
by &keyvar;
if &keyvar ne .;
run;
%mend;
%merge(outds,10,Key_Or_Id);
So in the above code, I have 10 tables in1-in10 and these are merged with common keys. If any of these tables are updated later then we can again merge using same approach.. hope this helps!
answered Nov 20 '18 at 21:31
RhythmRhythm
2586
2586
add a comment |
add a comment |
@Tom answer for merge is best way to combine the source tables initially.
The issue of an initial situation with some table C
having columns, say, z1-z999
and then a subsequent C*
with columns say z1-z250
and zed500-zed1729
. Where should the original C
columns not in C*
(those z251-z999
) lay in the column order of the combined table ?
The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0
can be used with data sets in a merge
in order to force the implicit construct of the program data vector (PDV) to have a desired column order.
For example
data history;
merge
a(obs=0)
b(obs=0)
c(obs=0)
d(obs=0)
e(obs=0)
history a b c d e;
;
by key;
run;
The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e
. Any variables in history
not in a-e
will appear at the far right of the PDV in the order of as seen in the history
being updated.
Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.
add a comment |
@Tom answer for merge is best way to combine the source tables initially.
The issue of an initial situation with some table C
having columns, say, z1-z999
and then a subsequent C*
with columns say z1-z250
and zed500-zed1729
. Where should the original C
columns not in C*
(those z251-z999
) lay in the column order of the combined table ?
The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0
can be used with data sets in a merge
in order to force the implicit construct of the program data vector (PDV) to have a desired column order.
For example
data history;
merge
a(obs=0)
b(obs=0)
c(obs=0)
d(obs=0)
e(obs=0)
history a b c d e;
;
by key;
run;
The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e
. Any variables in history
not in a-e
will appear at the far right of the PDV in the order of as seen in the history
being updated.
Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.
add a comment |
@Tom answer for merge is best way to combine the source tables initially.
The issue of an initial situation with some table C
having columns, say, z1-z999
and then a subsequent C*
with columns say z1-z250
and zed500-zed1729
. Where should the original C
columns not in C*
(those z251-z999
) lay in the column order of the combined table ?
The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0
can be used with data sets in a merge
in order to force the implicit construct of the program data vector (PDV) to have a desired column order.
For example
data history;
merge
a(obs=0)
b(obs=0)
c(obs=0)
d(obs=0)
e(obs=0)
history a b c d e;
;
by key;
run;
The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e
. Any variables in history
not in a-e
will appear at the far right of the PDV in the order of as seen in the history
being updated.
Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.
@Tom answer for merge is best way to combine the source tables initially.
The issue of an initial situation with some table C
having columns, say, z1-z999
and then a subsequent C*
with columns say z1-z250
and zed500-zed1729
. Where should the original C
columns not in C*
(those z251-z999
) lay in the column order of the combined table ?
The case of a maintained history, merged in with new data of similar but variant structure can be somewhat managed if the new data always encompasses a desired domain of the history. The data set option OBS=0
can be used with data sets in a merge
in order to force the implicit construct of the program data vector (PDV) to have a desired column order.
For example
data history;
merge
a(obs=0)
b(obs=0)
c(obs=0)
d(obs=0)
e(obs=0)
history a b c d e;
;
by key;
run;
The variables in the PDV will be ordered in the same manner as the order of appearance in data sets a-e
. Any variables in history
not in a-e
will appear at the far right of the PDV in the order of as seen in the history
being updated.
Once you understand how the PDV construction can be manipulated, you can develop construction abstractions or rules and enforce the desired source code generation through macro programming.
answered Nov 20 '18 at 21:43
RichardRichard
8,60721227
8,60721227
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53401403%2fefficient-design-of-work-flow-to-merge-tables-together%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What I want to do in SQL context is left join. Table A includes all the observations in the final table. The rest of the tables can have less obs.
– user159193
Nov 20 '18 at 21:08
I just found the
MERGE
documentation in SAS. I see now that it is very much like a LEFT OUTER JOIN. That is interesting. I'll delete my comment since this is definitely a SAS question that needs a SAS expert. Not much to be done on the SQL side here.– JNevill
Nov 20 '18 at 21:09
What are the keys that you are merging on? If you have a unique key then just merge them all in one step.
– Tom
Nov 20 '18 at 21:12
@JNevill I feel like it is more of a design of work flow, than a coding question.
– user159193
Nov 20 '18 at 21:13
@Tom But I need to have the columns in a specific order, and after the merge, some tables may be updated. Yes. all same unique key.
– user159193
Nov 20 '18 at 21:16