Which pyspark methods should I use for this table join?

Article

|------|-----------|-------|
|  ID  | PARENT_ID | _data |
|------|-----------|-------|
|  12  |    34     |  mom  |
|------|-----------|-------|
|  5   |    34     |  dad  |
|------|-----------|-------|

Article_Meta

|-------|---------|------------|
|  ID   | USER_ID | COMMENT_ID |
|-------|---------|------------|
|  12   |  [3]    |  [ 7, 8]   |
|-------|---------|------------|
|  34   |  [6]    |  [ 1, 2]   |
|-------|---------|------------|

Result: Article + Article_Meta

        ID 12 has USER_ID 3 and 6 because
        ID = Article_Meta #12 has USER_ID 3 AND
        PARENT_ID = Article_Meta #34 has USER_ID 6

|------|-----------|-------|---------|------------|
|  ID  | PARENT_ID | _data | USER_ID | COMMENT_ID |
|------|-----------|-------|---------|------------|
|  12  |    34     |  mom  | [ 3, 6] |[7, 8, 1, 2]|
|------|-----------|-------|---------|------------|
|  5   |    34     |  dad  |  [6]    |  [ 1, 2]   |
|------|-----------|-------|---------|------------|

I have a table Article and I would like to join it with Article_Meta.

As you can see, Article has an ID and a PARENT_ID column. Both of these columns refer to the ID column of Article_Meta.

How should I join Article with Article_Meta so that USER_ID and COMMENT_ID are the combined result of the Article_Meta rows matched by the article's ID and by its PARENT_ID? (Which PySpark methods should I use?)

More explanation:
In the result table, article #12 has USER_ID [3, 6] because article #12 matches Article_Meta #12 (via ID) and Article_Meta #34 (via PARENT_ID).

asked yesterday

John Smith

2,58673767

apache-spark pyspark apache-spark-sql
