Which pyspark methods should I use for this table join?
Article

| ID | PARENT_ID | _data |
|----|-----------|-------|
| 12 | 34        | mom   |
| 5  | 34        | dad   |

Article_Meta

| ID | USER_ID | COMMENT_ID |
|----|---------|------------|
| 12 | [3]     | [7, 8]     |
| 34 | [6]     | [1, 2]     |

Result: Article + Article_Meta

ID 12 ends up with USER_ID [3, 6] because the Article_Meta row with ID 12 has USER_ID [3], and the Article_Meta row with ID 34 (its PARENT_ID) has USER_ID [6].

| ID | PARENT_ID | _data | USER_ID | COMMENT_ID   |
|----|-----------|-------|---------|--------------|
| 12 | 34        | mom   | [3, 6]  | [7, 8, 1, 2] |
| 5  | 34        | dad   | [6]     | [1, 2]       |
I have a table Article and I would like to join it with Article_Meta.
As you can see, Article has an ID and a PARENT_ID. Both of these columns reference the ID column of Article_Meta.
How should I join Article with Article_Meta so that USER_ID and COMMENT_ID are the combined result of the rows matched by both Article's PARENT_ID and Article's ID in the Article_Meta table? (Which pyspark methods should I use?)
More explanation:
In the result table, Article #12 has USER_ID [3, 6] because Article #12 matches Article_Meta #12 (via its ID) and Article_Meta #34 (via its PARENT_ID).
apache-spark pyspark apache-spark-sql
asked yesterday
John Smith
2,58673767