ArangoDB - Slow query performance on cluster
I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.
FOR existing IN ExistingCollection
LET matches = (
FOR temp IN TempCollection
FILTER temp._key == existing._key
RETURN true
)
FILTER LENGTH(matches) == 0
RETURN existing
When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.
However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000
after the initial FOR existing ...
to keep it from timing out! Still, this limited result returns in almost 7 seconds!
Looking at the Execution Plan, I see that there are several REMOTE
and GATHER
statements after the LET matches ...
SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.
My question: can this type of operation be done efficiently on a cluster?
I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.
arangodb
add a comment |
I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.
FOR existing IN ExistingCollection
LET matches = (
FOR temp IN TempCollection
FILTER temp._key == existing._key
RETURN true
)
FILTER LENGTH(matches) == 0
RETURN existing
When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.
However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000
after the initial FOR existing ...
to keep it from timing out! Still, this limited result returns in almost 7 seconds!
Looking at the Execution Plan, I see that there are several REMOTE
and GATHER
statements after the LET matches ...
SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.
My question: can this type of operation be done efficiently on a cluster?
I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.
arangodb
add a comment |
I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.
FOR existing IN ExistingCollection
LET matches = (
FOR temp IN TempCollection
FILTER temp._key == existing._key
RETURN true
)
FILTER LENGTH(matches) == 0
RETURN existing
When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.
However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000
after the initial FOR existing ...
to keep it from timing out! Still, this limited result returns in almost 7 seconds!
Looking at the Execution Plan, I see that there are several REMOTE
and GATHER
statements after the LET matches ...
SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.
My question: can this type of operation be done efficiently on a cluster?
I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.
arangodb
I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.
FOR existing IN ExistingCollection
LET matches = (
FOR temp IN TempCollection
FILTER temp._key == existing._key
RETURN true
)
FILTER LENGTH(matches) == 0
RETURN existing
When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.
However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000
after the initial FOR existing ...
to keep it from timing out! Still, this limited result returns in almost 7 seconds!
Looking at the Execution Plan, I see that there are several REMOTE
and GATHER
statements after the LET matches ...
SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.
My question: can this type of operation be done efficiently on a cluster?
I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.
arangodb
arangodb
asked Dec 29 '18 at 1:56


kerrykerry
33228
33228
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.
You can avoid subqueries with the following query. It loads all document _key
's into RAM - but that should be no problem with your rather small collections.
LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)
LET TempCollection = (FOR temp IN c1 RETURN temp._key)
RETURN MINUS(ExistingCollection, TempCollection)
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53966083%2farangodb-slow-query-performance-on-cluster%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.
You can avoid subqueries with the following query. It loads all document _key
's into RAM - but that should be no problem with your rather small collections.
LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)
LET TempCollection = (FOR temp IN c1 RETURN temp._key)
RETURN MINUS(ExistingCollection, TempCollection)
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
add a comment |
Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.
You can avoid subqueries with the following query. It loads all document _key
's into RAM - but that should be no problem with your rather small collections.
LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)
LET TempCollection = (FOR temp IN c1 RETURN temp._key)
RETURN MINUS(ExistingCollection, TempCollection)
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
add a comment |
Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.
You can avoid subqueries with the following query. It loads all document _key
's into RAM - but that should be no problem with your rather small collections.
LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)
LET TempCollection = (FOR temp IN c1 RETURN temp._key)
RETURN MINUS(ExistingCollection, TempCollection)
Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.
You can avoid subqueries with the following query. It loads all document _key
's into RAM - but that should be no problem with your rather small collections.
LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)
LET TempCollection = (FOR temp IN c1 RETURN temp._key)
RETURN MINUS(ExistingCollection, TempCollection)
answered Jan 2 at 10:59


IngoIngo
1496
1496
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
add a comment |
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!
– kerry
Jan 2 at 18:11
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53966083%2farangodb-slow-query-performance-on-cluster%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown