ArangoDB - Slow query performance on cluster

I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection

    LET matches = (

        FOR temp IN TempCollection

            FILTER temp._key == existing._key

            RETURN true

    )

    FILTER LENGTH(matches) == 0

    RETURN existing

When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.

However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000 after the initial FOR existing ... to keep it from timing out! Still, this limited result returns in almost 7 seconds!

Looking at the Execution Plan, I see that there are several REMOTE and GATHER statements after the LET matches ... SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.

asked Dec 29 '18 at 1:56

kerry

33228

add a comment |

I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection

    LET matches = (

        FOR temp IN TempCollection

            FILTER temp._key == existing._key

            RETURN true

    )

    FILTER LENGTH(matches) == 0

    RETURN existing

When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.

asked Dec 29 '18 at 1:56

kerry

33228

add a comment |

I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection

    LET matches = (

        FOR temp IN TempCollection

            FILTER temp._key == existing._key

            RETURN true

    )

    FILTER LENGTH(matches) == 0

    RETURN existing

When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.

asked Dec 29 '18 at 1:56

kerry

33228

I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection

    LET matches = (

        FOR temp IN TempCollection

            FILTER temp._key == existing._key

            RETURN true

    )

    FILTER LENGTH(matches) == 0

    RETURN existing

When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.

arangodb

asked Dec 29 '18 at 1:56

kerry

33228

asked Dec 29 '18 at 1:56

kerry

33228

asked Dec 29 '18 at 1:56

kerry

33228

asked Dec 29 '18 at 1:56

kerry

33228

asked Dec 29 '18 at 1:56

kerry

33228

add a comment |

1 Answer
1

active

oldest

votes

Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.

You can avoid subqueries with the following query. It loads all document _key's into RAM - but that should be no problem with your rather small collections.

LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)

LET TempCollection = (FOR temp IN c1 RETURN temp._key)

RETURN MINUS(ExistingCollection, TempCollection)

answered Jan 2 at 10:59

Ingo

1496

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53966083%2farangodb-slow-query-performance-on-cluster%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.

You can avoid subqueries with the following query. It loads all document _key's into RAM - but that should be no problem with your rather small collections.

LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)

LET TempCollection = (FOR temp IN c1 RETURN temp._key)

RETURN MINUS(ExistingCollection, TempCollection)

answered Jan 2 at 10:59

Ingo

1496

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

add a comment |

Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.

You can avoid subqueries with the following query. It loads all document _key's into RAM - but that should be no problem with your rather small collections.

LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)

LET TempCollection = (FOR temp IN c1 RETURN temp._key)

RETURN MINUS(ExistingCollection, TempCollection)

answered Jan 2 at 10:59

Ingo

1496

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

add a comment |

Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.

You can avoid subqueries with the following query. It loads all document _key's into RAM - but that should be no problem with your rather small collections.

LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)

LET TempCollection = (FOR temp IN c1 RETURN temp._key)

RETURN MINUS(ExistingCollection, TempCollection)

answered Jan 2 at 10:59

Ingo

1496

Your query executes one subquery for each document in the existing collection. Each subquery will require many HTTP roundtrips for setup, the actual querying and shutdown.

You can avoid subqueries with the following query. It loads all document _key's into RAM - but that should be no problem with your rather small collections.

LET ExistingCollection = (FOR existing IN c2 RETURN existing._key)

LET TempCollection = (FOR temp IN c1 RETURN temp._key)

RETURN MINUS(ExistingCollection, TempCollection)

answered Jan 2 at 10:59

Ingo

1496

answered Jan 2 at 10:59

Ingo

1496

answered Jan 2 at 10:59

Ingo

1496

answered Jan 2 at 10:59

Ingo

1496

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

add a comment |

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

Excellent! I'm not sure how I missed the MINUS() function, but this does exactly what I need. Thanks!

– kerry
Jan 2 at 18:11

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu