Finding Objects Most Similar to Other Objects as a “Linear Combination”
$begingroup$
I have a small collection of objects, $\{o_i\}_{i=1}^{20},$ that have several properties (all quantitative, or can be made to be quantitative) that differ among themselves. So $o_1$ has, say, $10$ properties, $\{p_{1j}\}_{j=1}^{10},$ and something similar for the other $19$ objects. I have $5$ more objects that are of the same general kind as the original $20,$ but have different properties again. Unfortunately, the property categories are not even the same across all $25$ objects. Some of the properties are common to all the objects, and some are not.
There is one important property, which I'll call "usage", that is common to all $25$ objects. We'll let the usage of object $i$ be written as $u_i$. Note that $u_i\ge 0\;\forall\,i.$
What I would like to do is find a way to write each of the $5$ new usages as a linear combination of the $20$ original object usages, with objects among the $20$ that are more similar contributing more.
For example, let's take one of the new objects, $o_{21}$. I would like to write
$$u_{21}=\sum_{i=1}^{20}a_i u_i,\qquad 0\le a_i\le 1\;\forall\,i,\qquad \text{s.t.}\;\sum_{i=1}^{20}a_i=1.$$
Now suppose that $o_7$ was the most similar to $o_{21}:$ then $a_7$ should be larger than all the other $a_{i}$'s. It is not important that this "linear combination" allow me to predict the properties of $o_{21}.$ It's only important that $\sum_{i=1}^{20}a_i=1$ and that objects more similar to $o_{21}$ have correspondingly larger $a_i$'s.
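To make the constraints concrete, here is a minimal Python sketch of the weighted sum above. The function name `combine_usages` and the three-object numbers are hypothetical, chosen only for illustration; the real problem has $20$ known usages.

```python
import math

def combine_usages(weights, usages):
    """Return sum_i a_i * u_i after checking the constraints on the a_i."""
    if len(weights) != len(usages):
        raise ValueError("need one weight per known usage")
    if any(a < 0 or a > 1 for a in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(a * u for a, u in zip(weights, usages))

# Hypothetical numbers: three known objects instead of twenty.
weights = [0.5, 0.3, 0.2]
usages = [10.0, 20.0, 40.0]
estimate = combine_usages(weights, usages)
```

Because the weights are non-negative and sum to $1$, the result always lies between the smallest and largest known usage.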
The target variables here are, for each of the $5$ new objects, the $a_i$'s that satisfy the above criteria. The $u_i$ are known for the $20$ original objects, and unknown for the $5$ new objects, so the $u_i$ are also target variables for the new objects. However, as knowing the $a_i$ will determine the $u_i$ for the new objects, the immediate goal for this question is to find the $a_i$'s.
Now it's not too difficult to find out which of the $20$ original objects are "closest" to, say, $o_{21}:$ normalize the common quantitative categories, drop the ones not common to all, and use the Euclidean distance norm. (I have no a priori notion of which properties might be more important than others, so I'd rather treat them all on an equal footing.) If I divided each of these distances by the sum total of all the distances, I would get the opposite of what I want: the "closer" objects would have correspondingly smaller $a_i$'s.
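As a rough sketch of that distance step in Python: "normalize" is taken here to mean a z-score per property, computed over the known objects together with the new one. Both of those choices are assumptions, as are the function names and the two-property data, which stand in for the properties common to all objects.

```python
import math
import statistics

def zscore_columns(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has standard deviation 0; divide by 1 instead.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def distances_to_new(known, new):
    """Euclidean distance from `new` to each known object on normalized shared properties."""
    scaled = zscore_columns(known + [new])
    *scaled_known, scaled_new = scaled
    return [math.dist(row, scaled_new) for row in scaled_known]

# Hypothetical shared properties for three known objects and one new object.
known = [[1.0, 200.0], [2.0, 100.0], [5.0, 400.0]]
new = [1.5, 180.0]
distances = distances_to_new(known, new)
closest = distances.index(min(distances))
```

Every property is scaled the same way, which matches treating them all on an equal footing.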
So it comes down to this question: what sort of function would be good to switch this around, so that the closer objects get larger $a_i$'s? Subtract each distance from the maximum distance and then normalize?
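For reference, the "subtract from the maximum, then normalize" idea can be sketched as follows. This is only an illustration of that one proposal, not a recommendation; the function name, the sample distances, and the uniform fallback for equal distances are assumptions.

```python
def weights_from_distances(distances):
    """Subtract each distance from the maximum, then normalize to sum to 1."""
    d_max = max(distances)
    flipped = [d_max - d for d in distances]
    total = sum(flipped)
    if total == 0:
        # All distances are equal: fall back to uniform weights (an assumption).
        return [1.0 / len(distances)] * len(distances)
    return [f / total for f in flipped]

# Hypothetical distances from the new object to three known objects.
distances = [1.0, 2.0, 4.0]
weights = weights_from_distances(distances)
```

One property of this choice is worth noting: the farthest object always receives a weight of exactly $0$, so it contributes nothing to the combination.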
norm data-analysis
$endgroup$
$begingroup$
The first problem that I can see here is that there's no reason why your original objects should span the vector space of properties.
$endgroup$
– user3482749
Jan 19 at 20:39
$begingroup$
That's unimportant in my application. This application is kind of like Natural Language Processing, e.g., if I'm wanting to find a word most similar to another in meaning. In a situation like that, it's not important if the set of words you have to work with spans the space of all meanings. You just want the closest one. In this case, I don't want to lose the information that the objects farther away give me, hence the linear combination. I could just go with the closest, but in this application, I think the linear combination gives me a bit more finesse.
$endgroup$
– Adrian Keister
Jan 19 at 20:42
$begingroup$
It isn't, though: the very first thing that you do is write your new object as a linear combination of the old ones. If the old objects don't span, how can you know that's even possible?
$endgroup$
– user3482749
Jan 19 at 20:43
$begingroup$
You're quite right; I already know it's not possible in any exact sense. The "linear combination" is only there as a way of expressing the idea that $o_{21}$ is like all these $20$ objects, but it's more like $o_7$ than $o_9,$ and it's more like $o_{10}$ than $o_{3}$.
$endgroup$
– Adrian Keister
Jan 19 at 20:46
$begingroup$
Perhaps, then, you could instead edit the question to say what you mean. Because right now, it's essentially impossible to answer.
$endgroup$
– user3482749
Jan 19 at 20:53
asked Jan 19 at 20:27, edited Jan 19 at 20:58
Adrian Keister