Show that $x, y \to -\log(\operatorname{sigmoid}(x) - \operatorname{sigmoid}(y))$ is convex for $x > y$

I've managed to essentially brute-force the problem by calculating the Hessian of the function and showing that its determinant and trace are non-negative.

This was done by using a change of variables to reduce the problem to showing that two certain polynomials are non-negative over a subset of $[0,1]^2$: proving that each one is non-negative in a neighborhood of its zeros, and numerically checking that it is positive away from them.

This solution feels a bit too messy for me, so I was wondering if there isn't a cleaner approach one could use. (I'm aware we could use Sylvester's criterion to simplify the numerical step, but I'd like to avoid using that as well if possible.)

For reference, the expression of the Hessian (up to the positive factor $1/(s-t)^2$, which does not affect its definiteness) is
$$H(x,y) = \begin{bmatrix}
-s(1-s)(1-2s)(s-t) + s^2(1-s)^2 & -s(1-s)t(1-t) \\
-s(1-s)t(1-t) & t(1-t)(1-2t)(s-t) + t^2(1-t)^2
\end{bmatrix},$$

where $s=\operatorname{sigmoid}(x)$ and $t=\operatorname{sigmoid}(y)$.
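As a sanity check on the matrix above, here is a minimal Python sketch that compares it with a finite-difference Hessian of $-\log(\operatorname{sigmoid}(x) - \operatorname{sigmoid}(y))$. It assumes the displayed matrix is the true Hessian multiplied by $(s-t)^2$. The helper names (`scaled_hessian`, `fd_hessian`), the test point and the step size `h` are illustrative choices, not part of the original post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x, y):
    # Objective from the question, defined for x > y.
    return -math.log(sigmoid(x) - sigmoid(y))

def scaled_hessian(x, y):
    # Entries of the displayed matrix, with s = sigmoid(x), t = sigmoid(y).
    s, t = sigmoid(x), sigmoid(y)
    h11 = -s * (1 - s) * (1 - 2 * s) * (s - t) + s**2 * (1 - s)**2
    h12 = -s * (1 - s) * t * (1 - t)
    h22 = t * (1 - t) * (1 - 2 * t) * (s - t) + t**2 * (1 - t)**2
    return h11, h12, h22

def fd_hessian(x, y, h=1e-4):
    # Central second differences of f (an approximation, not an exact derivative).
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

# Arbitrary test point with x > y.
x, y = 0.7, -0.4
scale = (sigmoid(x) - sigmoid(y)) ** 2  # assumed scaling factor (s - t)^2
approx = [scale * v for v in fd_hessian(x, y)]
exact = scaled_hessian(x, y)
max_err = max(abs(a - b) for a, b in zip(approx, exact))
print(max_err)
```

A small `max_err` at a handful of points is only evidence that the formula was transcribed correctly; it does not replace the proof.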

edited Jan 21 at 20:09

asked Jan 21 at 17:49

Kitegi


Do you have a particular sigmoid function? [I ask because the difference of sigmoids might be negative, in which case you can't take the log. Maybe you need the absolute value of the difference...]
– coffeemath
Jan 21 at 17:58

@coffeemath $\operatorname{sigmoid}(x) = \frac{1}{1+\exp(-x)}$ in this case.
– Kitegi
Jan 21 at 18:12

Does that mean you impose one of $x<y$, $y<x$ to ensure that the input to the log is positive?
– coffeemath
Jan 21 at 18:14

@coffeemath Yes, the domain is the set of $(x, y) \in \mathbb{R}^2$ such that $x > y$.
– Kitegi
Jan 21 at 18:18

convex-analysis


2 Answers
+200

Assume $0 \leq t < s \leq 1$.

Consider $f(s,t)=-s(1-s)(1-2s)(s-t) + s^2(1-s)^2 = s(1-s)(s^2+t-2st)$.
Each factor is nonnegative (the infimum over $s$ of the third factor is attained at $s=t$, where it equals $t(1-t) \geq 0$), so the $(1,1)$ entry of the Hessian is nonnegative. Analogously, the $(2,2)$ entry is nonnegative, so the trace is nonnegative.
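The factorization of the diagonal entries can be checked exactly with rational arithmetic. The sketch below is illustrative only: the function names are made up here, and the third factor is written in the completed-square form $s^2+t-2st = (s-t)^2 + t(1-t)$ (and its analogue $t^2+s-2st = (s-t)^2 + s(1-s)$ for the $(2,2)$ entry). Both sides are polynomials of degree at most 4 in each variable, so agreement on a $7 \times 7$ grid implies they are identical.

```python
from fractions import Fraction

def h11(s, t):
    # (1,1) entry as written in the question.
    return -s * (1 - s) * (1 - 2 * s) * (s - t) + s**2 * (1 - s)**2

def h22(s, t):
    # (2,2) entry as written in the question.
    return t * (1 - t) * (1 - 2 * t) * (s - t) + t**2 * (1 - t)**2

def h11_factored(s, t):
    # s(1-s)(s^2 + t - 2st), third factor rewritten as (s-t)^2 + t(1-t).
    return s * (1 - s) * ((s - t)**2 + t * (1 - t))

def h22_factored(s, t):
    # t(1-t)(t^2 + s - 2st), third factor rewritten as (s-t)^2 + s(1-s).
    return t * (1 - t) * ((s - t)**2 + s * (1 - s))

# Exact rational grid on [0, 1]; 7 points per variable exceeds the degree 4.
grid = [Fraction(k, 6) for k in range(7)]

diag_identity_ok = all(
    h11(s, t) == h11_factored(s, t) and h22(s, t) == h22_factored(s, t)
    for s in grid for t in grid
)
diag_nonneg_ok = all(
    h11(s, t) >= 0 and h22(s, t) >= 0 for s in grid for t in grid
)
print(diag_identity_ok, diag_nonneg_ok)
```

Written as a sum of a square and a product of numbers in $[0,1]$, each third factor is visibly nonnegative without any optimization over $s$.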

The determinant is
$$\begin{align}g(s,t) &= -s(1-s)(1-2s)t(1-t)(1-2t)(s-t)^2 \\
& \qquad + s^2(1-s)^2t(1-t)(1-2t)(s-t) - t^2(1-t)^2s(1-s)(1-2s)(s-t) \\
&= -s(1-s)t(1-t)(s-t)^2(2st-s-t).
\end{align}$$
For the first expression I wrote the determinant as $(a+b)(c+d)-H_{12}^2$ and noticed that $bd=H_{12}^2$. Then I used this tool to simplify the expression (click more forms to see the one I copied). Now $s(1-s) \geq 0$, $t(1-t) \geq 0$, and $(s-t)^2 \geq 0$, so for the determinant to be nonnegative, it remains to be proven that $2st-s-t\leq 0$. We have:
$$\sup_{s,t}\{2st-s-t\} = \sup_t \sup_s\{ 2st-s-t \}= \sup_t \begin{cases}t-1 & \text{if } 2t-1\geq 0 \\ -2t(1-t) & \text{otherwise.}\end{cases}$$
When $2t-1\geq 0$, the derivative with respect to $s$ is nonnegative, so the supremum is attained at the largest possible value of $s$ (which is $s=1$). Conversely, in the second branch you plug in the smallest possible value ($s=t$). Both branches are nonpositive.
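For readers who would rather not rely on an external simplification tool, the determinant identity and the case analysis can be checked exactly in a few lines of Python. This is a sketch with made-up helper names, not the tool used in the answer. Both sides of the identity have degree at most 5 in each variable, so exact agreement on a $9 \times 9$ rational grid implies the identity.

```python
from fractions import Fraction

def det_expanded(s, t):
    # det H = (a + b)(c + d) - H12^2, with the entries from the question.
    a = -s * (1 - s) * (1 - 2 * s) * (s - t)
    b = s**2 * (1 - s)**2
    c = t * (1 - t) * (1 - 2 * t) * (s - t)
    d = t**2 * (1 - t)**2
    h12 = -s * (1 - s) * t * (1 - t)
    return (a + b) * (c + d) - h12**2

def det_factored(s, t):
    # Claimed closed form of the determinant.
    return -s * (1 - s) * t * (1 - t) * (s - t)**2 * (2 * s * t - s - t)

def sup_over_s(t):
    # 2st - s - t is affine in s, so its maximum over s in [t, 1]
    # is attained at one of the two endpoints.
    return max(2 * s * t - s - t for s in (t, Fraction(1)))

def claimed_sup(t):
    # The two branches of the case distinction in the answer.
    return t - 1 if 2 * t - 1 >= 0 else -2 * t * (1 - t)

grid = [Fraction(k, 8) for k in range(9)]

det_identity_ok = all(
    det_expanded(s, t) == det_factored(s, t) for s in grid for t in grid
)
sup_ok = all(
    sup_over_s(t) == claimed_sup(t) and claimed_sup(t) <= 0 for t in grid
)
print(det_identity_ok, sup_ok)
```

The grid check of `sup_ok` only samples the case analysis at finitely many values of $t$; the argument in the text is what covers all of $[0,1]$.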

Et voilà!

answered Jan 24 at 22:20

LinAlg


Well, that was anticlimactic, but I have no complaints. Note that you can simplify the last part by writing $2st-s-t = -((1-t)s + (1-s)t)$, which is clearly nonpositive.

The simple fact I was overlooking was that, instead of trying to show that the trace is nonnegative, I could just handle the diagonal terms separately, since their nonnegativity is also a necessary condition for the matrix to be positive semidefinite ($H_{i,i} = e_i^\top H e_i \geq 0$).
– Kitegi
Jan 24 at 22:54

A possible approach is to use the fact that $\log \det X$ is concave for $X$ positive definite. For a proof of this statement, see Boyd & Vandenberghe, page 74.

Set $$ X = \begin{pmatrix} e^x & e^y \\ (1 + e^y)^{-1} & (1 + e^x)^{-1}\end{pmatrix}$$ so that $\det X = \operatorname{sigmoid}(x) - \operatorname{sigmoid}(y)$, and substitute $a = e^x$ and $b=e^y$. The characteristic polynomial is quadratic, and it is a straightforward calculation to show that both eigenvalues of $X$ are positive if $\frac{a}{1+a} > \frac{b}{1+b}$.
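The two computational claims about $X$ (its determinant and the sign of its eigenvalues) can be spot-checked numerically. The following Python sketch uses hypothetical helper names and a few arbitrary sample points with $x > y$. It checks only these two claims, using the quadratic formula on the characteristic polynomial $\lambda^2 - (\operatorname{tr} X)\lambda + \det X$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matrix_X(x, y):
    # The 2x2 matrix from the answer, with a = e^x and b = e^y.
    a, b = math.exp(x), math.exp(y)
    return ((a, b), (1.0 / (1.0 + b), 1.0 / (1.0 + a)))

def det_and_eigenvalues(x, y):
    (p, q), (r, u) = matrix_X(x, y)
    tr = p + u
    det = p * u - q * r
    # Discriminant equals (p - u)^2 + 4*q*r > 0, so both roots are real.
    disc = tr * tr - 4.0 * det
    root = math.sqrt(disc)
    return det, ((tr - root) / 2.0, (tr + root) / 2.0)

# Arbitrary sample points satisfying x > y.
samples = [(0.7, -0.4), (2.0, 1.0), (-1.0, -3.0)]

det_matches = all(
    abs(det_and_eigenvalues(x, y)[0] - (sigmoid(x) - sigmoid(y))) < 1e-12
    for x, y in samples
)
eigs_positive = all(
    min(det_and_eigenvalues(x, y)[1]) > 0 for x, y in samples
)
print(det_matches, eigs_positive)
```

Since the trace $e^x + (1+e^x)^{-1}$ is always positive, the smaller eigenvalue is positive exactly when $\det X > 0$, which is the condition $\frac{a}{1+a} > \frac{b}{1+b}$.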

edited Jan 25 at 22:27

answered Jan 24 at 23:05

g g


An elegant solution, but I prefer the other answer since I wanted something more elementary in this case.
– Kitegi
Jan 24 at 23:17
