How can i process multi loss in pytorch?

enter image description here

Such as this, I want to using some auxiliary loss to promoting my model performance.

Which type code can implement it in pytorch?

#one

loss1.backward()

loss2.backward()

loss3.backward()

optimizer.step()

#two

loss1.backward()

optimizer.step() 

loss2.backward()

optimizer.step() 

loss3.backward()

optimizer.step()   

#three

loss = loss1+loss2+loss3

loss.backward()

optimizer.step()

Thanks for your answer!

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

add a comment |

enter image description here

Such as this, I want to using some auxiliary loss to promoting my model performance.

Which type code can implement it in pytorch?

#one

loss1.backward()

loss2.backward()

loss3.backward()

optimizer.step()

#two

loss1.backward()

optimizer.step() 

loss2.backward()

optimizer.step() 

loss3.backward()

optimizer.step()   

#three

loss = loss1+loss2+loss3

loss.backward()

optimizer.step()

Thanks for your answer!

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

add a comment |

enter image description here

Such as this, I want to using some auxiliary loss to promoting my model performance.

Which type code can implement it in pytorch?

#one

loss1.backward()

loss2.backward()

loss3.backward()

optimizer.step()

#two

loss1.backward()

optimizer.step() 

loss2.backward()

optimizer.step() 

loss3.backward()

optimizer.step()   

#three

loss = loss1+loss2+loss3

loss.backward()

optimizer.step()

Thanks for your answer!

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

enter image description here

Such as this, I want to using some auxiliary loss to promoting my model performance.

Which type code can implement it in pytorch?

#one

loss1.backward()

loss2.backward()

loss3.backward()

optimizer.step()

#two

loss1.backward()

optimizer.step() 

loss2.backward()

optimizer.step() 

loss3.backward()

optimizer.step()   

#three

loss = loss1+loss2+loss3

loss.backward()

optimizer.step()

Thanks for your answer!

python pytorch

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

edited Jan 2 at 9:21

KonstantinosKokos

1,4471413

asked Jan 1 at 10:12

heiheihei

709

asked Jan 1 at 10:12

heiheihei

709

asked Jan 1 at 10:12

heiheihei

709

add a comment |

2 Answers
2

active

oldest

votes

First and 3rd attempt are exactly the same and correct, while 2nd approach is completely wrong.

Reason is, in Pytorch, low layer gradients are Not "overwritten" by subsequent backward() calls, rather they are accumulated, or summed. This makes first and 3rd approach identical, though 1st approach might be preferable if you have low-memory GPU/RAM, since a batch size of 1024 with immediate backward() + step() call is same as having 8 batches of size 128 and 8 backward() calls, with one step() call in the end.

To illustrate the idea, here is a simple example. We want to get our tensor x closest to [40,50,60] simultaneously:

x = torch.tensor([1.0],requires_grad=True)

loss1 = criterion(40,x)

loss2 = criterion(50,x)

loss3 = criterion(60,x)

Now the first approach: (we use tensor.grad to get current gradient for our tensor x)

loss1.backward()

loss2.backward()

loss3.backward()



print(x.grad)

This outputs : tensor([-294.])

The third approach:

loss = loss1+loss2+loss3

loss.backward()

print(x.grad)

Again the output is : tensor([-294.])

2nd approach is different because we don't call opt.zero_grad after calling step() method. This means in all 3 step calls gradients of first backward call is used. For example, if 3 losses provide gradients 5,1,4 for same weight, instead of having 10 (=5+1+4), now your weight will have 5*3+1*2+4*1=21 as gradient.

I do agree with the conclusion though, use 3rd approach if memory isn't issue. For further reading : Link 1,Link 2

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

1

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

add a comment |

-- Comment on first approach removed, see other answer --

Your second approach would require that you backpropagate with retain_graph=True, which incurs heavy computational costs. Moreover, it is wrong, since you would have updated the network weights with the first optimizer step, and then your next backward() call would compute the gradients prior to the update, which means that the second step() call would insert noise into your updates. If on the other hand you performed another forward() call to backpropagate through the updated weights, you would end up having an asynchronous optimization, since the first layers would be updated once with the first step(), and then once more for each subsequent step() call (not wrong per se, but inefficient and probably not what you wanted in the first place).

Long story short, the way to go is the last approach. Reduce each loss into a scalar, sum the losses and backpropagate the resulting loss. Side note; make sure your reduction scheme makes sense (e.g. if you are using reduction='sum' and the losses correspond to a multi-label classification, remember that the number of classes per objective is different, so the relative weight contributed by each loss would also be different)

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

1

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

1

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

1

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53994625%2fhow-can-i-process-multi-loss-in-pytorch%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

First and 3rd attempt are exactly the same and correct, while 2nd approach is completely wrong.

To illustrate the idea, here is a simple example. We want to get our tensor x closest to [40,50,60] simultaneously:

x = torch.tensor([1.0],requires_grad=True)

loss1 = criterion(40,x)

loss2 = criterion(50,x)

loss3 = criterion(60,x)

Now the first approach: (we use tensor.grad to get current gradient for our tensor x)

loss1.backward()

loss2.backward()

loss3.backward()



print(x.grad)

This outputs : tensor([-294.])

The third approach:

loss = loss1+loss2+loss3

loss.backward()

print(x.grad)

Again the output is : tensor([-294.])

I do agree with the conclusion though, use 3rd approach if memory isn't issue. For further reading : Link 1,Link 2

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

1

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

add a comment |

First and 3rd attempt are exactly the same and correct, while 2nd approach is completely wrong.

To illustrate the idea, here is a simple example. We want to get our tensor x closest to [40,50,60] simultaneously:

x = torch.tensor([1.0],requires_grad=True)

loss1 = criterion(40,x)

loss2 = criterion(50,x)

loss3 = criterion(60,x)

Now the first approach: (we use tensor.grad to get current gradient for our tensor x)

loss1.backward()

loss2.backward()

loss3.backward()



print(x.grad)

This outputs : tensor([-294.])

The third approach:

loss = loss1+loss2+loss3

loss.backward()

print(x.grad)

Again the output is : tensor([-294.])

I do agree with the conclusion though, use 3rd approach if memory isn't issue. For further reading : Link 1,Link 2

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

1

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

add a comment |

First and 3rd attempt are exactly the same and correct, while 2nd approach is completely wrong.

To illustrate the idea, here is a simple example. We want to get our tensor x closest to [40,50,60] simultaneously:

x = torch.tensor([1.0],requires_grad=True)

loss1 = criterion(40,x)

loss2 = criterion(50,x)

loss3 = criterion(60,x)

Now the first approach: (we use tensor.grad to get current gradient for our tensor x)

loss1.backward()

loss2.backward()

loss3.backward()



print(x.grad)

This outputs : tensor([-294.])

The third approach:

loss = loss1+loss2+loss3

loss.backward()

print(x.grad)

Again the output is : tensor([-294.])

I do agree with the conclusion though, use 3rd approach if memory isn't issue. For further reading : Link 1,Link 2

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

First and 3rd attempt are exactly the same and correct, while 2nd approach is completely wrong.

To illustrate the idea, here is a simple example. We want to get our tensor x closest to [40,50,60] simultaneously:

x = torch.tensor([1.0],requires_grad=True)

loss1 = criterion(40,x)

loss2 = criterion(50,x)

loss3 = criterion(60,x)

Now the first approach: (we use tensor.grad to get current gradient for our tensor x)

loss1.backward()

loss2.backward()

loss3.backward()



print(x.grad)

This outputs : tensor([-294.])

The third approach:

loss = loss1+loss2+loss3

loss.backward()

print(x.grad)

Again the output is : tensor([-294.])

I do agree with the conclusion though, use 3rd approach if memory isn't issue. For further reading : Link 1,Link 2

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

edited Jan 3 at 8:57

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

answered Jan 1 at 16:22

Shihab Shahriar

1,322714

1

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

add a comment |

1

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

I used the third method,it's worked.Thank you for your patience and careful reply.

– heiheihei
Jan 2 at 8:18

add a comment |

-- Comment on first approach removed, see other answer --

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

1

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

1

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

1

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

add a comment |

-- Comment on first approach removed, see other answer --

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

1

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

1

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

1

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

add a comment |

-- Comment on first approach removed, see other answer --

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

-- Comment on first approach removed, see other answer --

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

edited Jan 1 at 18:15

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

answered Jan 1 at 11:48

KonstantinosKokos

1,4471413

1

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

1

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

1

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

add a comment |

1

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

1

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

1

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

I believe this might have few mistakes: link1,Link2. Please let me know if I've myself made any error....

– Shihab Shahriar
Jan 1 at 16:29

Good point, thanks for noticing and sorry for the wrong information-- I was certain that this was the case for some reason.

– KonstantinosKokos
Jan 1 at 18:09

Thank you for your patience and careful answer.Your answer gives me some inspiration,but I'm dealing with objection detection problem,so I don't know what impact it will have on classification.Thanks again!

– heiheihei
Jan 2 at 8:29

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu