qfle3 driver crashing VMware hosts - solved by reverting to bnx2i legacy drivers
It's been happening for a while now.
Several different hosts, all Dell machines using "QLogic 57810 10 Gigabit Ethernet Adapter" network cards, have been crashing while running the native qfle3 driver.
We tried disabling the load-balancing queues without any positive results (the hosts kept crashing).
The only solution we found was to revert to the legacy bnx2i drivers. This definitely made the hosts stop crashing.
We have ruled out hardware problems, since it's been happening on multiple machines - the hardware retailer even agreed to replace some of the cards, but still no luck.
The hosts are running ESXi 6.7.0, and we are having a hard time getting a straight answer from VMware. Firmware versions are OK according to the compatibility matrix.
Did anyone else come across this issue? What could the problem be?
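For reference, the revert to the legacy drivers can be sketched roughly as below. The module names (qfle3, qfle3i, bnx2x, bnx2i) are assumptions based on common driver naming for the BCM57810 family; confirm them on your own host with "esxcli system module list" before applying anything.

```shell
#!/bin/sh
# Sketch of the revert procedure on an ESXi 6.7 host: disable the native
# qfle3 modules and re-enable the legacy bnx2 family, then reboot.
run() {
    # Dry-run helper: print each command instead of executing it.
    # On the actual host, uncomment the execution line below.
    echo "+ $*"
    # "$@"
}

run esxcli system module set --enabled=false --module=qfle3    # native NIC driver
run esxcli system module set --enabled=false --module=qfle3i   # native iSCSI offload
run esxcli system module set --enabled=true --module=bnx2x     # legacy NIC driver
run esxcli system module set --enabled=true --module=bnx2i     # legacy iSCSI offload
run reboot                                                     # change takes effect after reboot
```

The dry-run wrapper lets you review the exact commands before running them on a production host.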
drivers
migrated from stackoverflow.com Jan 23 at 2:41
This question came from our site for professional and enthusiast programmers.
asked Jan 3 at 7:18
RedNano
1 Answer
Edit 2019-03-01: Updated drivers have been released on the VMware support portal.
I just spent 7 months working on this (or a very similar?) issue with HPE, Broadcom/QLogic/Cavium/Marvell (whoever they are now...) and VMware. It's been a tough and very pink (as in PSODs) experience. It started between June and July 2018 in different datacenters with different configurations. As HPE requires you to use qfle3, and vSphere 6.0 (which is allowed to use bnx2) lacked some key functionality, I was between a rock and a hard place.
There are 2 independent issues (maybe more, but these are the ones specific to my case):
- The iSCSI offload driver (you mention bnx2i) is unstable. I've heard various other accounts, but in my case hosts would PSOD within seconds or minutes of configuring iSCSI (often resulting in a boot loop, as the host would crash during the iSCSI login at boot). I have beta drivers from late December 2018 that work fine now. Stable/tested/qualified drivers are supposed to be published by Marvell on the QLogic download portal at the end of January 2019 (or early February). Workaround: use software iSCSI.
- There's a bug in the vSphere 6.7 RSS module that should be fixed in vSphere 6.7U2 (spring 2019). This causes occasional PSODs, NMIs, or plain network connectivity loss (often shortly after a vMotion). The workaround is to disable the vSphere RSS load balancer:
esxcli network nic queue loadbalancer set --rsslb=disable -n vmnicX
In my case I had to switch all production NICs to Intel ones (which support neither iSCSI offload nor RSS), but I continued to push the case due to some future requirements.
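The per-NIC workaround above can be applied to every affected adapter in one pass. The "set --rsslb=disable" command is the one from this answer; discovering the NICs via "esxcli network nic list" and reading the driver name from the third column is an assumption about the output layout, so verify it on your own host first.

```shell
#!/bin/sh
# Sketch: disable the vSphere RSS load balancer on every vmnic that is
# served by the qfle3 driver.
ESXCLI=${ESXCLI:-esxcli}    # overridable so the script can be dry-run

qfle3_nics() {
    # Assumption: on 6.x hosts the third column of "esxcli network nic list"
    # is the driver name; the first column is the vmnic name.
    $ESXCLI network nic list 2>/dev/null | awk '$3 == "qfle3" { print $1 }'
}

for nic in $(qfle3_nics); do
    echo "Disabling RSS load balancer on $nic"
    $ESXCLI network nic queue loadbalancer set --rsslb=disable -n "$nic"
done
```

Setting ESXCLI to a stub command lets you preview which NICs would be touched before running it for real.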
edited Mar 1 at 9:24
answered Jan 22 at 9:02
Don Zoomik