qfle3 driver crashing VMWare hosts - solved reverting to bnx2i legacy drivers



























This has been happening for a while now.
Several different hosts (Dell hosts) using "QLogic 57810 10 Gigabit Ethernet Adapter" network cards have been failing while using the native qfle3 driver.
We tried disabling the load-balancing queues without any positive results (the hosts kept crashing).
The only solution we found was to revert to the bnx2i drivers. This definitely made the hosts stop crashing.



We have ruled out hardware problems, since this has been happening on multiple machines; the hardware retailer even agreed to replace some of the cards, but still no luck.
The hosts are running ESXi 6.7.0, and we are having a hard time getting a straight answer from VMware. The firmware versions are fine according to the compatibility matrix.



Did anyone else come across this issue? What could the problem be?
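For reference, the revert from the native qfle3 driver to the legacy bnx2x/bnx2i modules can be sketched roughly as follows. This is a minimal sketch, assuming the standard esxcli module commands; module names and companion modules (qfle3i/qfle3f) can vary per driver package, so verify against your own build before applying, and note that vmnic assignments may change after the reboot.

```shell
# List the currently loaded qfle3/bnx2 driver modules on the host
esxcli system module list | grep -E 'qfle3|bnx2'

# Disable the native qfle3 driver (and its companions, if present)
esxcli system module set --enabled=false --module=qfle3

# Enable the legacy drivers (bnx2x for the NIC, bnx2i for iSCSI offload)
esxcli system module set --enabled=true --module=bnx2x
esxcli system module set --enabled=true --module=bnx2i

# A reboot is required for the driver change to take effect
reboot
```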























migrated from stackoverflow.com Jan 23 at 2:41


This question came from our site for professional and enthusiast programmers.

























asked Jan 3 at 7:18 by RedNano




1 Answer
































Edit 2019-03-01: Updated drivers have been released on the VMware support portal.

I just spent 7 months working on this (or a very similar?) issue with HPE, Broadcom/QLogic/Cavium/Marvell (whoever they are now...) and VMware. It has been a tough and very pink (as in PSODs) experience. It started around June/July 2018 in different datacenters with different configurations. As HPE requires you to use qfle3, and vSphere 6.0 (which is allowed to use bnx2) lacked some key functionality, I was between a rock and a hard place.

There are 2 independent issues (maybe more, but these are the ones specific to my case):

• The iSCSI offload driver (you mention bnx2i) is unstable. I've heard various other accounts, but in my case hosts would PSOD within seconds or minutes of configuring iSCSI (often resulting in a boot loop, as the host would crash during the boot-time iSCSI login). I have beta drivers from late December 2018 that work fine now. Stable/tested/qualified drivers are supposed to be published by Marvell on the QLogic download portal at the end of January 2019 (or early February). Workaround: use software iSCSI.

• There's a bug in the vSphere 6.7 RSS module that should be fixed in vSphere 6.7U2 (spring 2019). It causes occasional PSODs, NMIs, or plain network connectivity loss (often shortly after a vMotion). The workaround is to disable the vSphere RSS load balancer: esxcli network nic queue loadbalancer set --rsslb=disable -n vmnicX

In my case I had to switch all production NICs to Intel ones (which support neither iSCSI offload nor RSS), but I continued to push the case due to some future requirements.
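Applied per NIC, the RSS workaround above might look like the sketch below. The vmnic names are placeholders for illustration; identify your own QLogic ports first, and note that the `loadbalancer list` verification step is an assumption about the command namespace, so check it on your build.

```shell
# Show the physical NICs on the host to find the QLogic 57810 ports
esxcli network nic list

# Disable the RSS load balancer on each affected vmnic
# (vmnic4/vmnic5 are placeholder names)
esxcli network nic queue loadbalancer set --rsslb=disable -n vmnic4
esxcli network nic queue loadbalancer set --rsslb=disable -n vmnic5

# Verify the current load balancer settings
esxcli network nic queue loadbalancer list
```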






                edited Mar 1 at 9:24

























                answered Jan 22 at 9:02









Don Zoomik





























