I had an interesting conversation the other day about what happens when the APICs in an ACI fabric fail. This particular customer had been burned by another vendor’s fabric solution during failure and upgrade scenarios. I mentioned that the APICs could all blow up and the data plane of the fabric would keep chugging along. The customer said, “I want to see it. I want to pull the power to the controllers and see it.”
I realized that I had been saying this scenario was possible for a while now, but I had never explicitly tested it. I figured I had better verify this behavior before my customer came out and pulled the plug on my APICs.
Here is the setup. I have two EPGs, one on each side of an F5 LTM. The LTM is doing simple round-robin load balancing across some Apache web servers. In my test, I ping the virtual server on the LTM. I also check that I can reach the web pages served by the pool defined on the LTM.
I am advertising the bridge domain subnet (10.207.141.0/24) associated with the F5 Outside EPG via an L3 Out.
I started a ping to all three controllers and to the virtual server (VS) on the LTM appliance (10.207.141.100). I could ping all four addresses. I could also access web pages from servers in the Web EPG that are load balanced by the LTM.
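If you want to repeat this test without juggling four terminal windows, the pings can be scripted. The following is only a minimal sketch in Python, not the tooling used for this test. The VS address is the one from this post; the three APIC addresses are placeholders I made up, and the function names (`ping_once`, `run_probes`, `loss_percent`) are hypothetical. It also assumes a Linux-style `ping` where `-W` takes a timeout in seconds.

```python
import subprocess
import time

# Only the VS address comes from the post. The APIC addresses are
# placeholders from a documentation range; substitute your own.
TARGETS = {
    "apic1": "192.0.2.11",
    "apic2": "192.0.2.12",
    "apic3": "192.0.2.13",
    "ltm-vs": "10.207.141.100",
}


def ping_once(address, timeout_s=1):
    """Send one ICMP echo via the system ping; True if a reply came back.

    Assumes Linux iputils flags (-c count, -W timeout in seconds).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung ping or a missing ping binary counts as a lost probe.
        return False
    return result.returncode == 0


def run_probes(targets, rounds, probe=ping_once, interval_s=1.0):
    """Probe every target `rounds` times; return {name: [True/False, ...]}."""
    history = {name: [] for name in targets}
    for i in range(rounds):
        for name, address in targets.items():
            history[name].append(probe(address))
        if i < rounds - 1:
            time.sleep(interval_s)
    return history


def loss_percent(history):
    """Convert the per-target reply history into percent packet loss."""
    return {
        name: 100.0 * results.count(False) / len(results)
        for name, results in history.items()
        if results
    }
```

Calling `loss_percent(run_probes(TARGETS, rounds=30))` while the controllers are down gives a per-target loss figure. The `probe` argument is there so the loop can be exercised with a fake probe before pointing it at a real fabric.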
I logged into the CIMC of all three controllers and rebooted each one. At that point I lost pings to all three controllers, as expected. My pings to the VS kept succeeding, however, and I was still able to access the web pages.
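The web page check can be scripted the same way. This is a rough sketch, assuming the VS answers plain HTTP on port 80 at its root path, which this post does not actually state. The name `http_ok` and the `opener` parameter are hypothetical and exist only to keep the function easy to try out.

```python
import urllib.request

# VS address from the post; the scheme, port, and path are assumptions.
VS_URL = "http://10.207.141.100/"


def http_ok(url, timeout_s=2.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Running `http_ok(VS_URL)` in a loop while the controllers reboot shows whether the pool behind the LTM is still being served through the fabric.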
Now I have proof that APIC failures do not affect data plane traffic. I can rest easy the next time a customer questions that statement.