As fashion would have it, we’re more and more using cloud services, either private of public (and don’t get me wrong : there are a lot of benefits doing so !).
So here I am, provisioning VMs in our private cloud, setting them up as nodes of a ServiceFabric cluster, and deploying my services there.
Everything was running well enough, until I wanted to add ElasticSearch and Kibana as guest executables. Then strange things happened, such as nodes failing to talk to each other, but a bit randomly. For instance
ping <hostname> could fail, while
ping <ip address> would work.
The root cause was the default setting of ServiceFabric regarding port usage, with only 300 ports allowed (both for ephemeral ports and application ports). This is not a lot, especially since it is translated as a limit at the OS level :
netsh int ipv4 show dyn tcp
showed a range matching ServiceFabric configuration.
ElasticSearch is rather liberal with its use of ports, and we rather quickly mostly exhausted the allowed range. Since connections keep being opened and closed, something could sometimes work, but when there really was no port available, it meant a failure to do any kind of connection, including with the DNS : that’s why sometimes hostnames couldn’t be resolved.
Of course the solution was simple : change the ClusterConfig.json with more reasonable values (we went for 5000 ports).