Failure of Let’s Encrypt certificate renewal

I recently received an email from Let’s Encrypt warning me that the certificate for this very site was nearing expiration.

Now that was a bit strange, as this was handled by my Synology NAS, behind my Freebox, and everything was supposed to be automatic. But sure enough, looking at the NAS’ /var/log/messages, there were errors claiming that the renewal failed because port 80 could not be opened. Of course it is open (as well as port 443), how do you think you’re reading this!

I tried to renew manually with /usr/syno/sbin/syno-letsencrypt renew-all -vv, which gave more information but still ended with "failed to open port 80.". I tested a few things, reached Let’s Encrypt’s limit of five failed attempts per hour, and so had to sleep on it. At that point I took a closer look at the verbose output and noticed that the domain name was correctly resolved to both my IPv4 and IPv6 addresses, and that the latter was chosen. That was the culprit: my site simply didn’t answer on the IPv6 address (although the address pinged).
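
To see what the validation actually hits, a quick check from any machine goes along these lines (example.com stands in for the real domain; nslookup and curl are assumed to be available):

nslookup -type=AAAA example.com
curl -4 -v http://example.com/
curl -6 -v http://example.com/

If the -6 variant hangs while the -4 one answers, you are in the same situation as I was.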

I don’t know yet what’s wrong with the IPv6 setup, but for now the workaround is simple: I removed the AAAA record at my domain provider, and the renewal worked without further trouble.

EDIT November 2019: I finally took a closer look at the IPv6 issue: I simply hadn’t realized that there is no NAT in this case, each device has its own public address! So the AAAA record needed to point directly to the public IPv6 address of the NAS, instead of the address of the Freebox.
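
For reference, the NAS’ own global IPv6 address can be found over SSH with something like this (eth0 is an assumption, the interface name may differ on your model):

ip -6 addr show dev eth0 scope global

That is the address the AAAA record should now point to.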

Where are my ports?

As fashion would have it, we’re using cloud services more and more, either private or public (and don’t get me wrong: there are a lot of benefits in doing so!).

So here I am, provisioning VMs in our private cloud, setting them up as nodes of a ServiceFabric cluster, and deploying my services there.
Everything was running well enough, until I wanted to add ElasticSearch and Kibana as guest executables. Then strange things started happening, such as nodes failing to talk to each other, seemingly at random. For instance, ping <hostname> could fail while ping <ip address> would work.

The root cause was ServiceFabric’s default port configuration, which allows only 300 ports (for both ephemeral ports and application ports). This is not a lot, especially since it is applied as a limit at the OS level:
netsh int ipv4 show dyn tcp
showed a dynamic port range matching the ServiceFabric configuration.
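
On a standalone Windows machine the range could be widened directly with netsh (the numbers below are only an illustration), but on a cluster node the proper place is the ServiceFabric configuration described below, otherwise the OS setting and the cluster configuration end up out of sync:

netsh int ipv4 set dynamicport tcp start=10000 num=5000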

ElasticSearch is rather liberal with its use of ports, and we quickly came close to exhausting the allowed range. Since connections keep being opened and closed, things could sometimes still work, but when no port was available at all, no connection of any kind could be made, including to the DNS server: that’s why hostnames sometimes couldn’t be resolved.
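
A rough way to see how close the range is to exhaustion is to count the connections currently open on a node (DNS lookups go over UDP, hence the second check):

netstat -ano -p tcp | find /c "TCP"
netstat -ano -p udp | find /c "UDP"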

Of course the solution was simple: change ClusterConfig.json to use more reasonable values (we went for 5000 ports).
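
For reference, these values live in the node type definition of ClusterConfig.json; a sketch of the kind of change we made (the node type name and exact boundaries are illustrative, only the two port ranges matter here):

  "nodeTypes": [
    {
      "name": "NodeType0",
      "clientConnectionEndpointPort": "19000",
      "httpGatewayEndpointPort": "19080",
      "applicationPorts": {
        "startPort": "20001",
        "endPort": "25000"
      },
      "ephemeralPorts": {
        "startPort": "25001",
        "endPort": "30000"
      },
      "isPrimary": true
    }
  ]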