Auto-generated binding redirects

In the .NET Framework, handling different versions of dependencies has long been tedious when strong naming is involved.

Until somewhat recently, that is. Today, we can slap a <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects> into our csprojs (or better in a Directory.Build.props file), job done, right ? Well, not quite…

Class Libraries

A first small hurdle comes from library projects. By default, there won’t be a .config file generated, since usually only the main .exe entry point’s config file is used at runtime. However, in some hosting scenarios, a class library may be loaded in a dedicated AppDomain, where the corresponding .dll.config will be injected. This is often the case for test runners and the corresponding test assemblies, for instance. Fortunately, generating the file only takes another property : <GenerateBindingRedirectsOutputType>true</GenerateBindingRedirectsOutputType>.

What about libraries used as plugins ? For now, I have no silver bullet there. If the entry point has no idea of the dependencies that may run in the final process, there can be no appropriate binding redirect generated. Here are a couple of possible mitigations :

  • As some test runners do, have the plugin code run in a dedicated AppDomain, where its own .dll.config file can be used. However, this seriously limits the possibilities for seamless integration, with the need for serialization and/or MarshalByRefObject.
  • Sometimes you want to write code mostly against interfaces, but at the topmost level, the set of implementations is well known. In this case, these topmost entry points can contain very little code and serve mostly for packaging, with references to all assemblies needed at runtime. Then these entry points will have correct binding redirects.

A weirder corner case

Imagine that you want to deliver some code as a (private) NuGet package. You have a rather small API surface, but a much larger volume of code for the implementation(s). So you package only a handful of dlls in the lib part of the package, with relevant interfaces and factories. At runtime these dlls (at least the factories) use a bunch of other dlls, which are copied into the final output directory by a specific <Target> specified in the build/<package>.targets file of the package. How well will binding redirects be handled in this case ?

TL;DR : it’s complicated, some things may work by chance, and a small change can break them…

Old style projects

Or rather packages.config projects, though I didn’t check (we use PackageReference only in projects migrated to SDK-style). Then things are rather simple : all directly referenced assemblies are checked, then their references, and so on. The referenced assemblies are searched for in a few locations, including the output directory. Of course, as soon as a reference is not found, lookup stops there with a message from the compiler about potential issues (if the msbuild log is verbose enough). Therefore, after a first build, as long as things have a chance to work at runtime, all transitive dependencies are present, and all binding redirects will be properly generated.

SDK-style projects

In this case, the output directory is not used for lookup : since transitive references and dependencies are the rule, everything is supposed to be found just by recursively following explicitly stated dependencies (including the content of the proper lib/ folders of NuGet packages). Our use-case is broken here : the dlls that are copied are not found, and their own dependencies are not considered when generating binding redirects.

Things may still work by chance if other “normal” references have a conflict with the same dependency : then as long as we end up with the latest version, the generated binding redirect is OK, even if the missed conflict would be with another (older) version.
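To make the difference between the two lookups more concrete, here is a minimal Python sketch of the idea, not of what MSBuild really does. It assumes a simplified model where each assembly that can be located lists its references as (name, version) pairs, and where a redirect is emitted for every name seen with more than one version. The assembly names (App, Api, Impl, Json) and the function binding_redirects are hypothetical.

```python
from collections import defaultdict


def parse_version(text):
    """Turn '12.0.0.0' into (12, 0, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def binding_redirects(entry, references):
    """Walk references from `entry` and return {name: newest version} for
    every dependency seen with more than one version.

    `references` maps an assembly name to its (dependency, version) pairs.
    An assembly missing from the mapping models a dll that the build cannot
    locate: the walk stops there and its own dependencies are never seen.
    """
    seen_versions = defaultdict(set)
    visited = set()
    stack = [entry]
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        for dep_name, dep_version in references.get(name, []):
            seen_versions[dep_name].add(parse_version(dep_version))
            stack.append(dep_name)
    return {
        name: ".".join(str(part) for part in max(versions))
        for name, versions in seen_versions.items()
        if len(versions) > 1
    }


# Hypothetical graph: Impl is the dll copied by the package's .targets file.
with_output_dir = {
    "App": [("Api", "1.0.0.0"), ("Json", "12.0.0.0")],
    "Api": [("Impl", "1.0.0.0")],
    "Impl": [("Json", "9.0.0.0")],
}
# Same graph when the output directory is not searched: Impl is not found.
without_output_dir = {k: v for k, v in with_output_dir.items() if k != "Impl"}
```

With the full graph, binding_redirects("App", with_output_dir) reports a redirect for Json. With Impl missing, the conflict is never seen and no redirect is produced, which is the broken case described above.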

There is another very strange situation (which actually happens for us) where things may work : by default, <None> items that happen to be assemblies are considered when looking up dependencies. The default <None> items include all files within the project (<None Include="**/*" Exclude="$(DefaultItemExcludes);$(DefaultExcludesInProjectFolder)" /> in Microsoft.NET.Sdk.DefaultItems.props), except that, unsurprisingly, DefaultItemExcludes excludes the output directories. However, in our setting, most non-toplevel assemblies use the same output directory, which is set through BaseOutputPath in a Directory.Build.props file. Then, only for the top-level assemblies (those for which we actually want binding redirects), we override OutDir to the bin directory next to the csproj. This is where the twist is : there is a bug in MSBuild which causes the OutDir as we define it, with a leading .\, to be ignored. So in the end, in this case, the whole bin directory ends up included in the <None> items, and considered as a potential reference. So here again, after a previous build has copied the dlls, further compilations will consider them, and the binding redirects will be correctly generated. Needless to say, we don’t want to rely on this !

Conclusion

Binding redirects in .config files should be mostly a thing of the past. However, until we move fully to .NET Core, or at least ditch strong naming, they will be needed. Generating them automatically is often fine, but sometimes not enough : rigorous testing is needed to identify edge cases. And in these cases, it helps to understand how things really work, so that we can find efficient workarounds.

A test of (im)patience

Among the qualities that help someone working intensively with computers, one that I’m definitely not short of is impatience. So when, as I was investigating an issue with the ELK stack, I had seemingly random long freezes of Kibana, I couldn’t simply live with it…

I had a simple local setup to try to reproduce my problem : out-of-the-zip elasticsearch and kibana servers, and nginx in front, proxying /kibana to localhost:5601. While changing the configuration of kibana to account for nginx, I set server.basePath (to /kibana), but also server.host to 0.0.0.0, matching the real environment (where nginx may be on another machine).

When checking the network tab in Chrome’s DevTools, I was seeing some more or less random calls taking almost exactly one minute to complete, very far from the usual few tens of milliseconds. The logs of nginx gave it away : sometimes, nginx would try both the ipv4 and ipv6 loopback addresses, starting with the ipv6 one. I would have guessed that since kibana was only listening on the ipv4 addresses, that would immediately fail, but as it happens, it only times out, after (of course !) exactly one minute. Then nginx tries the 127.0.0.1 address, which works normally, and keeps using it for a while, until it deems it a good idea to check again whether there is now someone listening on [::1]:5601, and bam, another (very) long call. The fix is either to configure nginx to send to 127.0.0.1:5601 instead of localhost:5601, or to set server.host to ::0 in kibana’s config.
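A quick way to see what a name like localhost resolves to, and which of the resulting addresses actually answer, is a few lines of Python. This is only a sketch using the standard socket module; the helper names candidate_addresses and probe are mine, and the 2 second timeout is an arbitrary choice.

```python
import socket


def candidate_addresses(addrinfo):
    """Return the distinct (family label, ip) pairs found in the result of
    socket.getaddrinfo, in the order the resolver gave them."""
    labels = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    seen = []
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family not in labels:
            continue  # ignore anything that is neither ipv4 nor ipv6
        entry = (labels[family], sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


def probe(host, port, timeout=2.0):
    """Try a TCP connection to every address `host` resolves to and report,
    per address, whether it connected or how it failed."""
    results = {}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for _label, ip in candidate_addresses(infos):
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                results[ip] = "connected"
        except OSError as error:  # includes timeouts and refused connections
            results[ip] = f"failed: {error}"
    return results
```

Calling probe("localhost", 5601) on the machine running kibana should show one entry per loopback address, which makes an address that times out instead of refusing the connection easy to spot.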

And the issue I had in the first place ? It turned out to be an actual bug in kibana, which was much easier to find without constantly losing focus because of the one minute pauses.

Git and line endings

I’m reasonably proficient with git, but I came across the core.safecrlf configuration for the first time only today. This is the opportunity to consolidate my thoughts on the subject of line endings in git.

What happens inside the git repo

Due to its origins in unix/linux world, git thinks the “normal” way to end lines is with a line feed (lf, 0x0a). So this is always the way it will store files internally when it decides to normalize them. Now when does this normalization happen ? When git thinks it is a text file, which happens in two cases :

  • The preferred way : when there is an appropriate entry in a .gitattributes file, typically a * text=auto line, where git will use its heuristics to decide if a file is text or not. The advantage is that the .gitattributes file(s) come with the repo, so there is no risk of local settings getting in the way.
  • When core.autocrlf is set to true or input, and the same heuristics as above identify a text file.

I used to think that on a pure Windows project, there was no reason not to keep the native line endings (carriage return followed by line feed, crlf, 0x0d 0x0a) all the way. However, the general openness of the “new Microsoft” means it is less and less frequent to stay purely in the Windows world. Moreover, having git know additional info about files through .gitattributes has other advantages. So now I would advise always including a .gitattributes file and letting git do its magic with text files.

How about the working directory ?

This is where it gets a bit more complicated. One of the reasons is that git configuration comes from multiple places, so for instance the local .git/config settings override the global ~/.gitconfig ones. A nice trick to better understand your configuration is the command git config --list --show-origin. Another invaluable command is git ls-files --eol, which will show you for each file its line ending style in the repo (index) and in the working directory, along with its attributes.

  • Of course any manipulation done by any tool or editor may affect the current state of your files, here we’re interested in what’s there after a clone or checkout.
  • First, if there is an attribute eol for a file (identified as text), the corresponding value (lf or crlf) is always used. A classical use is *.sh text eol=lf, where even on Windows, you want your bash to understand the script.
  • Else, if core.autocrlf is set to input, nothing is done, which usually means lf are used.
  • Else, if core.autocrlf is set to true, crlf are used.
  • Else (i.e. core.autocrlf is set to false), the value of core.eol is used : either lf, crlf, or native, the default, where the platform usual ending is used.
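The list above reads as a small decision procedure, which can be sketched in Python as follows. This is only my reading of the rules as stated here, not git's actual code : the function name checkout_eol and its parameters are made up for the illustration, the file is assumed to be identified as text, and for core.autocrlf set to input the sketch assumes the content in the repo is already normalized to lf.

```python
import os


def checkout_eol(eol_attr=None, autocrlf="false", core_eol="native",
                 platform_eol=None):
    """Return 'lf' or 'crlf', the line ending expected in the working
    directory for a text file after a clone or checkout.

    eol_attr     -- value of the eol attribute from .gitattributes, or None
    autocrlf     -- value of core.autocrlf: 'true', 'input' or 'false'
    core_eol     -- value of core.eol: 'lf', 'crlf' or 'native'
    platform_eol -- what 'native' means; guessed from os.linesep if None
    """
    if platform_eol is None:
        platform_eol = "crlf" if os.linesep == "\r\n" else "lf"
    if eol_attr in ("lf", "crlf"):
        return eol_attr          # the attribute always wins
    if autocrlf == "input":
        return "lf"              # no conversion on checkout (assumes lf in repo)
    if autocrlf == "true":
        return "crlf"
    # core.autocrlf is false: fall back to core.eol
    return platform_eol if core_eol == "native" else core_eol
```

For instance, checkout_eol(eol_attr="lf", autocrlf="true") gives lf, which is the *.sh case mentioned above.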

warn: CRLF would be replaced by LF in file

Before staging a file, git checks whether a round-trip (commit, then checkout) will leave it unchanged. A rather frequent situation where it won’t is when using cross-platform tools under Windows, with core.autocrlf set to true. If the tool generates or modifies a file so that it has lf endings, then it will of course also have lf within the repo, but these will be changed back to crlf on checkout ! Similarly, if core.autocrlf is set to false or input, with line endings handled through .gitattributes, then a text file with crlf will be normalized to lf in the repo, and checked out the same way, losing the cr.

This is where core.safecrlf comes into the picture : when a staging is deemed not reversible, according to its value, git will either prevent the operation (if set to true), issue a warning (set to warn, the default), or do nothing (set to false).
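The round-trip check can be mimicked in a few lines of Python. Again, this is a simplified sketch of the behaviour described above and not git's implementation : the function names are hypothetical, the file is assumed to be text, and eol is the line ending a checkout would produce (lf or crlf).

```python
def to_repo(data):
    """Normalize line endings to lf, as git does when storing a text file."""
    return data.replace(b"\r\n", b"\n")


def to_worktree(data, eol):
    """Convert repo content (lf) to the line ending used on checkout."""
    normalized = to_repo(data)
    if eol == "crlf":
        return normalized.replace(b"\n", b"\r\n")
    return normalized


def safecrlf_action(data, eol, safecrlf="warn"):
    """Return 'accept', 'warn' or 'reject' for staging `data`, depending on
    whether commit then checkout would give back the same bytes."""
    if to_worktree(to_repo(data), eol) == data:
        return "accept"          # reversible: nothing to report
    return {"true": "reject", "warn": "warn", "false": "accept"}[safecrlf]
```

A file with lf endings staged while checkouts produce crlf gives warn with the default setting, which is the cross-platform tool scenario : safecrlf_action(b"a\nb\n", "crlf").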

Conclusion

With most tools on Windows now properly handling lf, I’m not sure that core.autocrlf set to true is that useful. So I think that I will go with input for the global setting, with possible overrides on a per-repo basis.

Since most of my repos have (or will have) an appropriate .gitattributes file, I’m confident enough that text files are handled properly, and since my tooling is not always consistent eol-wise, I’ll set core.safecrlf to false globally, trying not to forget overriding it in the few repos without .gitattributes.

For any new project, I’ll have a .gitattributes file along the lines of :

* text=auto

*.sh text eol=lf

*.sln text eol=crlf
*.bat text eol=crlf
*.cmd text eol=crlf

Failure of Let’s Encrypt certificate renewal

I recently received a mail from Let’s Encrypt warning me that the certificate for this very site was nearing expiration.

Now that was a bit strange, as this was handled by my Synology NAS, behind my Freebox, and everything was supposed to be automatic. But sure enough, when looking at the NAS’ /var/log/messages, there were errors over there, claiming that the renewal failed because of a failure to open port 80. Of course it is open (as well as port 443), how do you think you’re reading this !

I tried to renew manually : /usr/syno/sbin/syno-letsencrypt renew-all -vv gave more info, but in the end still "failed to open port 80.". I tested a few things, reached the five tries per hour limit of Let’s Encrypt, so had to sleep on it. Afterwards, I took a closer look at the verbose output, and noticed that the domain name was correctly resolved to both my ipv4 and ipv6 addresses, and that the latter was chosen. And that was the culprit : my site simply didn’t answer on the ipv6 address (although the address pinged).

I don’t know yet what’s wrong with the ipv6 setup, but for now the workaround is simple : I removed the AAAA entry from my domain provider, and the renewal worked without further trouble.

EDIT November 2019 : I finally took a closer look at the ipv6 issue : I simply hadn’t realized that there is no longer a NAT in this case, each device has its own address ! So I needed the AAAA entry to point directly to the public ipv6 address of the NAS, instead of the address of the Freebox.

Where are my ports ?

As fashion would have it, we’re more and more using cloud services, either private or public (and don’t get me wrong : there are a lot of benefits in doing so !).

So here I am, provisioning VMs in our private cloud, setting them up as nodes of a ServiceFabric cluster, and deploying my services there.
Everything was running well enough, until I wanted to add ElasticSearch and Kibana as guest executables. Then strange things happened, such as nodes failing to talk to each other, but a bit randomly. For instance ping <hostname> could fail, while ping <ip address> would work.

The root cause was the default setting of ServiceFabric regarding port usage, with only 300 ports allowed (both for ephemeral ports and application ports). This is not a lot, especially since it is translated as a limit at the OS level :
netsh int ipv4 show dyn tcp
showed a range matching ServiceFabric configuration.

ElasticSearch is rather liberal with its use of ports, and we fairly quickly exhausted most of the allowed range. Since connections keep being opened and closed, something could sometimes work, but when there really was no port available, it meant a failure to make any kind of connection, including to the DNS : that’s why sometimes hostnames couldn’t be resolved.

Of course the solution was simple : update ClusterConfig.json with more reasonable values (we went for 5000 ports).