Windows Network File System flaw results in arbitrary code execution as SYSTEM

Trend Micro Research has published an anatomy of a Windows remote code execution vulnerability lurking in the Network File System. The vulnerability in question, CVE-2022-30136, was patched by Microsoft in June (you do keep your patches up to date, don't you?) but the research makes for interesting reading both in terms of the …

  1. Nate Amsden

    Windows NFS was so terrible

    Maybe it's better now, but I'm not holding my breath. Several years ago I tried to deploy a pair of "appliances" from HPE that ran Windows Storage Server 2012, specifically for NFSv3 for Linux clients (and maybe 1% SMB). I chose this because it was supported by HPE, it could leverage the existing back-end SAN for storage, and it was highly available. Add to that I only needed maybe 3TB of space at most, so buying a full-blown enterprise NAS was super overkill, as nobody made such small systems. SMB NAS units (Synology etc.) I didn't deem to offer acceptable levels of high availability.

    I figured my use case was super simple (file storage, not transactional storage), my data set was very small, my I/O workload was pretty tiny, and it was fully supported, so how hard could it be for Windows to fill this role?

    Anyway, aside from the 9-hour phone support call I had to sit on while HPE support tried to get through their own quick-start guide (I was expecting a ~10 minute self-setup process) due to bugs or whatever (they ended up doing manual registry entries to get the two nodes, which were connected directly via ethernet cable, to see each other), the system had endless problems from a software perspective. 99% of the software was Microsoft; the only HPE stuff was some basic utilities, which for the most part I don't think I even used outside of the initial setup.

    I filed so many support issues; at one point HPE got Microsoft to make a custom patch for me for one of the issues. I never deployed the patch, purely out of fear that I was the only one in the world to have the issue in question (that, and it wasn't my most pressing concern of all the problems I was having). I should have returned the systems right away, but I was confident I could work through the problems given time. I had no idea what I was in for when I made that decision. I was also in periodic contact with HPE back-end engineering for this product, though their resources were limited as the software was all Microsoft.

    The systems never made it past testing: maybe 6 months of usage, with many problems and support cases and workarounds and blah blah... I designed the filesystem layout similar to our past NAS offerings from Nexenta (VM-based) and FreeNAS (also VM-based). There were 5 different drive letters: one for production, one for staging, one for non-production, one for backups (with dedupe enabled) and one for admin/operations data. The idea is that if there is a problem on one of those, it doesn't impact the others.

    The nail in the coffin for this solution, for me, was when at one point the backups volume (which had dedupe enabled) gave an error saying it was out of space (when it had plenty of space according to the drive letter and according to the SAN). When this happened, the entire cluster went down (including the other 4 drive letters!). The system tried to fail over, but the same condition existed on the other node. I had expected just that one drive letter/volume to fail; that's fine, let the others continue to operate. But that's not what happened: all data went offline. I worked around the issue by expanding the volume on the SAN, and that cured it for a while, until it happened again: all volumes down because one got full. WTF.

    I tried to figure out a way to configure the cluster so that it would continue to operate the other 4 drive letters while this one was down. I could not figure it out (and didn't want to/couldn't wait hours for support to try to figure it out), so I nuked the cluster and went single node. From that point on, if that drive letter failed (if I recall right, anyway; this was years ago), the other drives remained online.

    I assume the source of the "out of space" error was some bug or integration issue with thin provisioning on the SAN, but I was never able to figure it out. I do recall getting a similar error from our FreeNAS on the same array: there was plenty of space, but FreeNAS said there was not (the filesystem was 50% full). This issue ONLY appeared on volumes with dedupe enabled (Windows-side dedupe, and FreeNAS-side dedupe). I've never seen that error before or since. It was an old array, so maybe it was the array's fault. I don't mind that particular volume going offline (it only stored backups), but the final straw was that that volume being offline should not have taken the rest of the system down with it (in the case of the Windows cluster).

    But I had so many other annoying issues with NFS on Windows besides. I'm a Linux person, so I didn't have high expectations for awesome NFS from Windows, but again, my use case was really light and trivial (the biggest requirement was high availability).

    In the end I migrated off that Windows NFS back to FreeNAS again (which had replication but no HA, so I never applied any OS updates to the system while it was in use; fortunately there were no failures either), before later migrating to Isilon. It makes me feel strange having 8U of Isilon equipment for ~12TB of written data (probably closer to 7TB after Isilon overhead; more data than originally, because years have passed and we have since grown). But I was unable (at the time) to find a viable, supportable HA NAS head-unit offering to leverage shared back-end storage. I was planning on going NetApp V-Series before I realized the cost of Isilon was a lot less than I expected (and much less than V-Series for me at the time, especially if you consider NetApp licensing for both NFS and SMB).

    (When I first started out we used Nexenta's VM-based system, and they officially supported high availability running on VMs. Initially this worked fine; however, in production it was a disaster, as the Nexenta systems went split-brain on several occasions, corrupting data in ZFS (support's only response was "restore from backups"). Things were fine after destroying the cluster and going single node, but of course there was no HA anymore.)

    1. Androgynous Cow Herd

      Re: Windows NFS was so terrible

      Nexenta was never intended to front a SAN. That came directly from my counterpart over there as a solution architect. Even though Compellent used Nexenta just that way, the guys who knew it best HATED it being used as a SAN gateway.

      Windows 2012r2 was actually pretty damned usable for NFS - I had a couple of large implementations of NFSv3 fronting Nimble Storage iSCSI SANs, and they were performant and rock solid; they outperformed much more expensive NTAP systems. Never tried it for v4, and at the time Nimble didn't have deduplication at all. No problems there; we weren't using the NFS for VMware .vmdks, as only a moron would do that when a well-integrated block system is available.

      Your issues sound like you might have been running out of inodes (dedupe COULD exacerbate that), and the inode limit for Windows-based NTFS systems is not all that large. That would explain "out of space" when you have plenty of physical space - no addresses available to store the data. I've seen that in some largish Veritas FS environments.

      Windows NFS PRIOR to 2012r2 *was* terrible, but after 2012r2 dropped it was very usable for even some fairly high-performance workloads with the right SAN mounted to the host. If you haven't tried it and tested it personally, spare me the hate mail... and I KNOW there are better systems out there for NFS. But Windows isn't the worst of it.

    2. david 12 Silver badge

      Re: Windows NFS was so terrible

      From your description, the node went down, so you got an error saying 'out of space'. The node failure was general - the cluster went down - but you never identified the actual point of failure.

  2. simpfeld

    I used Windows NFS

    But only to share Windows files out to Linux desktop users, so light usage. It was okay for this (maybe even very okay), but the lack of reasonable handling of the filename-case issue would prevent me from doing anything deeper with it.

    One thing was that there is no way to set a file's group ownership in the GUI, even though this is stored in NTFS and used by their NFS. I had to write my own chgrp in PowerShell :(

    Would have been easier if they had just exposed the NFSv4 ACLs mapped from NTFS ACLs (they are pretty much identical), but they only map ACL entries for owner, group and other ("Everyone" on NTFS) to basic Unix perms. Hence the need for a chgrp.
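
    For illustration, here's a rough sketch in C of what such a chgrp has to do under the hood (mine was PowerShell, but the underlying Win32 call is the same either way; the error handling below is simplified and the arguments are invented, not my actual script):

        /* Minimal chgrp-alike for NTFS: rewrites only the primary group field
         * of a file's security descriptor. Sketch only.
         * Build with: cl chgrp.c advapi32.lib */
        #include <windows.h>
        #include <aclapi.h>
        #include <sddl.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            if (argc != 3) {
                fprintf(stderr, "usage: chgrp <group-SID> <path>\n");
                return 1;
            }

            PSID groupSid = NULL;
            /* e.g. "S-1-5-32-545" (BUILTIN\Users) - placeholder SID */
            if (!ConvertStringSidToSidA(argv[1], &groupSid)) {
                fprintf(stderr, "bad SID (error %lu)\n", GetLastError());
                return 1;
            }

            /* Set only the group; owner, DACL and SACL stay untouched
               (hence the NULL arguments). */
            DWORD rc = SetNamedSecurityInfoA(argv[2], SE_FILE_OBJECT,
                                             GROUP_SECURITY_INFORMATION,
                                             NULL, groupSid, NULL, NULL);
            LocalFree(groupSid);
            if (rc != ERROR_SUCCESS) {
                fprintf(stderr, "SetNamedSecurityInfo failed: %lu\n", rc);
                return 1;
            }
            return 0;
        }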

    1. Sudosu Bronze badge

      Re: I used Windows NFS

      I generally use Community OmniOS with Napp-it for NFS.

      It is pretty rock solid; I still have one running the ancient commercial version, which has not been rebooted in several years, as an NFS host for around 30 VMs.

      The only issue is that if you have strange/new devices on the system you want to install on, sometimes they won't work, or won't work correctly.

  3. Plest Silver badge
    Mushroom

    Getting to the point where we don't actually do anything useful in IT other than constantly patch our overly complex systems!

    PHBs go nuts with the yearly budgets buying all sorts of kit, we get ordered to install it 'cos someone got a kickback, and we spend 75% of our working day just scanning bug/sec reports and checking when the patches will be out for the 63 unique pieces of kit we've been dumped with!

  4. Roland6 Silver badge

    Buffer size miscalculation = overflow waiting to happen

    >"The server calls the function ...to calculate the size of each opcode response, but it does not include the size of the opcode itself."

    I anticipate there are other, similar buffer overflow issues caused by developer miscalculation lurking in system software.
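
    For anyone who hasn't read the write-up, the bug class is easy to sketch. The following is purely illustrative C - invented names and a 4-byte opcode header assumed for illustration, not the actual Windows NFS server code - showing how leaving a fixed-size header out of a size calculation becomes a heap overflow:

        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        #define OPCODE_HEADER_LEN 4   /* the bytes the size calculation forgets */

        struct op_reply {
            uint32_t      opcode;
            size_t        payload_len;
            const uint8_t *payload;
        };

        /* BUG: counts each reply's payload but not its opcode header. */
        static size_t total_reply_size(const struct op_reply *ops, size_t n)
        {
            size_t total = 0;
            for (size_t i = 0; i < n; i++)
                total += ops[i].payload_len;
            return total;
        }

        static void write_replies(uint8_t *buf, const struct op_reply *ops, size_t n)
        {
            for (size_t i = 0; i < n; i++) {
                memcpy(buf, &ops[i].opcode, OPCODE_HEADER_LEN); /* unbudgeted bytes */
                buf += OPCODE_HEADER_LEN;
                memcpy(buf, ops[i].payload, ops[i].payload_len);
                buf += ops[i].payload_len;
            }
        }

        /* The buffer comes up 4 bytes short per opcode, so a request packed
           with many opcodes overruns the allocation by a controllable amount. */
        uint8_t *build_response(const struct op_reply *ops, size_t n)
        {
            uint8_t *buf = malloc(total_reply_size(ops, n)); /* undersized */
            if (buf)
                write_replies(buf, ops, n);                  /* heap overflow */
            return buf;
        }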

    I wonder if we will get to hear about all the vulnerabilities in the massive amounts of code underpinning the cloud...

  5. Henry Wertz 1 Gold badge

    NFSv4 is complex

    NFSv4 really is overly complex.

    Don't get me wrong, this error is on Microsoft, but...

    I never used NFSv1. NFSv2 has simple read, write, file-locking, directory-read, file-create, file-delete type operations, and it was stateless. (Locks aren't *quite* stateless, but basically, if the NFS client didn't keep extending the lock every 30 seconds or so, the lock went away on its own, and the client was expected to re-grab its locks if the server temporarily went down then came back up.)
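
    Roughly, that lease model looks like this sketch (invented helper names, not real client code - just the shape of it):

        #include <unistd.h>

        #define LEASE_SECONDS 30  /* roughly the window described above */

        /* Stand-ins for the real lock RPCs; 0 = success. */
        static int renew_lock(int lockid)   { (void)lockid; return 0; }
        static int reclaim_lock(int lockid) { (void)lockid; return 0; }

        void hold_lock(int lockid)
        {
            for (;;) {
                if (renew_lock(lockid) != 0) {
                    /* Server rebooted and forgot everything: the state lives
                       client-side, so the client just re-grabs its lock. */
                    while (reclaim_lock(lockid) != 0)
                        sleep(1);
                }
                sleep(LEASE_SECONDS / 2);  /* refresh well before expiry */
            }
        }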

    NFSv3 added a few features meant to speed up common operations and, most crucially, support for >2GB files, but it stayed fairly simple.

    NFSv4? It has file delegations, multiple types of locks, mechanisms to indicate you have writes waiting that you haven't written back yet, and mechanisms to say "I have this data in *my* read cache, inform me if the file changes on disk". Then there's the odd decision to switch ACL support from POSIX to NTFS-style - odd since apparently Windows itself does not present the NTFS-style ACLs to clients, and also because why wouldn't you just make ACL support general-purpose so it can support both? If you look at even a list of supported NFSv4 "opcodes" (or whatever you want to call them), there are loads of them, and some are optional for both the client and the server to support, so you can end up with multiple code paths on both client and server to handle different combinations of supported options (or not, and end up with mysteriously buggy behaviour depending on what combination you are using).
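
    For a sense of scale, here's just a subset of the NFSv4.0 operation list from RFC 7530 (the numeric values are the real wire values; the "compound" at the end is an invented illustration):

        enum nfs4_op {
            OP_ACCESS = 3,      OP_CLOSE = 4,      OP_COMMIT = 5,  OP_CREATE = 6,
            OP_DELEGRETURN = 8, OP_GETATTR = 9,    OP_GETFH = 10,  OP_LOCK = 12,
            OP_LOCKT = 13,      OP_LOCKU = 14,     OP_LOOKUP = 15, OP_OPEN = 18,
            OP_PUTFH = 22,      OP_PUTROOTFH = 24, OP_READ = 25,   OP_READDIR = 26,
            OP_RENEW = 30,      OP_SETATTR = 34,   OP_WRITE = 38,
            /* ...plus OPEN_CONFIRM, OPEN_DOWNGRADE, SAVEFH/RESTOREFH, SECINFO,
               SETCLIENTID(_CONFIRM), DELEGPURGE, RELEASE_LOCKOWNER and more -
               nearly forty ops in v4.0 alone, before v4.1/v4.2 add their own. */
        };

        /* Even "open a file and read it" goes over the wire as a multi-op
           COMPOUND, something like: */
        enum nfs4_op open_and_read[] = { OP_PUTFH, OP_OPEN, OP_GETFH, OP_READ };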

    Don't get me wrong, I've used NFSv4, and undeniably the performance is good; I have had a drama-free experience when I've used it now and then. (The one time I had problems was with NFS exporting a FUSE-based filesystem, and I think it was a kernel bug, because I had the same problems exporting it with NFSv3.) I'm just saying NFSv4 is far more complex than NFSv3 or the likes of sshfs.

  6. Lorribot

    Used NFS once because I needed a target for a bit of dumb kit that would only back up its config to an NFS share. Simple to set up, and it worked. I then uninstalled the feature from the server and never used it again. Anything worth using will support FTP if nothing else, so we just use that, and TFTP, for these sorts of things.

    If you don't install the feature you won't see the problem; however, it is a timely reminder that you should always patch your system after installing any feature.

    And just for balance: Samba has had its fair share of woes over the years and can be a real pain to implement on some *nix-alike systems, and for that reason we just use FTP for file transfers.
