
A better long-term approach...
... would be to disallow contributions from anyone who has anything to do with Google.
Linus Torvalds has dished up one of his most strongly worded Linux kernel mailing list posts in years, lashing a contributor from Google for his suggestions regarding filesystems. The subject of Torvalds's ire is inodes, which as Red Hat puts it are each "a unique identifier for a specific piece of metadata on a given …
... er, correct me if I'm wrong, but I thought Poettering had been offered a job at Micro$oft and took it .... But that aside, the main users and customers of Linux are also the major contributors, and that starts with Intel & AMD, then goes to Google and Red Hat and Amazon Web Services and such.
It is more than reasonable to be watchful and wary about these people, but being outright hostile is not helpful. 3/4 of the code we all love was written by these people.
Personally, the idea that Micro$oft are a major contributor to Linux puts all the hair on the back of my neck up, but they are.
You have misread it. The discussion there is about ways to replace open source with something else that would be easier to weaponize. Current open source is really quite difficult to treat that way; while Linus himself could probably prevent Red Hat from contributing to Linux, few others could do so unilaterally, and it would take a large group to do it without Linus's support. Should this happen, it would be possible for some group to fork the code and try to make that the canonical (little c) version. They might or might not succeed, but they have the ability and right to do so. There are some who would like more ability to control code to prevent people from doing things they don't like with it, but it is opposed to existing requirements of free/open source as defined by both FSF and OSI definitions and the licenses that implement them.
They didn't break any part of the GPL; they just stopped offering a nicety that was above and beyond the GPL.
Call it lawyerly but you either meet the terms of the gpl or you don't.
What company contributes to more areas of the kernel than RH? That's excluding all the maintainers they employ, and supporting projects.
Does the GPL allow me to distribute copies under a nondisclosure agreement? (#DoesTheGPLAllowNDA)
No. The GPL says that anyone who receives a copy from you has the right to redistribute copies, modified or not. You are not allowed to distribute the work on any more restrictive basis.
IIRC redhat were saying that only licence holders could get the code and the licence terms said you couldn't redistribute it.
People don't want these companies to participate, because:
Google = they're the enshittification overlords
Red Hat has a very opinionated development style that doesn't favor customizability, they don't seem to care about the average Linux user, and they killed CentOS.
In general, there are no factual reasons to prevent them from contributing, but there are plenty of emotional ones. :-)
> OMG! Imagine the horror if Micro$haft became the biggest code contributor to Linux, usurping RedHat from the No.1 slot. Oh, wait...
That would be fine. RedHat has managed to be even bigger dicks about Open Source than Microsoft. And Microsoft has apparently reformed from the Halloween documents days.
Microsoft didn't try to conceal their ire back in the day. RedHat (IBM) are being far more insidious, saying "We're Open Source" then penalizing anyone that tries to be Open Source with their product.
Ahh come on... Rostedt is simply employed by Google. He's been a kernel contributor for a very long time. This isn't "Google" it's an experienced kernel developer with a valid opinion. Torvalds doesn't want to break stuff either.
The kernel mailing list doesn't exist for tabloid journalism to cherry pick quotes from, it's how they communicate in development of the kernel.
I'm on the mailing list. Have been since the year dot. I've never been yelled at. Even though I've made mistakes.
Linus is a nice guy. Unless you are the type that is too boneheaded to admit you are wrong, fix your mistake, and move on. Frankly I'm surprised that Linus has been as tolerant as he has all those years ... If it was my name attached to the project, I'd have really lit into a few of the fucking idiotic prima donna drama queens.
I was trying to search for Linux eventfs, but couldn’t seem to find anything useful. I think it might actually be tracefs that is under discussion here, and you’ll note Rostedt’s name also turns up on that man page.
You may have heard of the Unix tradition that “everything is a file”. On Linux, things are more finely distinguished than that. You can have
* Everything is a file
* Everything is a file descriptor
* Everything is a filesystem
and there are examples of all three. This one is an example of “everything is a filesystem”.
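Roughly, and hedged as my own illustration rather than anything from the article, the three flavours look like this from userland (Python):

```python
import os
import tempfile

# "Everything is a file": an ordinary named file on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

# "Everything is a file descriptor": a pipe has no name at all;
# it exists only as a pair of descriptors.
r, w = os.pipe()
os.write(w, b"ping")
msg = os.read(r, 4)
os.close(r)
os.close(w)

# "Everything is a filesystem": pseudo-filesystems like /proc (or the
# tracefs under discussion here) expose kernel state as a mounted
# directory tree rather than as individual files or descriptors.
proc_mounted = os.path.isdir("/proc")

os.unlink(path)
print(msg)  # b'ping'
```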
NOT in the kernel itself, however.
The systemd-cancer is not the kernel, is not required by the kernel, and has no hooks in the kernel, and according to the kernel devs (including Linus), it never will.
Thus, building a Linux Distribution without the systemd-cancer is actually pretty easy. I prefer Slackware, YMMV.
> Linux today is about making things more cumbersome ...
Easy to think that way, but Linux is encountering the real world, and new requirements that did not exist back when the original Unix was designed. The original design was simple, elegant - and quite inadequate for today's tasks.
And like any old system (more than 30 years by now), Linux has to maintain enough backward compatibility, in order to not lose its user base.
It would be more interesting to learn that Linus hadn't been shouty for X days and that he had set a new personal record.
Otherwise I like the shouty version of Linus, and one always has the option to shout back. I am one of those people that shout back, and in general the initial shouting match becomes the first and last (gotta make sure that you have grounds to shout back though; it's very easy to become the fool if not).
Every time that I have been butthurt it usually meant that a lesson was necessary and it helped me grow as a person.
Each to their own I suppose.
A shouting match at work is completely unprofessional, and rarely achieves the desired outcome, as people get their backs up, and simply harden their positions. A quiet word over a coffee, after a meeting to point out the failures is a million times more effective.
When you're talking about a situation involving volunteer contributors, getting shouty is about the worst thing you can do. There's no faster way to lose volunteers than to start yelling or apportioning blame for mistakes. People don't sign up to be unpaid volunteers in order to be made to feel bad. They are there to try and help, and be part of the community, but they'll very quickly decide they've got better things to do if people are d%cks.
Linus should reconsider the value he places on those unpaid volunteers. Without them, he'd be royally screwed...
Let's see, it's been what, 32 years and counting that the Linux Kernel has been overseen by Linus with his BDFL hat. It has continued growing, and making inroads into virtually every aspect of day-to-day life.
Sounds to me like he's been doing something right, despite all those poor, downtrodden, unpaid volunteers that you invoke.
Maybe, just maybe, you aren't really as clued into the situation as you think you are.
You're right, Jake, I don't have a clue about the specifics of Linux. It's never interested me. I DO know about volunteering, though, and I guarantee that if you start abusing volunteers in a normal setup then you quickly lose those volunteers.
What Doctor Syntax below wrote about most of the contributors now being paid by their firms to contribute goes a long way to explaining a lot about why people may be willing to put up with it. People will put up with a lot of sh%te when they're paid, too.
I also stand by my comment that shouting in the workplace is completely unprofessional and counterproductive.
You can disagree with my comments specifically about Linux; perhaps I should have left out the last line in my first comment. But I was mainly answering Khaptain's comments that getting into shouting matches at work is fine. I disagree. Strongly.
"I also stand by my comment that shouting in the workplace is completely unprofessional and counterproductive."
The situation is not a workplace as you understand it. You might, in a normal workplace, take the person concerned aside and have a quiet word. In the context of Linux kernel development stuff happens on the mailing list under public scrutiny. The equivalent to your quiet word in private would be to email outside the mailing list.
Once that starts it's likely to be repeated next time a similar circumstance comes up. And again. What now has happened to development on the mailing list under public scrutiny?
It's irrelevant how long he's been doing it; the fact is that shouting at volunteers, in particular, is a bad move and will stop them from contributing as well as making people think twice about that being a project they would consider getting involved with.
You claim he must be doing something right, but you have no way of knowing how much better the project may be doing if he wasn't such a d1ck about things like this.
> You claim he must be doing something right, but you have no way of knowing how much better the project may be doing if he wasn't such a d1ck about things like this.
You mean one of the most widely deployed operating system kernels on the planet? As much as I wish he'd be a bit more chill, I have a hard time figuring out how Linux could be doing better than it already is.
"Linus should reconsider the value he places on those unpaid volunteers."
Someone who works for Google is unlikely to be an unpaid volunteer. Most kernel contributions come from companies who want something in the kernel for their own purposes in the first place. Although such contributors are employees of someone, none of them are Linus's employees, nor is he their line manager. As a maintainer he has no ultimate sanction except to insist that stuff he doesn't think belongs is either fixed to fit or kept out. Very occasionally - and it always was very occasional, AFAIK - that is going to lead to a heated situation with someone who won't back down.
OTOH, what's happening with inodes?
I've heard that getting yelled at by Linus is considered a badge of honor - and it means whatever you're working on is important enough for him to take note.
That being said, as Linus himself said, making personal attacks on someone is never ok no matter how heated the discussion gets.
You have a point: in a workplace, shouting managers aren't good. However, volunteers have the choice to just not volunteer anymore, and we don't know the full context or conversation.
Like mine, for example, last week. Having issues with our phone system, so I test with a user from home. Called them fine, 2 times, no drops. Got them to call me later, talked for a bit, all fine. Then they said they were going to call the person they had issues with, so I put the phone down, only for them to call me back a few seconds later.
Oh, this time the issue has happened. I can't hear anything, remote to their machine (which they accept, that's important) and I can see the mic levels their end moving as I talk so I know they can hear me. Eventually, I cut the call after taking my trace but before I spend hours looking at the trace I check "Did you call me after we got off the phone". Several hours pass before I get "No". I said but you did, I was then connected to your machine so you were there to accept that connection. "I never called you". OK but the software SHOWS you called me, look at the time. "I'm sorry, I don't know, I didn't call you".
WTF!?!
I wanted to shout. They have no memory of calling me shortly after despite being there to witness the fucking call because they accepted the remote connection. Jesus fucking christ!
If the people in Linus's e-mails are like that, you can't blame him for being shouty.
Leaving the realm of Linux entirely, I think you've misinterpreted the statements that led to this part:
"However, volunteers have the choice to just not volunteer anymore"
Yes, they do, which is why you generally want to stop that from happening. In many places where you have volunteers, they're not that easy to get and can be really important to whatever you're doing because, if you didn't have them, you'd either have to pay someone to do what they're doing or do without whatever they're doing. In most situations where there are volunteers involved, they are a major asset. Mistreating them can be even more harmful than mistreating an employee because the volunteer can usually just quit at a moment's notice, whereas the employee might hang around long enough for someone to apologize and fix things.
I know how frustrating a support call can be, but that doesn't change any of the harms that getting shouty can have. Even if it is entirely their fault, getting angry at them often will just extend the process. For example, in your situation, they could have accidentally called you as they were changing focus because the call system's interface makes that too easy, but they didn't know that they had done it. They were using headphones but had taken them out because they weren't on a call, meaning they couldn't actually hear you. Then they got your remote request, and having just talked to you, they accepted it because they didn't understand. A few hours later, they don't know what you're talking about with this second call idea because the call was ended without them ever looking at it. That is a possibility, and shouting wouldn't help to resolve it.
"A shouting match at work is completely unprofessional, and rarely achieves the desired outcome, as people get their backs up, and simply harden their positions. A quiet word over a coffee, after a meeting to point out the failures is a million times more effective."
Agreed. The guy appears to have some unresolved mental health issues and may see some benefit from counselling and antidepressant medication.
The sad thing is that many people in the comments seem to view this abusive behaviour as amusing or quirky. It's not.
> have the balls to not hide behind
Huh? All of these weird handles are "things to hide behind"!
Unless your parents had you christened "Necro", from the famous Shropshire Hamsters?
(In other words, if all you have as a comeback is "hiding behind AC" then - you've lost)
Ballcocks!
Sometimes a quick 20 second slanging match is needed to clear the air. Then immediately do what they both did: calm down, sit down and talk through what caused things to get so bad, work the problem over, and then everyone moves on. That's how frustrated adults often clear things up and form stronger bonds. If you have something to say then just come out and say it, have it out, and then we have something to work on. That's how stuff sometimes has to be solved when you mix egos and testosterone.
"That's how frustrated adults often clear things up"
Yes, and for those of us who prefer to be non-confrontational (I'm there to work to get paid, not to be harassed or assaulted), it's really fucking annoying. My usual response is to simply walk away. One of these times I swear to god I'm going to keep walking and never go back. I'm not somebody's punching bag, I'm not a scapegoat, and keep your fucking testosterone in your pants. That sort of behaviour is utterly unacceptable in the workplace unless, maybe, your line of work is WWF contests.
"If you have something to say then just come out and say it"
This I agree with. The backstabbing gossipy office politics is awful. But - and it's a really big BUT - there are ways of saying what needs said. If you can't think of a way to say it that isn't going to start a conflict, well, then you're probably not the one to be saying anything. People, on the whole (unless they are total dicks), will respond to constructive criticism. Many of us take some measure of pride in what we do (even if it is mindless and boring) because it's our little contribution to the world. If somebody has ideas for improvements, sure, let's hear them. But if that means starting by pointing out all that they think is wrong, well, sorry, bugger off. I'm not okay with a pile of negativity. Be polite, but most of all, be kind.
"That's how stuff sometimes has to be solved when you mix egos and testosterone."
What an utterly toxic workplace that must be.
That's not how kernel development works. Most of the heavy hitters are employed by companies with vested interests.
Further, if you think that's "flaming", try putting on pretentious airs of professionalism around me. It was a discussion between two people who have been working together for a long time. I doubt any appreciate sites cherry picking quotes from their development communications environment. Nice of them to keep it public so anyone can join in, but this is the kind of crap that happens.
A large number of people on the mailing lists are PAID developers from companies which use Linux, posting on work time, etc.
Most of the drama queening I've seen over the years has involved those who "do linux" for their work, as part of their dayjob and have been told to be on the kernel list as part of their job description
Volunteers usually listen to feedback, don't keep pushing bad ideas and take major pride in the quality of their work
In this instance, inodes as unique identifiers isn't wonderful, but it's all that we currently have. I've had to ponder something similar when trying to optimise backup strategies and work out where virtually all backup software isn't doing very well (most of it has major bottlenecks in various parts of the process which badly limit throughput once you hit LTO8+ tape speeds and volumes).
1. I've looked at the inodes / dentries / vfs arguments/flames/etc., on lore.kernel.org, which are the basis of TFA, and am still unenlightened. (It doesn't help that the lore.kernel.org website does not present the emails in chronological order.)
2. The apparent, "surface" issue which one would presume from TFA could easily be solved within a given filesystem, as inode numbers could be made unique within a hard or flash drive by simply using the C*H*S of their first block (if they're on old drives), or the block number of their starting block (if they're on newer drives), as the "inode number". Alternatively, they could simply be assigned consecutive integers as they are created. The sequence 0, 1, 2, 3, ... doesn't seem "difficult" to me.
3. But, the surface issue is not the true issue. Or issues.
4. I suspect one person is mentally-visualising one thing, and/or constraint, the other person, something else, and Linus became frustrated.
Can anyone explain to me why Linus thinks inode numbers may now all be "the same" without causing problems (remember that VFSes are involved)?
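The consecutive-integer scheme from point 2 is easy to sketch, and the sketch also hints at one answer to the question: with a fixed-width counter and enough create/delete churn, the numbers wrap and get reused while older files still hold them. Toy Python of my own, emphatically not kernel code:

```python
class ToyInodeAllocator:
    """Hand out consecutive inode numbers, as point 2 suggests.

    Real counters are fixed-width integers, so a busy create/delete
    workload eventually wraps around and re-issues numbers that
    still-live files may be using - the tmpfs wraparound problem
    reported further down this thread.
    """

    def __init__(self, bits=32):
        self._next = 1                   # 0 is conventionally reserved
        self._mask = (1 << bits) - 1

    def alloc(self):
        ino = self._next
        self._next = (self._next + 1) & self._mask
        if self._next == 0:              # skip 0 on wraparound
            self._next = 1
        return ino


# With an absurdly small counter the collision appears immediately:
a = ToyInodeAllocator(bits=2)
print([a.alloc() for _ in range(4)])     # [1, 2, 3, 1] - 1 is reused
```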
It is a bit abstruse. Just to complicate things, it seems they are not talking about a real filesystem where users create files that take up space on disk (or other persistent storage), but one of these virtual ones that the Linux kernel defines to implement its myriad of userland APIs. Still, it has to have enough of the behaviour of a real filesystem in order for file-manipulation tools like tar to behave reasonably, and that seems to be the crux of the argument.
Well, at least, I think so ...
Should it have been the subject of an article on The Register? Frankly, apart from the point about Linus telling off somebody for not being clever enough (yawn), I don’t think so.
As I understand it, all file systems are Virtual File Systems as far as the userland is concerned: there is a defined set of functions that a file system must implement whether it's a physical filesystem, a network filesystem or some sort of fictional emulation of a filesystem. These become available to the userland system calls when the filesystem is mounted. Whereas it's a benefit to the programmer that there is a standard filesystem interface, the problem in the real world is that filesystems are not all the same.
The VFS interface is perhaps in hindsight not ideal, for example, it turns out to be a bit of a pain for networked filesystems (because the kernel expects things to be synchronous that can't be). But it also uses terminology in its documentation that can be a bit misleading - like inode. Although what is labelled as an inode will be an actual inode on physical filesystems with Unix semantics, for the purposes of the VFS it's basically just a handle that can be used to manipulate or open a file on a particular filesystem: it might be a memory pointer (for a FIFO or pseudo filesystem) or some other handle in a physical filesystem that doesn't have inode semantics. It's essentially anything that you get back from a directory lookup that identifies a file that can be opened.
I think the semantics of the VFS mean that an "inode" only needs to be valid and unique, within that filesystem, to a specific file at most for as long as the filesystem is mounted - and to assume anything else about them would be wrong. I'm sure if I've misunderstood that someone will politely correct me.
Of course if you're manipulating physical disks directly, that's a different matter.
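That "unique only within a mounted filesystem" reading matches what userland actually sees: the traditional file identity is the (st_dev, st_ino) pair from stat(). A quick check (my own Python sketch, assuming an ordinary local filesystem that supports hard links):

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

with open(a, "w") as f:
    f.write("x")
os.link(a, b)                # second name for the same underlying inode

sa, sb = os.stat(a), os.stat(b)
# Two names for one file share the (device, inode) identity,
# and st_nlink reflects the link count.
print((sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino))  # True
print(sa.st_nlink)                                       # 2
```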
One thing I could think of is that virtual filesystems might span multiple filesystems and devices, and while a filesystem-id plus device-and-inode combination might be unique, it's also rather arbitrary and long-winded, and serves no purpose other than being a unique id, where a shorter one might suffice - if it was needed at all.
I think it is a shame Linus didn't take the opportunity to logically express his reasoning (whatever it is). It is surely worth being expressed and heard, and then it would be there as his valuable legacy for future perusal and enlightenment. I doubt it would take any more time than getting mad. At the same time, we are all human, even Linus, and it can be very counterproductive to try to micromanage other people's emotions.
> Off the top of my head, find and tar need unique inodes
True. And if you read a few of the messages, Linus accepts that.
What he isn't accepting is that there is any good reason to support someone running tar over the particular FS in question, eventfs (and/or tracefs, the distinction is a bit confusing). Nor does he worry about people having tar fail when trying to archive procfs! If you try to tar eventfs, the argument appears to go, even if it "worked", what good is the result?
The additional argument (about inode reuse when processes sit in tight loops doing mkdir/rmdir all the livelong day) is more than I want to worry about! I came back and read some more Register articles instead.
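On the find/tar point: you can actually watch tar-style tools lean on inode uniqueness. Python's tarfile, for one, remembers the (st_dev, st_ino) pairs it has already archived and stores a repeated pair as a hard link, which is one concrete way duplicate inode numbers on live files would corrupt an archive. A small demonstration (my own example, assuming an ordinary filesystem with working hard links):

```python
import io
import os
import tarfile
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
with open(a, "w") as f:
    f.write("data")
os.link(a, b)                     # same inode, two names

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    tf.add(a, arcname="a")
    tf.add(b, arcname="b")        # inode already seen: stored as a link

buf.seek(0)
with tarfile.open(fileobj=buf) as tf:
    kinds = {m.name: m.islnk() for m in tf.getmembers()}

print(kinds)  # {'a': False, 'b': True}
```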
"An inode number just isn't a unique descriptor any more. We're not living in the 1970s, and filesystems have changed." -Linus
This is from Chris Down <lkml.org/lkml/2020/7/13/1078>
"In Facebook production we are seeing heavy i_ino wraparounds on tmpfs. On affected tiers, in excess of 10% of hosts show multiple files with different content and the same inode number, with some servers even having as many as 150 duplicated inode numbers with differing file content."
Interesting inode issues, also from Chris Down <unix.stackexchange.com/questions/642313/how-does-linux-assign-inode-numbers-on-filesystems-not-based-on-inodes>
What if this “emperor” jackass gets run over by a truck one day? What then? This guy’s grip on all of this is what may eventually bring the whole thing down. So, here in the “west” we despise authoritarianism and despots (we know who) yet this is allowed! In what’s become a key part of the functioning web and the daily lives of people. Has anyone ever thought about this and what, if any, contingencies should be made? I predict that this cat one day is gonna blow a fuse for good and go nuts, or just drop dead from so much rage and fear of losing control. GET HIM OUT before he screws up this shit!
He has control of the Torvalds kernel tree. People contribute to it and it's where the main development takes place.
Take a look at all the kernel trees on git.kernel.org. kernel/git/torvalds/linux.git is just one of them. Now most of those are for developers to work on and they submit merge requests but there are people who work on their own trees, with stuff that's never (fully) going in. For example, the realtime kernel, or Andrew Morton's mm fork (akpm). Con Kolivas' fork with different scheduler work in the past (ck).
https://git.kernel.org/
You, or anybody could fork your own git repo and do what you want, and if people like it, they'd contribute and you could have Pete's Linux or something. Just for example, if you were a kernel programmer (I'm not... when I get stuck on code I don't understand, which happens fairly quickly lol, I have to go find an example)
You don't know? He runs a penguin sanctuary. His work on Linux is just something he does in the evening to blow off steam after dealing with the visitors, and if he's in a particularly fragile mood it's because there was a tour bus full of kids that came to taunt the penguins. He gets a bit stressed about that sort of thing.
Icon, for obvious reasons.
I remember when I first came across the concept of VFS and Vnodes when AIX 3.1 came out on the first RS/6000s. I *think* it was Sun who invented them (I may have seen them mentioned in the SVR4 developer documentation) to make NFS easier to implement on different system architectures, but at the time they were still mainly UNIX, spanning various UNIX filesystem types, mainly implementing UNIX/POSIX file semantics.
I thought at the time that it was an elegant way of abstracting different filesystem implementations, and I'm sure that it made things like AFS and DCE/DFS easier to implement, but it's understandable that things move on, and some of the complex object types in persistent storage do not fit comfortably in a traditional VFS/Vnode implementation, particularly many object storage systems in Cloud infrastructure.
For decades it has made UNIX and UNIX-like OS's work and feel very similar, but I now find that Linux is evolving towards something that is not really UNIX any more.
I definitely can see that it may make the abstraction of non-filesystem data into something like a file more complicated, but to put things in chronological order, these things have happened since the VFS concept was invented.
It was actually AT&T that first invented the concept of a “File System Switch” (FSS), in System V Release 3. Sun took that idea and refined it a bit. And of course Linux takes full advantage of its similar VFS layer to support both real and virtual filesystems.
Now, why do you think the Windows NT kernel has never implemented such an idea? Why are so many Windows filesystem features inextricably tied into NTFS and won’t work with anything else?
I was an AT&T RFS user back in the late '80s, so I know that this plugged into the File System Switch (or as Maurice J. Bach referenced it, File System Abstraction, as FSS was used as the acronym for the Fair Share Scheduler in his book "The Design of the UNIX™ Operating System", and also in the AT&T Research and Development UNIX documentation). But I do remember an internal wall poster produced by one of the OS groups at either Indian Hill, or Murray Hill in AT&T, that showed the FSS.
I knew it existed, but as RFS was a UNIX-to-UNIX filesharing protocol, I always thought of the Switch as just a way of effectively adding remote references to the inodes from remote filesystems, rather than a full abstraction layer for different filesystem types. That may have been a misunderstanding on my part, because it also looks like it contained references to alternate system calls to handle remote filesystems. They certainly weren't called VFS and Vnode in Bach's book, and the term "Generic Inode" is used for what we now call a Vnode.
But when I read the VFS/Vnode description in AIX 3.1 documentation, and in the AIX internals course I took, this looked significantly different, and I was at the time talking to a very experienced Sun OS administrator and programmer (who had joined IBM about the same time I did) who implied very strongly that what IBM implemented was very much like the Sun OS implementation.
When the RISC System/6000 was first introduced, with AIX 3.1, the most common UNIX systems in the wild were from Sun, and IBM tried everything they could to make AIX interoperate with other UNIX systems, particularly Sun.
I agree that isn't the point, HOWEVER, if it was a microkernel, the filesystems wouldn't be in the kernel, they'd be userspace drivers (and they wouldn't be having this argument because somebody else would probably be writing the filesystem drivers lol).
Also, they'd be more "pluggable" if all they had to do was interface with an abstraction.
Sorry though, I'm not a fan of the microkernel method. I like having all my drivers in a monolithic kernel, and mine is even more monolithic because everything that drives hardware is built right in. Modules are for things I don't load all the time, like virtualization drivers, netfilter modules, etc.
Drivers all get updated accordingly when they are part of the kernel, too. The driver author often doesn't even have to do it. I know that when I boot Linux 6.8 for the first time (when it's mainline), my filesystem drivers are going to be correct for the kernel.
Having a lot of stuff in modules, loaded as needed, achieves a lot of what you're thinking of. But even with a microkernel you still need the bits to talk with each other and present an agreed interface to userland - even if your drivers are in userland. AFAIKS this is what the argument is about here and presumably microkernels have the same scope for arguments.
Microkernels are odd beasts. They definitely are being used in the real world, yet there is little-to-nothing available which you can actually install on your PC and use as an OS. I went looking just now, and found many virtual machine images, but just two CD images.
The Debian CD image, according to their web page, does not work out-of-the-box.
The ArchHurd website was last updated in 2018, but I'm downloading the LiveCD, and will see how it goes.
There's no homepage I can Google for L4/Fiasco.
But there have been tons of papers written about microkernels.
Now that is just about the worst example of the “benefits” of a microkernel you could think of ...
I think you should have presented way more context, rather than extracting only a few lines from the last message. To me this message, as part of the whole conversation, reads very differently, especially as this email follows many more polite (growing less polite) ones from Linus, and even sample code. For me this email is a reply to someone pushing hard again and again and again for something that Linus has already stated won't go ahead, and Rostedt did something that had already been criticized and prohibited: messing with core VFS functionality for their special case.
tldr; This message was not out of the blue. Linus had already told them what they needed to do and what they cannot do several times.
Previous message, https://lkml.org/lkml/2024/1/28/513, which is still the middle of the thread and follows several more previous messages that were even more polite and clear:
---
On Sun, 28 Jan 2024 at 16:21, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > Wouldn't it be bad if the dentry hung around after the rmdir. You don't
> > want to be able to access files after rmdir has finished.
Steven, I already told you that that is NORMAL.
This is how UNIX filesystems work. Try this:
mkdir dummy
cd dummy
echo "Hello" > hello
( sleep 10; cat ) < hello &
rm hello
cd ..
rmdir dummy
and guess what? It will print "hello" after that file has been
removed, and the whole directory is gone.
YOU NEED TO DEAL WITH THIS.
> And thinking about this more, this is one thing that is different with
> eventfs than a normal file system. The rmdir in most cases where
> directories are deleted in eventfs will fail if there's any open files
> within it.
No.
Stop thinking that eventfs is special. It's not.
You need to deal with the realities of having made a filesystem. And
one of those realities is that you don't control the dentries, and you
can't randomly cache dentry state and then do things behind the VFS
layer's back.
So remove that broken function. Really. You did a filesystem, and
that means that you had better play by the VFS rules.
End of discussion.
Now, you can then make your own "read" and "lookup" etc functions say
"if the backing store data has been marked dead, I'll not do this".
That's *YOUR* data structures, and that's your choice.
But you need to STOP thinking you can play games with dentries. And
you need to stop making up BS arguments for why you should be able
to.
So if you are thinking of a "Here's how to do a virtual filesystem"
talk, I would suggest you start with one single word: "Don't".
I'm convinced that we have made it much too easy to do a half-arsed
virtual filesystem. And eventfs is way beyond half-arsed.
It's now gone from half-arsed to "I told you how to do this right, and
you are still arguing". That makes it full-arsed.
So just stop the arsing around, and just get rid of those _broken_ dentry games.
Linus
"You copied that function without understanding why it does what it does, and as a result your code IS GARBAGE."
I'm with Linus on this one. If there is one thing certain to get me annoyed it's having to clean up after someone who did something complex without understanding what they were fucking with*.
Even leaving aside the history, filesystems - even the comparatively simple 'basic' ones - are hard to understand at times. There is *always* merit in reading/learning about what you are about to start screwing with.
To me, this seems to be happening more and more. People seem to just be happy to learn how something works, but not why it works like that. Asking the 'why' question makes you a better engineer!
*AKA 'I know a little about it, don't have the time to learn anything more, so what I know is fine...it'll be OK I'm sure'.
Well... I happen to agree with Rostedt: inodes are a fundamental that shouldn't be broken. It doesn't sound like Linus to break them. He may be envisioning "inodes going away" sometime in the future, but he's not about to break every system in use.
Now, that's not "flaming". It's not completely polite, but it's not like Rostedt is some shrinking violet; they've been working together for a long time. I should think Rostedt is capable of telling him to piss off if he's offended. There are going to be impolite communications in any medium. People get annoyed.
I guess the real enigma is...
While everyone knows John E. is a big poof, so why would he choose a piece of Asian ass over his little band and cause the collapse of the latter?
Or, why was he made a Lord when I've never seen anything by his little band even at the top of the charts?
Are the English so desperate for musicians, any musician?
Oh I guess that'll make 2 but anywho.
I do wonder why all the downvotes.
Surely the band would still be together making their cheesy songs if the "little" fracas over Ms. Sukisuki wouldn't have happened?
And maybe the King could have given lordships to the rest of the Beatles? Might have helped to alleviate the obvious lack of officially recognized musicians in England to boot.
Or maybe y'all's just homophobes considering it's John Elton after all???
Beware of hipsters and ADHD people, perhaps with some technical knowledge, but whose main role is to _disrupt_ the Linux society/code and make it fail dramatically! It is not social, it might not even be business driven (although mostly it is), but a politically driven "endeavour" nowadays. IT people should take more care/attention now about what they develop and change. Treating error-proneness not as a thing of the past, but strictly the opposite - as a part of every line of code.
-Greg
Distributed reliable filesystems might be absolutely incapable of meeting the requirements of an inode. The big flaw is that the inode, if it exists, may unpredictably change as hosts are added and removed. It would be a mistake to assume inodes exist as a basic feature.
Google is also not to be trusted. They're 1990s Microsoft levels of evil.
My goodness people are fragile these days. When you're suggesting something to an expert and they tell you you've got it all wrong, step back and make sure you have a VERY good idea of what you're talking about before you contradict them.
Experts like Linus are few and far between, we need to value their time and energy and take the banal arguments elsewhere.
Let me introduce myself. I'm Steven Rostedt (some people jokingly call me 'Roasted', which started when I was in third grade; I sometimes use the term myself).
tl;dr: Google had very little to do with the email thread. I didn't copy a function that I didn't understand, but it was overkill for my use case, which is why Linus said I didn't understand it. Linus was having a bad week when a bug report came in on my code.
Sorry for the long post. Most people will not read this and I'm fine with that. The article will be forgotten in a month but will live on forever, and I wanted my response to live with it. For the few of you who care about the background, including why Linus blew up, feel free to continue.
My background. I started Linux kernel development while working on my Masters in 1998. I fell in love with doing it, and sought out work as a kernel developer. In 2001, I landed a job at TimeSys porting their version of the Linux kernel to embedded boards. In 2003, I became a contractor focusing on using Linux in real-time environments. In 2006, I joined Red Hat and helped them create their Real Time Linux offerings. In 2017, I left Red Hat and joined VMware, as I was asked to help them "Convert an Open Source hostile company into an Open Source friendly one". I was mostly there to consult on how to interact with the open source communities, and where we made the policy that any change to an open source project must benefit the community and not just the company. About 20% of my time went to consulting within the company and 80% to continuing my upstream contributions. After 5 years with VMware, my contributions were becoming less relevant to the company, so I left in 2022 and joined Google to work on ChromeOS, where I help improve performance on their low-end Chromebooks. Basically, I work to improve the laptops your kids use in school.
In 2008 (at Red Hat), I wrote the tracing infrastructure (aka ftrace) of the Linux kernel. As my heart has always been with embedded development, I wanted the tracing infrastructure to be easily usable on embedded devices. I chose a filesystem-based interface that was functional with nothing more than BusyBox (a simple embedded shell). My kernel expertise is with the scheduler, interrupts, real-time, a bit of memory management and obviously tracing. I've never worked much in the file system management layer. At the time, I decided to use the debugfs file system (/sys/kernel/debug) as it was the easiest to implement. I added a directory there (/sys/kernel/debug/tracing) to interact with my infrastructure. I've been thanked several times by the embedded community for making the interface so simple to use, including by the lead of Ingenuity, the Mars helicopter, as my code was heavily used in debugging it. My scheduler work happens to be on the helicopter too, so I know that my code was running (and flying) on Mars! ;-)
As my code was starting to be used in production environments, I was asked (not by Google, but by others) if I could move the tracing interface out of debugfs. That's because debugfs is an all-or-nothing file system, and being a debug interface it could carry vulnerabilities along with it. So I created tracefs. As I stated before, I didn't know much about file systems, and after talking with some file system folks, they just told me to clone debugfs and start with that. I did, and it wasn't that hard. As of 2016 (still at Red Hat), the tracing infrastructure appears in /sys/kernel/tracing and has no dependency on debugfs. For backward compatibility, when you mount debugfs, it will automatically mount tracefs in its original location.
While I was at VMware, someone outside of VMware complained to me that tracefs had a very high memory footprint. Investigating, I found that it was due to the trace event files and directories. The trace events are created by any kernel developer, and we now have close to 2 thousand of them, which creates close to 20 thousand files in tracefs. When you boot up, these event files are created even if you don't mount tracefs. But here's where my lack of knowledge of the Linux virtual file system (VFS) layer was a problem. I based tracefs on debugfs, which was actually doing things wrong. It used "dentry" as a handle to the interface. A dentry is just a VFS cache element. It should never be used outside of VFS, as it is a critical element of the VFS layer. As tracefs copied debugfs, it inherited the same issue. The problem for me was that I still didn't know this was wrong until the blow-up with Linus. I realized that the dentry and its backing inode were the cause of the memory footprint, as they weren't being used as a cache for the file system but were being created for every event. In early 2020 I started looking at converting the "events" directory in tracefs over to something that would only allocate the dentry and inode when referenced. I got a prototype semi-working but ran out of time to finish it. Another engineer at VMware, who was mostly doing Linux kernel backports, asked me if there were any TODO items I had that he could work on to become a more established kernel engineer. I gave him the eventfs work. He got it working, but as I was his only interface to the kernel, I guided him incorrectly due to still thinking it was OK to use dentry as a handle. The incorrect code was my fault, not his.
I presented this at the Linux File System summit (LSFMM) in 2022 to get some feedback. I was informed about kernfs, which did pretty much the same thing (but correctly). But when I looked into that code (still thinking it was OK to use dentry as a handle) it didn't make sense, and there was virtually no documentation on how to use it. It's what /sys uses in general, but I couldn't easily see the connection to what I was doing. Continuing with my working prototype using dentry seemed an easier path to take. Note, all this work was done in the open, where I even posted patches to the file system mailing list. Nobody said I was doing it wrong. I don't blame them; they are just as busy as I am, and my work didn't affect theirs.
When I finally had eventfs passing all my tests, it saved over 22 megabytes per instance. Not a big deal for data centers, but my focus is on low-end Linux devices, where 22 megs makes a difference. This is 22 megs of memory that is totally wasted. It can't be swapped out. It's basically just like telling the kernel not to use this memory for anything. I broke the changes up into two parts, where half went in in 6.6 and the other half in 6.7. In the 6.8 development cycle, Linus noticed my use of dentry and told me I was doing it "wrong". Having worked on this for 4 years, with nobody once telling me that using dentry was bad, and seeing that debugfs did it, I never fully understood what problem Linus had with using dentry. This miscommunication escalated to where I was starting to annoy Linus. One of the changes Linus told me to make was to give all the inode numbers the same value. An inode number is a unique number that every file and directory gets in a file system. For real file systems it makes sense, but for a pseudo file system like tracefs it's meaningless. I was concerned that this might break user space, and was rather surprised when Linus told me I shouldn't worry about that. Basically, he told me to try it and see what breaks.
Then Linus had a very bad week. The Linux kernel development process starts with a two-week "merge window" where maintainers may send Linus all their new features. After the merge window closes, only bug fixes are allowed, and the release candidate process starts and goes on for 7 to 8 weeks. When the new release is out, the merge window for the next release starts. This merge window, Linus pulled in a scheduler change (not my code) that caused a regression on his machine. His builds took twice as long to finish. But this regression only appeared on his machine, and others were not able to reproduce it. I could tell Linus was debugging this himself because the rate of pull requests going into Linux was drastically slower than in other merge windows. Then Linus lost power for 4 days, right in the middle of the merge window. When he got his power back, the scheduler bug was fixed, and he rushed to get all the other pull requests in without extending the merge window, as there are a lot of people depending on this cycle. After the merge window closed, one of the first bugs to come in was against that "same inode change" I did for Linus. It broke the application "find". "find" checks directory inode numbers to make sure it's not going into loops. With all the directories having the same inode number, find thought it was looping and complained about it.
So, the first thing Linus told me to do was to make a simple counter to create the inode numbers. I did that, but I also took a look at the inode number generator that my code was originally using, called get_next_ino(). I noticed that it did a nifty trick of keeping per-CPU counters and allocating a batch of numbers for each CPU. The batch size was set at a power of two, so that to prevent races between CPUs it would only do an atomic_add_return() (an expensive CPU operation) when the count overflowed. I saw that and, being a real-time embedded developer at heart, thought to myself "Damn, that's nifty" and replaced my simple counter, which did an atomic_add_return() for every new inode number, with that. Well, atomic_add_return() may be an "expensive" CPU operation, but that just means it takes several CPU cycles to process. For my use case, it would never show up as an improvement. So doing so was complete overkill and added complexity. Because of that, Linus said I didn't understand that function. It wasn't that I didn't understand how it worked; it was that I didn't understand that this wasn't the right place to use it. He was right. I shouldn't have used it, but did so more out of habit of using optimized code when I can.
Steven "Roasted" Rostedt
Thank you, Sir.
Kindly stick around, your kind of perspective is quite valuable in these here parts.
The cheque'll be in your voicemail, and as always the Secretary will disavow any knowledge of your actions.
N.B. I don't speak for or type(o) for ElReg; I'm just a common or garden commentard.