
Two men
Two individuals. That's all it took to define an entire industry that still lives under their shadow.
Salute to the masters.
In Unix terms, this news is akin to Moses appearing and announcing an amendment to the 10 commandments. AWK, a programming language for analyzing text files, is a core part of the Unix operating system, including Linux, all the BSDs and others. For an OS to be considered POSIX compliant, it must include AWK. AWK first appeared …
Yikes, I'm having flashbacks to My First Network Server written 30+ years back and initially being side-tracked into System V's Streams subsystem. I never did get my head around it. When I finally worked out how to do TCP sockets (we didn't have any source code so it took a bit longer than was ideal) I was amazed and very relieved by its simplicity, especially in comparison with that. I wonder why Streams never caught on...?
Git is only considered "hard" because it enables really sophisticated operations compared with earlier source control systems. Try setting up a Subversion server yourself, and if you manage that you can marvel at how hard it is to do just a simple merge.
Git is aptly named _by its own author_; it is no accident!
Git got its name as a dig at Andrew Tridgell, since his reverse engineering of the proprietary Bitkeeper version control system led to Git's creation. It was always a bit dodgy entrusting the Linux source to a closed product like Bitkeeper, but Linus Torvalds felt it was the only one that offered the features he needed to manage such a massively distributed development effort. That dodginess came home to roost when the owner of Bitkeeper threw the toys out of his pram at Tridgell's efforts to make the source control more accessible, and ended the free licenses to use Bitkeeper for Linux development.
Your memory is faulty, as a simple search would show. It was Tridgell who was asked by Linus to stop work on his Sourcepuller tool. Here's a source, an InfoWorld article of the time, which quotes Torvalds' criticism of Tridgell:
As an interesting footnote, Bitkeeper was eventually open sourced a few years later.
Git is only considered "hard" because it enables really sophisticated operations compared with earlier source control systems.
Well, that and the fact that Git concepts have the same names as SVN/CVS/the rest's concepts but are actually different things.
Also if you have several files with similar content and do a bunch of add/delete operations in one commit, it can quite happily get stuff round its neck and decide you've renamed files instead.
That's far better than the alternatives, which decide that all rename operations mean you've deleted and created new files, with no possibility of history or merging across said line.
Even if you explicitly use the "rename file" tool within said source control software.
Gods, bringing forward changes across a restructure is basically impossible in, say, Perforce. Yet it "just works" in git, almost every time.
>> Try setting up a Subversion server yourself
I don't know, SVN's svnserve was pretty darn convenient. The Git alternatives (prepping an inetd daemon, an httpd/CGI setup, or locking down a restricted SSH shell account) are in my opinion a little fiddly.
The rest I agree with though, Git makes very tricky things possible compared to others. Even in small teams where some of the complexity is overkill, it is still worth using Git, if only to avoid the need to use different RCS systems per project.
Git is only considered "hard" because it enables really sophisticated operations compared with earlier source control systems.
ISTR Torvalds himself admitted the interface sucked at the time of its introduction; he stated it was temporary until something else replaced it. Of course that new interface never appeared. It isn't one of those cases where you can legitimately claim it's the result of power - for source code control there is no excuse for the simple stuff not to be simple.
Try setting up a Subversion server yourself,
I have. More than one, in fact.
and if you manage that you can marvel at how hard it is to do just a simple merge.
Merges in Subversion are generally trivially easy – certainly since the introduction of merge tracking, and they weren't that difficult before that. Reintegration merges require a grand total of two commands if there are no conflicts, and resolving conflicts with Subversion is certainly no more difficult, and generally more straightforward, than with git. Cross-branch cherry-picked merges are rarely any more effort. I do dozens of Subversion merges of various sorts a month.
git merges, conversely, can be quite baffling for people who don't understand git's data model and arcane command set. Just look at the unending battles over whether and when rebasing is a good idea.
git does very well at what it was created for, namely truly distributed change control (of text files; it doesn't do well with non-text formats). When used with a single centralized repository, which is how probably the vast majority of its users use it, it's simply extra complexity and obscurity for little or no benefit.
Git certainly takes some onboarding to get fully up to speed with "what to do when I'm in X situation and want Y to happen", but think back to dealing with CVS (shudder) and even SVN. With SVN we used to set an entire week aside to merge feature branches into production, and it was truly horrific. What did you do this week, Bob? Oh, I merged 1700 commits onto production; there were a couple of mismerges but we got there eventually!
Subversion's branching and merging support was a "proof of concept" which its author never intended to be released in that state. That's why it's such a kludge and why it makes it so hard to deal with merge conflicts. Things improved a little after Subversion 1.5 was released, but it still built on the terrible foundation of that initial implementation.
I can't find a link to the discussion about all this that the original author posted to, but it was related to a comparison in the O'Reilly book about Mercurial. On the book's forum, a Subversion fanboi took issue with criticism of his fave version control system until the code's author waded in to confirm his work had been flawed.
In at least one case the terminology (especially when directly related to github) seems bass-ackwards to me.
Specifically, "pull request" - usually when you UPload things, or submit things for review. it's more of a PUSH, not a PULL. I think "change request" or "patch request" or "update request" would make more sense.
And so, the "confusing side" of git, which in THIS case, is really a "GitHub-ism".
(otherwise to me it is just another source control revisioning system that happens to be popular)
I think it makes sense in the underlying system, but not in the UI that github puts over the top.
Originally it's "please pull the work I've done from my server into yours"
But now it's "I've pushed this work to the server we both use, please merge it".
So "merge request" is a better term. I think that's what gitlab calls it now, tbh.
Doesn't look like the Unicode support is yet merged into the master branch, but there is a working branch unicode-support with notes like "more to do".
Will be interesting to see how the old Master has managed to retrofit Unicode. I once looked at the problem in the context of another very old piece of code, and thought it too much work. If a program does anything except copy the strings, it has to parse the UTF-8 encoding, and on top of that many old programs assume characters have only 7 bits of data, and use the MSB for something, or get negative array index errors if it is set (often crashing the program).
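A minimal C sketch of that negative-index failure mode, assuming a platform where plain char is signed (the common case); the string and array names are purely illustrative:

    /* Any byte of a multi-byte UTF-8 character has its MSB set, so if plain
     * char is signed it becomes a negative value, and using it directly as an
     * array index is undefined behaviour - often a crash. */
    #include <stdio.h>

    int main(void)
    {
        const char *s = "caf\xC3\xA9";   /* "café" encoded as UTF-8, 5 bytes */
        int seen[256] = {0};

        for (const char *p = s; *p != '\0'; p++) {
            printf("byte value: %d\n", *p);  /* prints -61 and -87 for the é bytes */
            seen[(unsigned char)*p]++;       /* safe: cast to unsigned first */
            /* seen[*p]++;                      the buggy version: negative index */
        }
        return 0;
    }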
One wonders how many mainstream programs are really still 7-bit ASCII only. If they try to use the eighth bit for some internal purpose, they won't work with Latin-1, CP-1252 or other encoding systems that I assume to be in common use in Europe. Surely that'd be a nuisance. But you're surely right about having to deal with UTF-8 multibyte characters without trashing or misunderstanding them.
It Depends.
If there's no typesetting, it doesn't need to know much at all, just "how to determine the beginning and end of a character" and "what is whitespace".
That covers trimming, text search/replace and token splitting.
So if it worked outside of the USA it's mostly a case of fixing everywhere it jumps forward or backwards in the stream.
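For the "beginning and end of a character" part, a rough C sketch (example string hard-coded, purely for illustration): UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a character starts at every byte that doesn't match it, and jumping forwards or backwards just means skipping bytes in the 0x80-0xBF range.

    /* Count characters in a UTF-8 byte string by skipping continuation bytes. */
    #include <stdio.h>
    #include <string.h>

    static int is_continuation(unsigned char b)
    {
        return (b & 0xC0) == 0x80;   /* top two bits are 10 */
    }

    int main(void)
    {
        const char *s = "caf\xC3\xA9";   /* "café": 4 characters, 5 bytes */
        int chars = 0;

        for (const char *p = s; *p != '\0'; p++)
            if (!is_continuation((unsigned char)*p))
                chars++;

        printf("%d characters in %zu bytes\n", chars, strlen(s));
        return 0;
    }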
It depends on whether you want to operate on code points and assume different code points are different, or if you want to operate on characters, where the same character can and will be represented in different ways. For example "café" can be four or five code points, but it is four characters, and they should be considered the same.
"café" can be four or five code points, but it is four characters, and they should be considered the same
Unicode equivalence is a pain, it allows different byte sequences to represent the same character. In the above example the 'é' can either be U+00E9 (a single code point) or 'e' followed by a combining acute accent (U+0065 followed by U+0301). Searching algorithms are thus more complicated.
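A small C sketch of why searching gets harder (strings hard-coded for illustration): both spellings render as "café", but a byte-wise comparison sees two different strings, so naive strcmp/strstr-style code misses the match unless the text is normalised first (which in C generally means reaching for a library).

    /* Two valid encodings of "café" that a byte-wise comparison treats as different. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *precomposed = "caf\xC3\xA9";   /* é as one code point, U+00E9   */
        const char *decomposed  = "cafe\xCC\x81";  /* 'e' + combining acute, U+0301 */

        printf("%zu bytes vs %zu bytes\n", strlen(precomposed), strlen(decomposed));
        printf("strcmp says they are %s\n",
               strcmp(precomposed, decomposed) == 0 ? "equal" : "different");
        return 0;
    }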
I am a big fan of Awk. I tend to use it in scripts which require lists, which are just a little out of reach of /bin/sh scripts. I still find it renders Perl and Python a little unnecessary for most sysadmin tasks.
Since it is a requirement for SUS and POSIX platforms, and so is pretty much always around in various forms, I am fairly surprised it isn't used more as a Makefile generator. I used it for this not long ago and I was surprised how effective it was:
https://gitlab.com/osen/openbsd_drmfb_gnuboy/-/blob/main/configure.awk
I used to do a lot more with awk in the past but nowadays Python tends to be my go-to scripting language.
Is Python as fast or as efficient as tool X? Maybe not. But I rarely need absolute performance and, being a jack-of-all-trades, keeping a reasonable understanding of a small number of tools is the way to a simpler life.
I find my complex awk scripts are quite maintainable, but then I use advanced features like functions, comments, and sensible application of whitespace, which seem to be beyond many developers regardless of scripting language.
I mostly use awk because I have much of it memorized, whereas I use Python so rarely that I have to keep looking things up, and if I'm writing a script it's often to massage diagnostic data to help me diagnose a problem, so I don't feel inclined to spend a lot of time buffing my skills.
Also I don't find Python particularly attractive as a language, to be honest. I mean, it's better than Perl – but that's faint praise. Scoping-by-indentation is OK for blocks that fit on the page, but problematic if they go longer, so to create maintainable Python I want to do a lot of prefactoring into small abstractions, and that takes time I probably don't want to spend if I'm not writing product code.
[Author here]
After 5 years working at Linux vendors in Central Europe, the thing is this:
English is the language of business across much of Europe now. It's the one language you can rely on most people speaking.
*But* the version they speak is, by and large, *US* English. Very advanced speakers use British pronunciation and spelling, but the vocab is US.
And in US English, the word "git" is meaningless or a variant of "get" in the sense of "go" -- "git out of here".
So they have no clue that the name _means_ something.
Interesting that the author mentions POSIX systems are required to include awk but doesn't seem to realise that POSIX also effectively requires awk to support UTF-8, and that all of the systems which were updated to conform to POSIX.2-1992 and which added UTF-8 locales in the early to mid-1990s added UTF-8 support to awk (and all the other POSIX text-processing utilities) at that time. I say "effectively" because it's only required on systems that have at least one UTF-8 locale installed, so there's a loophole there; for some systems there may have been a short time where they supported older multi-byte encodings in awk before they added any UTF-8 locales. (UTF-8 was invented by Ken Thompson and Rob Pike right around the same time that POSIX.2-1992 was approved by IEEE.)