
On the subject of C++
I leave you with an "interview with Bjarne Stroustrup" - http://www.ganssle.com/tem/tem17.htm
Peter Wayner, a tech columnist, claims to have identified the seven most vexing problems in programming. According to his subheadings, these are: multithreading, closures, "too big data", NP-completeness, security, encryption and identity management. Such lists are constructed to be disputed. Game on. To start with, Mr Wayner' …
Google something site:stackoverflow.com and the top link will take you to an answer with a thousand upvotes from 2009, before the New Testament (C++11) arrived. Everything's moved on since then, unless the question is about makefiles.
Upvotes should disappear after a year and eventually old answers will sink and bright shiny new code will appear to take its place. Perhaps a special exemption can be made for Delphi.
"Upvotes should disappear after a year and eventually old answers will sink and bright shiny new code will appear to take its place."
That's not a bad suggestion, but I'd like the option of specifying a time range. There is some perfectly good old code out there that needs love too. ;)
Use Java. Call them lambda functions. It's an even less meaningful name, so at least you don't make any assumptions about what they're supposed to do. Which is, er, let me get back to you on that.
Who probably have little knowledge of foreign languages. The real issue is that most programming languages were developed in the US/UK on English-bound OSes, by people mostly speaking and writing applications in English - or in a single language.
And that led to the wrong idea that managing text is easy (so easy that C doesn't have a string type at all: you just use an array of bytes, "char" being a fake). The reality is that managing text (properly) *is one of the most difficult tasks* - especially when you have to manage more than one language at the same time.
Unluckily, natural languages were not designed by programmers or mathematicians, so fitting them into simple data structures and algorithms is not easy at all.
Does the U in Unicode also stand for "Ugly"? Probably. But don't just complain - propose a better working solution: a string data type that can handle multiple languages while being easy to use at the same time.
Also, stop using "strings" and byte arrays interchangeably. They are not the same. I understand that to beginners a string looks like the "definitive data type" because it can store anything - I've seen a PHP programmer doing bitwise calculations by turning integers into strings of "0" and "1" characters and then comparing them - but whoever teaches that is doing a disservice to beginners, and dynamically typed languages have made it even worse.
It's time to teach from the beginning how complex text manipulation is - because there is more than one language, and many are far more complex than English.
IMHO, all comp-sci graduates and practitioners should be required to know at least two languages besides English. They would start to understand why Unicode is a necessary evil - and the difference between text and a sequence of bytes.
For a supposedly Unicode native language, it ain't half difficult to use Unicode in Go.
In that example, the thing that works is the bit at the end, where you're faffing about with strings just as in C. Unless I can address and compare Unicode glyphs in strings as easily as addressing and comparing simple ASCII elements, it's... suboptimal.
It looks to me as if it doesn't do anything special beyond implying they are UTF-8, while still treating them like arrays. Not very different from Python 3.
The issue many have with UTF-8 is that not every sequence of bytes is valid. That's why concatenating a UTF-8 string with a generic byte sequence is "risky" - you can obtain an invalid UTF-8 string. That doesn't happen with ASCII strings - any sequence is valid, even if it may contain unprintable "characters" (or print the wrong ones depending on the "codepage") - yet they are still valid ASCII strings.
Usually the main difference is that some function calls will balk at those invalid sequences and throw errors/exceptions - which were not triggered when using ASCII, whatever the contents were. IMHO, it means lurking bugs are now surfacing, but many developers instead think UTF-8 broke their code.
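A minimal Python 3 sketch of that difference (the byte values are made up for illustration):

text = "ciao".encode("utf-8")              # valid UTF-8 bytes
blob = bytes([0x63, 0x61, 0x66, 0xF8])     # 0xF8 can never appear in valid UTF-8

mixed = text + blob                        # byte concatenation always "works"...

print(mixed.decode("latin-1"))             # an 8-bit codepage accepts anything
try:
    print(mixed.decode("utf-8"))           # ...but the result is not valid UTF-8
except UnicodeDecodeError as exc:
    print("lurking bug surfaced:", exc)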
That's why concatenating a UTF-8 string with a generic byte sequence is "risky" - you can obtain an invalid UTF-8 string
Byte concatenation of two well-formed UTF-8 strings results in one well-formed UTF-8 string. If you find yourself wondering about input strings that are likely to contain UTF-8 multibyte codes, it's a very strong sign that you shouldn't be doing anything at all with the "characters" that are inside them.
(Besides, byte-drops in UTF-8 are detectable by code, unlike older multibyte systems like Big5 or Shift-JIS where it's impossible to tell a corrupted sequence from an intentional input without using statistical analysis)
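For what it's worth, both claims are easy to check in Python 3 (the sample strings are invented):

a = "naïve ".encode("utf-8")
b = "café".encode("utf-8")

print((a + b).decode("utf-8"))     # concatenating valid UTF-8 yields valid UTF-8

truncated = b[:-1]                 # drop the last byte of the two-byte 'é'
try:
    truncated.decode("utf-8")
except UnicodeDecodeError:
    print("corruption detected")   # a byte-drop is caught, not silently misread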
Code should not try to write text. The smallest unit of text that you can legitimately concatenate is the sentence. Value insertion, not concatenation, is how you format text that will be read by a human.
The problem is that Unix and its descendants are designed around the assumption that there's really no difference between human-readable and machine-readable data.
Value insertion, not concatenation, is how you format text that will be read by a human.
Yes, well...
English is OK: you only need three cases, for none, one, or plural.
But take Slavic languages:
1 rubl'
2 rublya
3 rublya
4 rublya
5-20 rublei
21 rubl'
1 god
2 goda
5 lyet
You need a whole code block to decide how to format a currency, date or a time ahead of selecting the proper sentence. And I am sure there are other language groups out there that are worse.
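Roughly that "whole code block", sketched in Python using the standard Russian rule (the forms are the ones listed above):

def russian_plural(n, one, few, many):
    """Pick the Russian plural form for n: 1 rubl', 2-4 rublya, 5-20 rublei..."""
    if n % 10 == 1 and n % 100 != 11:
        return one                                   # 1, 21, 31... rubl'
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return few                                   # 2-4, 22-24... rublya
    return many                                      # 0, 5-20, 25-30... rublei

for n in (1, 2, 5, 11, 21, 22, 25):
    print(n, russian_plural(n, "rubl'", "rublya", "rublei"))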
Yes, I mentioned Slavic languages in a follow-up post, as one reason why building strings in code is a really bad idea.
Welsh is the only living language that can beat the Slavic family on complexity of pluralisation rules, but these are actually quite rigid rules, and can be expressed simply by code. Here is the list of pluralisation rules for most of the world's languages:
http://www.unicode.org/cldr/charts/27/supplemental/language_plural_rules.html
Localisation toolkits like GNU Gettext have a pluralisation mechanism built in, which lets you select the correct string for the value you're inserting ("%d day" or "%d days") before inserting the value. It's also not limited to just two options, and the logic to select between the options is general enough to support any pluralisation scheme. If you want to know more, it's documented here: https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
If you're not on Linux (or a framework that relies on Gettext for its localisation), you can still use the same procedure, with only a small amount of additional code. The trick is knowing that you might have to do this; once you know you need to do it, implementing it is trivial. (I use a small C# class called PluralStringFormatter that implements this logic, with a "string selector" object that implements the string selection according to the current locale.)
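For anyone who hasn't used it, the same mechanism is available from Python's standard gettext module; a minimal sketch (the "myapp" domain and "locale" directory are invented, and with no catalogue installed it just falls back to the English strings):

import gettext

# Bind a (hypothetical) translation catalogue; fall back to English if absent.
t = gettext.translation("myapp", localedir="locale", fallback=True)

def days_message(n):
    # ngettext picks the right plural form for the current locale -
    # two forms for English, but a catalogue may define three, four or more.
    return t.ngettext("%d day", "%d days", n) % n

for n in (1, 2, 5):
    print(days_message(n))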
Better to discourage programmers from manipulating text at all. Once you allow programmers to treat human-readable text as chunks of "wordlets", you're asking for the pain of this:
snprintf(buf,len,"%d item%s", count, (count==1)?"":"s");
and this:*
snprintf(buf,len,"%d item%s %s%s.", count, (count==1)?"":"s",
         type==DLOAD?"download":"updat", iscomplete?"ed":"ing");
A brief overview of how Slavic languages handle pluralisation usually does the trick to dissuade practitioners of #1; dissuading those who'd describe #2 as "efficient" takes a lot more work ("but I'll just use whatever the Japanese for 'ing' is and it'll still be fine...").
The best way around these is to leave the job of writing text to linguists, and limit the job of the software to simply choosing between complete sentences - or, better, complete paragraphs.
* both of those examples are culled from real code, by the way. A long time ago I used to work with development teams to make their products localisable... most needed only a little work, some were really bad. Perversely, I'd find that the higher the number of non-native English speakers on the team, the harder the software was to localise into a non-English language.
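To make the alternative concrete, here is a minimal Python sketch of "choose between complete sentences, then insert values" (the message table and keys are invented; in real life they come from the translators' catalogue, one complete sentence per language and plural form):

MESSAGES = {
    ("download", "one"):   "%d item downloaded.",
    ("download", "other"): "%d items downloaded.",
    ("update", "one"):     "%d item updated.",
    ("update", "other"):   "%d items updated.",
}

def plural_category(n):
    return "one" if n == 1 else "other"   # English rule; each locale supplies its own

def report(action, count):
    sentence = MESSAGES[(action, plural_category(count))]   # pick a whole sentence
    return sentence % count                                 # value insertion, not concatenation

print(report("download", 1))   # 1 item downloaded.
print(report("update", 3))     # 3 items updated.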
Which commenters, where? The article doesn't, that I can see, mention Unicode.
Living, as I do, in a fairly polyglot city (for America), I do seem to know a fair number of programmers who can find their way around in more languages than English. But the suggestion of two languages beyond English, or even one, seems visionary.
A lot of the resistance comes not from Anglophone bigotry, but from a deep reluctance to change. That is not in itself a bad thing, since changing code that - within whatever boundaries - does work always risks problems. I understand the programmer who lacks enthusiasm for changing a lot of code so that he can put the tilde in "canon" or the umlaut in "Jager". For a new system, sure; but how often do we work with wholly new systems?
Read some of the links posted by Stob...
Of course the "two languages" requirement was a provocation. But in my experience, whenever I see people in a forum complaining about a language adopting Unicode, they are invariably native anglophones living in an anglophone country - which means they mostly don't encounter foreign languages.
Anyway, there's a big difference between hearing or even speaking more than one language and actually having to process its text. There are many idiosyncrasies in how text in different languages is written and manipulated that are, of course, lost when speaking.
IMHO, all comp-sci graduates and practitioners should be required to know at least two languages besides English.
But...but...I've forgotten almost all the French and Latin I learned in four years of schooling. Does that mean I have to resign now, 35 years later?
"I've forgotten almost all the French and Latin ..."
I don't think French and Latin really count as two different languages. They are sufficiently different from English to make you curious about linguistics, but there are languages out there that will make you seriously wonder whether even whole sentences are the minimal unit of translation, or indeed whether it is actually possible to translate them into English without garbling at least some of the meaning.
Notions like "noun", "verb" or even "word" start to look flaky if you review *all* the languages of the world.
As someone who started writing stuff in Python a few years back, and read up on all the unicode stuff as a part of that, I have to say I think the Python developers are broadly correct: Python 2's approach was bad, and Python 3's approach is better. You can see why the Python 2-era designers thought it was a good idea to fudge everything so developers never 'had to worry about' the distinction between strings of bytes and text, and didn't have to care about the various ways of interpreting the former as the latter, In A Time When it was still kind of okay to pretend you only ever had to care about ASCII. But it really *isn't* a good idea, and any time you have to deal with anything but ASCII, it really does tend to lead to difficulties. I think they're right that it's better to explicitly acknowledge the difference in the language. And if you're writing pure Python 3, it's really not that hard to work with - it's not rocket surgery, just encode() and decode() as necessary. (Especially since the default encoding for Python 3 is UTF-8, which is going to be the one you want about 99.99% of the time anyhow).
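In practice, the "just encode() and decode() as necessary" part amounts to something like this (a trivial sketch):

text = "Grüße, 世界"               # text lives as str (Unicode) inside the program

wire = text.encode("utf-8")        # str -> bytes at the boundary (files, sockets...)
back = wire.decode("utf-8")        # bytes -> str when reading them back in

assert back == text
print(len(text), "characters,", len(wire), "bytes")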
The reason people hate it so much, I think, is that it becomes unfortunately *much* more of a minefield if you want to write or maintain code that works with both Python 2 and Python 3. Which many many people do. Doing that in itself immediately gives you many, many more edge cases and headaches to deal with. (I've lost count of the number of times I wish I could go back in a time machine and change Python 2's default encoding to be UTF-8, for a start). It gets even worse if you're not starting from scratch - so you can just `from __future__ import unicode_literals` and get rid of quite a few of the issues, making at least *your* code behave quite a lot more similarly under the two interpreters, but are stuck with older code where you can't just do that because of existing assumptions about the way the code will handle strings on Python 2. If you're in that situation, it can be a bleeding nightmare.
It's definitely a shame, and I think some of the Python devs at this point wish they could do the whole thing over and try to somehow reduce the problems. But I don't think it's because the Python 3 design is wrong, or because the Python devs are arrogant, or anything. It's just one of those things that happens because this stuff is hard.
I don't think all the coders who complain are just knuckle-dragging Americans or Brits who refuse to believe that anyone really needs more than the sacred ASCII character set. You *do* get those types, but I think they're a minority. It's just that dealing with this stuff can be a pain even if you honestly do understand and accept the importance of handling ALL THE CHARACTERS.
It's not "English", it's the Roman alphabet; that level of coding also accommodates Greek and accented characters from derived languages. Unicode is needed for other scripts, chiefly Asian languages.
There is a reason why a Chinese typewriter is a large, heavy, expensive custom device while a corresponding European language one is (was) small, compact and mass produced. Unicode is like telling everyone they have to use a Chinese typewriter to write English just in case someone wants to write something in Mandarin. Even the Chinese were smart enough to realize that a culturally significant language isn't good for business -- they've had a rendering of their language in Roman script for decades.
Um. No. Aside from Latin characters, Greek and 'Asian languages', you've got Cyrillic, Hebrew, Arabic and several different Indic scripts. Pre-Unicode, each of these had its own encoding - or multiple encodings - most of which didn't really take any account of how the others worked.
Unicode is massively, hugely better than the 'everyone gets their own encoding! You get an encoding! You get an encoding!' world we had before. Claiming that it only exists for Chinese (you know, that ridiculous little niche case, it's only *literally the world's most commonly spoken language*) is ridiculous.
(Also, it's rather silly to talk about 'write something in Mandarin'. Mandarin, Cantonese etc. are the *spoken dialect* variants of Chinese. There are two major written variants of Chinese, Traditional and Simplified. There's no 'written Mandarin Chinese', and there's no direct correlation between which spoken and which written variant you use, many combinations are possible).
At the last company I did any international language coding for, the first thing I did was to write a DB for all the user interface text, plus a tool to let whichever country was in control of their language interface put in their own translations of the English versions. Even better, they got to be in charge of their own CSS, which I'd insisted on since the English master version had been working for nearly two years while the advertising department flip-flopped on colours, fonts and images. It's amazing how easy coding is when you can fuck off this sort of random noise and penis waving from the design process. Give people ownership of their part of the process and the means to ensure you can't be blamed, and they don't spend their lives trying to fuck you up.
"By the way, special mention for JavaScript, the runaway winner in this category: Broccoli, Brunch, Grunt, Gulp, Mimosa, Jake and Webpack, which list sounds like a roll call of firefighters in a forthcoming BBC puppet programme for the under-fives TheDonaldTrumpton."
Not sure Broccoli and Mimosa will qualify for visas under Obersturmbannführer Rudd's new listing program....
We have too many tools and languages - and the solution is always to build a new one to fix the problems in the old tool/language. And of course it never does, it just moves them somewhere else - so we create a new tool/language ...
Sigmonster says, "A good programmer can write FORTRAN programs in any language"
... when I got to the point where "make" is called "standard".
It only was in the sense that it had the same name but a different syntax on each flavour of Unix - until GNU make came along and invented a new one that at least was more portable.
But that didn't happen before there was imake, an attempt to automatically create Makefiles for all those flavours, and then GNU automake, to do the same but in a different way. And that was all before ant and XML came into fashion.
Nostalgia for the good old days of pure and clean Makefiles, good old days that never existed :)
I hate make and all its bastard offspring...
Hint: if you need an extra tool to write the config for the tool that makes the config for the thing you are building, you are doing it wrong
These days I just use a bloody shell script to call the compiler, works every time and is 'human' readable
CMake, make & M4 can all burn in the pits of Microsoft for all I care
>Hint: if you need an extra tool to write the config for the tool that makes the config for the thing you are building, you are doing it wrong
>These days I just use a bloody shell script to call the compiler, works every time and is 'human' readable
hear hear. stuff i wrote 20 years ago still runs fine today, where i did it on the basis of "maximum simplicity to read & run". shell kicks arse, as do the other standard *nix tools. where i dipped into this-or-that "more efficient" or "tighter" DSLs, it's all dead code. dead effort. wasted time. bounded existence.
well, except make.
oh, and that no-source-control-HERE-let-alone-multiple-branches-thankyouverymuch thing i did/had to do, which apple later re-presented as TimeMachine. which you can still do in shell in 2mins. like i did.
Python, where the intern opening and closing the text in the wrong editor can fuck the code to a fare-thee-well.
Seriously: All those good pythonic ideas saddled with the beyond idiotic significant leading whitespace design feature?
Even Cobol didn't fall for that one. Stay between columns 12 and 76 and remember to punctuate properly and you could do what the hell you liked with spacing.
Of all the things to get bent out of shape about with the flood of "C-like" languages in the wild, the curly brackets seems a bit .. silly.
"Even Cobol didn't fall for that one. Stay between columns 12 and 76 and remember to punctuate properly and you could do what the hell you liked with spacing."
DEC's Terminal Format in Cobol (arguably) went one better - stuff like labels went in column 1, instructions were indented by a tab, thereafter what the hell you liked. Much faster to bash code in, and there was a reformatting utility to convert Terminal Format to column oriented format or back.
I have vague memories of the time wasted hunting for the mismatched brace back when I first got my hands on a C compiler. Improved compiler error messages have helped, and always typing }«left arrow» immediately after { avoids the problem much of the time. When there is a brace problem, the best tool I have for fixing it promptly is decades of experience. Python indentation is really clear, the parser reports errors on the correct line, and more functionality fits on the screen, because a C line containing only a } becomes a Python blank line only if it improves clarity. Dealing with a Python indentation problem is easy for anyone who understands why a C compiler is going to scream if you save your code in docx format.
I have come across config files that use C style braces. When I find one, I want to find a machine belonging to the person responsible, delete a } in the middle of his configuration file and see how long it takes him to fix the problem with only the inevitable crappy diagnostics from his half baked parser.
"and see how long it takes him to fix the problem with only the inevitable crappy diagnostics from his half baked parser."
About twenty seconds at a guess. My half baked parser used a keyword followed by a brace followed by parameters, and then a closing brace. So when the parser stumbles across a new keyword it will report the missing '}' and then continue as if one had been present. It's called error recovery and recognising that files that are supposed to be created by software are often hacked about with by humans.
Adding an extra '}' is more fun. ;-)
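Something like this rough Python sketch of that kind of recovery (the keywords and the config format here are invented):

KEYWORDS = {"server", "logging", "cache"}   # invented for the example

def parse(tokens):
    """Parse 'keyword { params }' blocks, recovering from a missing '}'."""
    blocks, i = [], 0
    while i < len(tokens):
        keyword = tokens[i]
        i += 1
        if tokens[i:i + 1] == ["{"]:
            i += 1                              # consume the opening '{'
        params = []
        while i < len(tokens) and tokens[i] != "}":
            if tokens[i] in KEYWORDS:           # a new block has started, so the
                print(f"missing '}}' before '{tokens[i]}'")   # closing brace was lost
                break                           # recover as if it had been there
            params.append(tokens[i])
            i += 1
        else:
            i += 1                              # consume the real '}' (if any)
        blocks.append((keyword, params))
    return blocks

# The '}' closing the server block has been deleted:
print(parse(["server", "{", "port", "80", "logging", "{", "level", "debug", "}"]))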
I'm not surprised at continuing resistance to Python 3. My first and last experience of interacting with Python folk was receiving multiple repetitions of "why would you want to do that?", with variously emphasized sarcasms, all to one question where I needed to emulate data handling of a non-Python-invented format.
They simply were *against* anything externally derived. These are not the iconoclasts you were looking for.
Programmers as a group have always puzzled me. Applications programmers, that is. They seem to live in a world dominated by their user interfaces, with a lot of the nuts and bolts - the boring bits of their software creations - being relegated to libraries, canned objects and the like.
They should get a behind-the-scenes peek at real-time work, where software isn't programming so much as a way of expressing some kind of logical machine using code. This machine may sport an interface, something they'd recognize, but it's only a relatively small part of the whole.
They'd also get a much better appreciation for bugs. Real time systems tend to have real world consequences when the software misbehaves. You can't just pop up a dialog with a cryptic error message in it and quit, there's nowhere to quit to.
As a BTW, I don't know what people have against curly braces. They're just a delimiter. Try writing a parser using flex/bison or similar and you'll get the idea about why things are the way they are. Trying to pretend -- like Python does -- that somehow spaces can substitute for an unambiguous delimiter to delineate a structure block is, well, just plain stupid. It shows massive egos at work.
"Real time systems tend to have real world consequences when the software misbehaves."
Reminded me of working on software to control a tank turret. After an unfortunate accident, a large lump of nickel steel was attached to the turret, and ditto to the hull, to prevent the turret rotating too far and taking the cabling with it, along with damaging a human occupant.
Working on the thing from the inside, it was less than encouraging to notice the large dents in the lumps of nickel steel, and having to hope that the welds were still sound.
I don't understand the resistance to Python 3.
I *do* understand the issues it causes with legacy codebases, and how the pain involved in porting an existing codebase to the new version (where even possible) may not make sense. This, in turn, may result in needing to write new code in older versions of Python so they work with these legacy codebases. I have to do this myself, and find it manageable. (I have also successfully ported existing code libraries, though they were all pure Python with no C extensions.)
Aside from maintaining legacy codebases, though, I don't really understand why anyone would prefer the 2.x branch. As someone who started with Python back in 2.2, I personally find the 3.x dialect's modest syntax changes and its library reorgs sensible, useful, and less obtuse in a number of ways.
Tying in to the discussion of Unicode text, I find the Py3K change to dealing with all strings as Unicode to be far more practical and safe in any context where you have to deal with non-ASCII data. Which these days means just about anything that touches text.
And finally, though I love working with Python, no, I'm not the biggest fan of the syntactic relevance of white space. I deal with it without serious issue, but I wouldn't mind at all if I didn't have to deal with it. That said, I really think anyone who can't deal with it doesn't really want to try. If interns opening and closing a Python source file in a text editor is actually a problem, that's pointing out a pretty serious deficiency in source code management. I'll agree it'd be much nicer to not need to worry about it, but come on, really.
Multi-threading has many well developed mechanisms for producing exactly deterministic results in a predictable amount of time. It's a solved problem.
The hard one is distributed concurrency - a large number of machines all solving a single task. The difference is that machines are connected with latency, machines may go silent, and machines may suddenly come back to life after they were presumed dead. Anybody working at a hot startup will quote a bunch of Apache projects claiming to completely solve distributed concurrency with eventual consistency. The "eventual" in "eventual consistency" is normally a short period of time but it also may be infinity, and there might not even be a method of determining when the transient results have finally passed. There may also be eventual failures due to conflicting inputs that could not originally see each other. Such projects are typically poorly documented, not bug free, and may contain features that will not work in realistic conditions.
Distributed concurrency is fairly new and not solved as much as people expect it to be. Depending on the type of task, it might produce exact solutions in a known time, exact solutions in an unknown time, or inexact solutions in a known time. The hard part is transforming your task or expectations to match what can be done.
"Distributed concurrency is fairly new and not solved as much as people expect it to be."
Depends on your definition of "new", folks were working on this problem before ARPAnet first started punting bytes around the place. Leslie Lamport et al wrote the "The Byzantine Generals Problem" paper ~1980, and it was not a new problem for them to solve even at that time...
From my point of view, academically, distributed computing/parallel processing is fairly mature, there are plenty of robust models and papers out there, the problem is that folks invest their time in looking for silver bullet frameworks rather than reading papers and applying grey matter.
People have already lambasted Shaw's ignorance of why bytes and unicode need to be different underlying types. He mentioned a combined bytes/unicode object, but shows no understanding of why it came into existence. (Lazy scripters waste lots of cycles encoding and decoding the same string. By keeping the unicode and encoded versions together, python can recycle an existing object instead of creating new ones.) I noticed Shaw did not mention the new style classes available in python3 that make multiple inheritance work. It is almost as if he has left multiple inheritance in the tool box because he never recognised the right times to use it.
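Roughly the recycling idea, as a toy Python sketch (an illustration only, not how CPython actually implements it):

class CachedText:
    """Toy text object that remembers its UTF-8 encoding after first use."""
    def __init__(self, text):
        self._text = text
        self._utf8 = None                    # filled in lazily, then recycled

    def encode_utf8(self):
        if self._utf8 is None:               # pay for the encode only once
            self._utf8 = self._text.encode("utf-8")
        return self._utf8

s = CachedText("žluťoučký kůň")
assert s.encode_utf8() is s.encode_utf8()    # the same bytes object is handed back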
Python devs believed that properly fixing python2's defects with multiple inheritance and unicode was going to require changes that would be incompatible with the existing language. They created python3. Shaw did not create a language compatible with Python2 that fixed the inherent flaws.
After a huge tiresome rant about the need for backward compatibility, Shaw recommended deleting python's legacy string formatting mechanisms. I could not find the words to express what I thought about that, but someone has helpfully created a suitable web page here.