No, that makes things easier
As long as it simply drops the messages and doesn't crash the mail daemon, I'll be ok.
Sysadmins need to update their code for handling email addresses to meet changes to the global DNS, or they risk losing customers and reputation. That's according to a number of domain experts meeting at the NamesCon conference in Las Vegas this week. Representatives from Google, ICANN, DotAsia and the Domain Name …
That was my first thought too. But then I realised that the "pointless new domains" could well include the localisation of existing domains into the user's native language. That raises the possibility that you are rejecting emails from existing contacts or customers who switch to local, accented or non-Latin character sets instead of the Anglicised domains most use currently because they were the only choice.
Yes, let's just stick to English ASCII and sod those weirdos with "funny" letters.
The history of the web teaches us that this is not going to be fixed. Simply put, problematic GTLDs will never be used in anger because legacy systems won't work with them and legacy systems will never be repaired because nobody really uses those GTLDs in anger.
The only possible way the cycle can be broken would be adoption by one of the major incumbents.
China? If they are not already your suppliers, they're going to be your customers.
Yes, but that doesn't change anything. Chinese firms that stick with ASCII domain names, at least for things like email addresses, will avoid compatibility issues with partners that don't support IDNs. That's a competitive advantage, so there's incentive not to switch to an IDN.
Legacy IT is much more durable in industry than it is in the flighty, low-risk consumer realm. Right or wrong, that's going to significantly delay IDN uptake for industry.
IDNs address a long-standing usability issue - some argue an ethical one - with DNS. But as the article notes they come with serious and difficult[1] problems for implementation, security, and usability.
[1] I don't know how many epic threads on the topic I've sat through on the W3C and IETF URI discussion lists. The phrase "pile of poo dot com" (one of the canonical examples of "do we support this?", referring to the eponymous emoji, U+1F4A9) will forever be with me.
So when somebody comes along and enters something like "postmaster@öß.com" into an email field in a web app, where do I send the email to? Do I follow IDNA2003 or IDNA2008 or UTS46 rules? Do I do Unicode case folding? I.e., does the email ultimately get sent to "postmaster@xn--zca9b.com" or "postmaster@xn--ss-eka.com"? Should I check to see if all of the various ASCII encodings exist in the DNS and fail with an "Ambiguous domain" error if there is more than one?
I'd like to support internationalised domains, but there doesn't seem to be a "correct" way of doing it at the moment.
It also doesn't help that none of the popular open source databases or mail servers have functions built in for doing the conversions. So I can't store the Unicode representation of a domain in a database and then expect my mail server to be able to transform the incoming punycode encoded version to Unicode before doing lookups against that database.
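For what it's worth, the two candidate answers in the question above can be reproduced with nothing but Python's standard library. This is only a sketch under stated assumptions: the built-in "idna" codec follows the IDNA2003 rules (it maps ß to "ss"), while the second helper simply Punycode-encodes each non-ASCII label unchanged, which matches the ß-preserving form for this example but performs none of the validity checks that IDNA2008 or UTS46 require. Both function names are made up for illustration.

```python
def ace_idna2003(domain: str) -> str:
    """ASCII form via Python's built-in codec, which implements IDNA2003."""
    return domain.encode("idna").decode("ascii")


def ace_keep_as_is(domain: str) -> str:
    """Punycode each non-ASCII label without any mapping (illustration only).

    Assumes the input is already lowercase and NFC-normalised; it does not
    perform the checks required by IDNA2008 or UTS46.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    name = "öß.com"
    print(ace_idna2003(name))    # xn--ss-eka.com
    print(ace_keep_as_is(name))  # xn--zca9b.com
```

Whichever form you pick, storing the ASCII (ACE) form next to the Unicode form in the database sidesteps the problem of the mail server not being able to convert at lookup time.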
Seeing as about half of e-mail verification scripts won't accept "+" in a username, a well-used and well-specified way of using multiple addresses (especially in Gmail*), I wouldn't hold out for any changes to allow for these types of rules on most sites for a good few decades.
*For non Gmail users it allows you to add any characters after your normal username in the e-mail address to add unlimited aliases which you can then individually block or filter. E.g. if your e-mail is myname@gmail.com you can (without registering it) just sign up to sites using myname+thissite@gmail.com and you will see if they sell your e-mail address on or it gets stolen and can then block that address and inform the site of your disappointment.
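A minimal sketch of how a receiving system might pull such an address apart, assuming "+" is the separator (that is a local convention of the receiving mail system, not something the SMTP RFCs define). The function name is hypothetical.

```python
def split_subaddress(address: str, sep: str = "+"):
    """Split 'base+tag@domain' into (base, tag, domain).

    The tag is an empty string when the local part has no separator.
    """
    # Split on the last "@" so that an "@" inside a quoted local part
    # does not end up in the domain.
    local, _, domain = address.rpartition("@")
    # Only the first separator matters; the tag may itself contain "+".
    base, _, tag = local.partition(sep)
    return base, tag, domain


if __name__ == "__main__":
    print(split_subaddress("myname+thissite@gmail.com"))
    # ('myname', 'thissite', 'gmail.com')
```

A site that wants to filter or block by tag only ever needs the second element; stripping it silently on the sending side is what causes the lost mail people complain about.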
Came here to post the exact same comment - drives me mad when websites refuse to allow the plus sign in an email address. What's worse than that though is when they either:
a) Accept it, but just strip it out so you don't actually get the emails
b) Accept it, send you emails, all well and good until you go to unsubscribe and find that their unsubscribe page requires you to enter the email address you signed up with and on THAT PAGE they don't allow the '+' sign so you can't unsubscribe...
Aaaarrrgghh.....
> a) Accept it, but just strip it out so you don't actually get the emails
I registered on e-bay with an e-mail address that contains a | character in the local part. Everything works fine unless I try contacting support - then I get some bounces from their internal servers saying that | isn't a valid character (it is).
As for IDNs, I've had an IDN domain for a few years now, and I also have e-mail on it. The only users that have had problems replying to messages sent from this domain have been using gmail...
Try RFC 821, written before most of the gootards were out of three-cornered pants.
The fact that most of the daft WWW-based email MTAs don't comply with the RFCs doesn't automagically make the gootards option a good idea.
Side-note: ElReg seems to be almost compliant with the various RFCs, at least as far as email is concerned. (whoever is moderating this post, check my email address ;-)
Sub-addressing (sendmail calls it "plus addressing") has been around since at least sendmail 8.36 (early 1993), so this is far from a Googlism. The RFC821 spec for the "local part" to the left of the @ is surprisingly loose, and most systems will choke on one thing or another.
http://en.wikipedia.org/wiki/Email_address#Address_tags
RFC 821 does not include how sieve addressing works or using the + as a special character to allow for aliases.
I was giving Gmail as an example - not saying it was the creator of it, just that they are the ones who seem to promote this method the most and would be one that many people are familiar with. It wasn't a "history of the SMTP specs" competition to see which old-school unix admin can remember the meeting where the requisite acceptable character sets were first brought up.
There are a surprising number of "odd" characters that are considered valid in an email address per RFC specs (the plus sign being a definitive yes).
http://www.remote.org/jochen/mail/info/chars.html
And RFC 821 has been obsoleted by RFC 2821, which has been obsoleted by RFC 5321.
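To make the "odd characters" point concrete, here is a rough sketch of a check for the unquoted ("dot-atom") form of the local part, using the atext character set from RFC 5322 section 3.2.3 and the 64-octet local-part limit from RFC 5321. It deliberately ignores quoted strings and internationalised local parts, so treat it as an illustration rather than a complete validator; the function name is made up.

```python
import string

# "atext" characters from RFC 5322 section 3.2.3.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom_local_part(local: str) -> bool:
    """True if `local` is a plain dot-atom: atext runs separated by single dots."""
    if not local or len(local) > 64:
        return False
    # Empty atoms mean a leading, trailing or doubled dot, all of which are invalid.
    return all(atom and set(atom) <= ATEXT for atom in local.split("."))


if __name__ == "__main__":
    for candidate in ["myname+thissite", "a|b", "o'brien", "a..b", "has space"]:
        print(candidate, is_dot_atom_local_part(candidate))
```

The plus sign, the pipe and the apostrophe all pass, which is exactly why sites that reject them are out of step with the specs.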
It is part of sendmail. The fact some websites don't accept it is probably by design, since some people use it to redirect spam (i.e. DougS+spam@foo.bar) Or try to, I used to do that but many spammers figured out they should delete the + and everything after it, so it is better to use an underscore and rewrite rules in sendmail config to handle it.
Like it or not, just as Arabic drove early science, Latin (Koine Greek, Aramaic, et al) drove early Christianity, Deutsch drove later science (etc., I won't continue. You are quite welcome), American English is the lingua franca of tehintrawebtubes.
Trying to make it otherwise is probably similar to swimming against the Boston Molasses Disaster.
Running code trumps all.
"American English is the lingua franca of tehintrawebtubes."
I would say more people use proper English online. Only Americans generally write color, period etc. Most international users of English in my experience generally put colour, full stop, etc.
"more people use proper English online"
I'm not talking about TheGreatUnwashed babbling. That's irrelevant. I'm talking about how it works. Read the RFCs. Read K&R C source comments. Read BSD source comments. Read Linux source comments. It's all American English. You don't have to like it, it's just reality.
linux/Documentation/logo.txt
"This is the full-colour version of the currently unofficial Linux logo..."
The reality is that it is us westerners pushing all that "international" stuff.
The web & the internet was built in English by English-Speaking people in English-Speaking institutions.
The rest of the world made a free-will decision to embrace it. Those are all facts; always keep them in mind.
Everybody forgets that keeping domain and user names in a single language with a limited character set allows them to be understood universally. English is dead easy to learn compared to Cantonese for example.
My name when written properly contains characters not available in the standard character set accepted by email systems, and I never thought that as a problem but as a blessing, as expecting an English, Chinese or Indian person to know the distinction between á, a, à, ä, or ã is plain stupid. The most important thing is to get my email, because I care about the contents of the message, and those can be written in my native sensibilities!
I'm sorry if I'm not up to modern cultural sensibilities, but all of this push for more TLDs and more international crap is just a con from some people who do not know what to invent, to make/start another economic frenzy, and they remember the recent boom of the internet and thought: "Remember the good old times when people were registering millions of domains?" "yes, how can we make this happen again?"
Do not believe me? There has been a lot of capital investment in registrars in the last 10 years. The people behind such investments are of the opinion that a registrar is a money printing machine, and they want to print as much as they can. I do not blame them for that, but let's not lose the north completely, shall we?
I find it amusing that you managed to find perhaps the single instance of a language which is easier to learn than English.
I understand how it got that way. First up we mashed together two languages to get Anglo-Saxon, then we threw in Latin, then French as our primary sources, with a little of something from everybody else just so nobody would feel excluded. But let's face it, if you aren't a native speaker of either one of the Romance languages or German it's a nightmare to learn.
I've always been grateful that I was born where I would learn it as my native language. In my early schooling I learned some Latin, a bit of Greek, Spanish, and some German. Most of the time I learned more about English than I did the language I was nominally studying. I've since forgotten most of the languages and I'd never offend a native speaker by mangling their language to their face.
I find it amusing that you managed to find perhaps the single instance of a language which is easier to learn than English.
While "ease of learning" is a highly subjective evaluation with no real metric, I dare say nearly any linguist who deigns to have truck with it would be able to nominate several natural languages as either easier or more difficult than English, depending on which you want. (You say "easier" above, but in context it appears you meant "more difficult".)
In similar vein, why not internationalise air transport? Why shouldn't a <insert rare minority language> pilot be able to communicate in his own language with any control tower anywhere in the world? And shouldn't the tower be able to answer in its language of choice?
Silly example I know, but for some purposes (porpoises if you are from the big round cornered one) it is rather sensible to standardise for what is increasingly international communication.
Like it or not, English, for two reasons in particular, has become the international standard. Firstly, as a result of the British Empire and resultant trade it has been spread around the world over several centuries and become a de facto standard for international communication, and secondly the spelling has sensibly cut out all the unnecessary extra characters and accents. (The Septics have helped somewhat in this process in killing a few diphthongs.)
In countries like India where there are so many local languages, English as one of the official languages is a positive advantage internally as well as externally.
Some idiot tried to invent an international language - Esperanto. I suspect more people are fluent in Klingon.
Seems to me, if there is a properly defined and accepted standard (I said standard not standards - good luck with that one) the coding is not a major problem. The potential confusion and resulting security risks are. I'll bet there are no solutions to that one. Do you really want to have yet another set of whack-a-mole security patches to worry about?
No doubt some countries will want to do it, and probably ok internally if they have a sensible character set and the webmaster kills the confusion of all the others. Including English. I wonder how many sites would go for it?
There is zero chance that any non-spam domain will be created using those new TLDs. Cyrillic or Chinese? All possible countries using those domain encodings are already blacklisted for spam. Left-to-right? Same. Weird umlauts or magical Spanish chars? Those people will just use Latin. This is a storm in a teacup only to please godforgotten UN countries/entities and to sell new domains.
Bound to collapse under the weight of its own nonsense.
I still groan when I see people create AD and email accounts with a ' in them. Perfectly valid as an email address but not accepted by tons of sites, MS LiveId being one.
Can't help but think if Jon Postel was still around a more usable solution would have been found.
"As a result non-English speakers will often have a back-up email address that they use for those systems that will not accept their main email address. But like IPv6, this approach is not sustainable."
Er, of course it is sustainable, how does it compare to the ipv4/ipv6 issue? Email addresses are not limited so the comparison is flawed. I'd guess most people have more than one email address. I'm sure people find it frustrating but I doubt there will be any widespread acceptance in the short term.
As far as I can see, you don't even have to get to the infinity vs infinity squared argument. Yes, we've been told the last IP4 address has been sold, there are no more, and <blink>YOU MUST MOVE TO IP6 NOW!!!!!</blink>*
As far as I can see, it still isn't happening. So the comparison failed before it even started.
*Yes, I think I may have found the first appropriate place for an HTML blink code.
Those I've seen using non-Latin characters in a user name are about 95% scammers, trouble makers and assholes. They use the other language to fool people, fake using someone else's name and be difficult to report. Don't forget the ones who post some foreign gibberish with a link to a virus. The other 5%, who bother to write in a language that can at least be translated, are legitimate sincere users who just don't speak my language.
Interestingly, looking at my contact list, I communicate regularly with people from over 50 countries, a number of whom speak no English, so I have to run whatever I have to say through a translator. Only about 10% of them have a non-Latin character in their names, despite the service we are using having allowed non-Latin characters for ages. In the end what does that say? The people I've found worthy of my time have often been willing to make concessions in the interest of participating in the community.
The current Latin only email situation can't last, change must happen, nothing stays the same, and in the end I imagine the world will be richer for it. I have to ask though, does anyone want the change?
I'm having difficulty with the intent of this statement:
"[...] systems will often simply reject emails that don't work within the old DNS model - refusing to allow dashes for example."
The dash/hyphen has been allowed in DNS labels since forever (RFC882, 1983), the ACE form of an IDN breaks no new ground here. I've been involved in email administration for many a year, and I can't say I remember a single incident where a (valid) hyphen in a domain caused a problem. Hard-coded TLD "validation" or logic is a different problem.
Also, the IETF Email Address Internationalization working group and RFC6530 (and friends) seem to have escaped mention.
If I can just nitpick, here...
Are you talking about a "true" dash – that is, an "em dash", like the ones on either end of this phrase – or are you talking about a "hyphen", like this - ...like you see in typeset copy with hyphenated words, or many British surnames, such as Gervaise Brooke-Hamster? The terms aren't interchangeable, though many people do use them interchangeably.
(graphic design geek mode off.)
I'm talking about ASCII 0x2D, strictly IA5 "hyphen (minus)", informally "dash", and what's allowed in the "ASCII" subset letter-digit-hyphen, as historically used in DNS labels. A feature of IDN, which presumably justifies its complexity, is that it is backward compatible with this (ignoring the fact that a label is simply an octet sequence and technically need not be 7 bit ASCII).
One (my) interpretation of the above statement from the article is that the hyphen/dashes in the ACE "xn--" prefix or separator are going to cause problems. The encoded value cannot contain either a leading or trailing hyphen/dash (see RFC 3492 section 5), so where's the new problem?
There are other far less plausible (to me) interpretations, possibly involving systems that decode IDNs and reject dashes for some reason.
FWIW, valid IDN encompasses a large subset of Unicode, including all the dashes you might need or want. Further reading: Unicode Consortium IDN FAQ, reports TR36 & TR46, and the IDNA2008 RFCs 5890-5894 (get cracking, it's less than 5 hours until beer o'clock, this will be on the test ;-)
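As a rough illustration of the letter-digit-hyphen point, here is a sketch of the kind of hostname check a mail system might apply: labels of 1 to 63 ASCII letters, digits or hyphens, with no leading or trailing hyphen. The 253-character overall limit and the function name are assumptions of this sketch, and real validators often add further rules. Note that an ACE label such as "xn--zca9b" passes unchanged, which is the backward-compatibility argument made above.

```python
import re

# One LDH label: 1-63 characters of [A-Za-z0-9-], not starting or ending with "-".
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name: str) -> bool:
    """True if every dot-separated label of `name` is a valid LDH label."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) is not None for label in name.split("."))


if __name__ == "__main__":
    for host in ["xn--zca9b.com", "brooke-hamster.example", "-bad.example", "öß.com"]:
        print(host, is_ldh_hostname(host))
```

A system that rejects hyphens outright would fail the second example just as readily as the first, so the ACE prefix introduces nothing new for a correctly written check.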
I'm not going to go out of my way to accept these domains.
If Exchange / postfix / etc. don't support them in their latest stable version, and a couple of stable versions before that, I'll likely never see them in use anyway.
The way to deploy something like this is to be low-impact (punycode stuff isn't), backwards-compatible, and get the software working first before you start selling such domain names en-masse.
Chances are, most people who buy those domains will quickly discover that nothing works for them and nobody ever answers them, then stop using them. By the time the software does catch up, nobody will trust them (or their "fixed" replacements) anyway.
Honestly, I see no reason that punycoding something should affect existing email rules anyway. If it's just as simple as allowing hyphens in the domain name, that's a one-liner of a patch to the majority of email software out there and nothing else should really be affected. But unfortunately, it's just not that simple.
And, I've told you before, Reg. You can mention IPv6 when you put out an AAAA record for your own domain. Or did you not bother to write that into the spec for whoever did the new design / CMS for you?
No it won't. The HD resource will just type it on the non-latin keyboard he is using for whatever hellhole Asian country he works from.
The real trick will be working through the two accents instead of the English both customer and vendor are used to using.