Microsoft 365 tries again at filtering swearing, bad behavior: Classifiers for seven languages offered

Microsoft has gone back to the drawing board and once again emitted tools to detect and filter out swearing and abuse on its Microsoft 365 cloud. News of the profanity protector popped up on the Microsoft 365 Roadmap, a feed of information from Redmond about new features coming to the tech giant's sprawling subscription …

  1. b0llchit Silver badge
    Facepalm

    A textual salute

    I salute the whole concept with a choice of fingers shown on both hands, where the choice of fingers probably will fall on a couple of centrally positioned fingers, extended upright, while the others are folded back into a fist-like structure.

    1. NATTtrash

      Re: A textual salute

      Clippy: I see that you're pretty pissed off. Do you want me to suggest some safe terms to continue bitching, or would you like me to report your language to Nanny?

    2. Joe W Silver badge

      Re: A textual salute

      Ah, you mean "the ancient sign to ward off demons"? (T. Pratchett, "Interesting Times").

  2. deadlockvictim

    US-centric

    And, of course, all of the decisions made will be US-centric.

    Over this side of the Pond, cunt is an everyday swearword, not especially sexist, strong but not very strong, and rarely used in its original context. Usually it means that the fellow working with you has fucked up or has made you angry. It is mostly vernacular though.

    In the US, however, it has become The Most Evil Word Ever™, presumably because of their Puritan history.

    Of course, now I will have to think of a way to incorporate cunt in a Microsoft document.

    What do I know about Scunthorpe? There is surely a cunning stunt I can pull.

    1. Evil Auditor Silver badge
      Trollface

      Re: US-centric

      Consider this: you oppress religious fanatics. They leave, find another place to fuck up, and grow and breed. Many (several hundred) years later, just when you thought you had overcome moral constraints and finally lived in a liberal society, they come back at you telling you what you can say and what you cunt.

      1. martyn.hare

        They also hate people with Cock as a surname

        I had to turn off their profanity filters (which are useful for blocking viagra spam) because it couldn’t differentiate context. Microsoft support was useless as usual and it was down to me to find out why emails were not being received.

        For all their touting of AI, you would think they could check whether the word is part of a From line included in an email body because people are replying/forwarding... but no... their stuff is completely SHIT*, as is to be expected!

        * Stupid Hassle Irritating Techies

    2. IGotOut Silver badge

      Re: US-centric

      "Over this side of the Pond, cunt is an everyday swearword, not especially sexist, strong but not very strong, "

      You'll find it varies from region to region and from age to age in the UK.

      My guess is you live in the South East.

      1. deadlockvictim

        Re: US-centric

        Dublin, actually.

        1. NeilPost Silver badge

          Re: US-centric

          Do you not mean in cunting Dublin?

          Fecking hell!!

    3. Michael Wojcik Silver badge

      Re: US-centric

      In the US, however, it has become The Most Evil Word Ever™, presumably because of their Puritan history.

      I don't think the Puritans have much to do with it. They were actually pretty liberal when it came to talking about the naughty bits, and for that matter about using them. Some studies of their records suggest a majority of women were either pregnant or had children by the time they were married. And it was customary, at least in some communities, that if an eligible bachelor stayed overnight with a family that included an eligible daughter, the two would sleep together; we have various documents substantiating that.

      What did happen in the US, but in the eighteenth and nineteenth centuries – well after the Puritans had become no more than a relatively minor (if wildly overrated) chapter in US history – was a long-lasting panic over any mention of female anatomy. Bryson (yeah, I know, but he cites reliable sources in this case) discusses it at some length in his book on US English. It was so bad that serious medical conditions often went untreated because women couldn't describe their symptoms to doctors.

      The reasons for this are unclear, but they probably have more to do with social climbing and attempts to delineate class structures, which historically have been nebulous in the US, than with leftover Puritanism.

      Things are improving, though. Why, in the states covered by the 10th Circuit, women can go topless in public now. They pretty much never do, but they can, at least anywhere men can. (Businesses, for example, can require shirts for everyone.) The day may yet come when we're not stricken with horror at the prospect of our own bodies.

  3. Ken Moorhouse Silver badge

    minimize comm risks by helping you detect, capture, act on inappropriate messages

    What about a lawyer, sending verbatim evidence regarding a court case?

    Is MS o365 going to be allowed to mangle it?

    1. Anonymous Coward

      Re: minimize comm risks by helping you detect, capture, act on inappropriate messages

      I was about to post something similar. Can this be turned off by the user, perhaps on a per-document basis, or is it something which is set centrally by an admin?

    2. Danny Boyd

      Re: minimize comm risks by helping you detect, capture, act on inappropriate messages

      It's like spell-checker, you can turn it off or ignore its warnings.

      1. aks

        Re: minimize comm risks by helping you detect, capture, act on inappropriate messages

        Somebody would have that authority. What are the chances it isn't you?

    3. Anonymous Coward

      Re: minimize comm risks by helping you detect, capture, act on inappropriate messages

      The manual says you must wrap it in double quotes.

  4. Martin an gof Silver badge

    Multi-language documents

    So, if you have this setting on by default (I am assuming you can turn it off?) but happen commonly to write documents containing a mix of languages, supported and unsupported, what happens then? I already have issues with the spell checker assuming that a whole document is in the same language...

    M.

    1. Chris G

      Re: Multi-language documents

      The Thought Police in Outlook constantly drive me nuts with spelling and grammar suggestions; I regularly mix English and Spanish in emails.

      It goes totally wonky if I add Russian.

      I haven't seen a way to turn it off.

      1. Anonymous Coward

        Re: Multi-language documents

        Try Teams. It keeps asking you "Are you writing in English now?" or "Are you writing in Spanish now?" and sometimes it even grabs the focus. As far as I can tell there is no "fuck off you cunt and don't bother me again" button.

      2. MiguelC Silver badge

        Re: Multi-language documents

        That!

        I write documents that regularly mix Portuguese, Spanish, French and English (well, not all at once, but at least two or three), sometimes within the same sentence, and Word always loses the plot and throws hissy fits...

        Also, some grammar rules fail regularly (e.g. using consecutive homographs is relatively common in Portuguese) and there's no way of excluding those particular repetitions.

        It's so bad I just turn spell checking on only when reviewing the document, and ignore most flagged errors and suggested corrections.

  5. Neil Barnes Silver badge
    Flame

    Call me optimistic

    but I can really see this working well... no, wait... no, it'll work almost as well as the grammar checker. It will assume that people didn't mean what they typed, and try to dress it in the latest flavour of fashionable newspeak.

    Suggesting alternative spelling for words it thinks are misspelt is fine, provided there is no auto insertion of corrections (look at the amusement that can be had with autocorrect on a phone, for examples) - but that is where it should stop. No machine should *ever* replace what the operator writes...

    If you, as a company, have a problem with the way your employees communicate, the issue is one of education, not of farming the problem out to software.

    /rant

  6. Big_Boomer Silver badge

    FAQing Tw@ts

    Whilst I can understand detecting bullying and abuse, the use of so called swearwords often also indicates stress and dissatisfaction. Will the companies that implement this actually read and act upon such emails or will they just use it as one of the excuses to get rid of the "troublemakers"?

    I have never understood the classification of words as good or bad. I can insult someone deeply enough that they never speak to me again and do so without using a single swear word. Words are just words, tone and context are far more important than the specific words used.

  7. John Sturdy
    WTF?

    Perhaps we can learn from the past...

    Perhaps we can learn from the past... and revive some Shakespearean insults that it doesn't know about? For its sin’s not accidental, but a trade.

    1. deadlockvictim

      Re: Perhaps we can learn from the past...

      You base football player!

  8. Diogenes8080

    What fresh hell is this?

    I will be interested to see what is finally offered, but:

    US products are often notoriously US-centric but sold as-is to other English-speaking nations. Never mind about different cultural values and the resulting false positives; these dictionaries will miss a lot of unpleasant local vernacular that could potentially get your senders in trouble. If you work in healthcare, also beware these dictionaries picking up "clinically correct" expressions.

    A lot of Eastern European profanity is, I understand, euphemistic and only profane in context. It would take a fairly impressive AI to get that right. Other languages may pose similar problems.

    As other posters have pointed out, legal and HR teams often need to handle statements verbatim. You can typically exempt their mailboxes, but that leaves them unscreened and what about shared storage? In personal communication, what is acceptable amongst friends is not so acceptable from enemies and strangers. Again, it's a matter of context.

    You have a hit. You, or an automated rule, tells the sender / author not to. It doesn't take long for that person to adopt obfuscation, after which you are in a labour-intensive and ultimately fruitless loop as the possible permutations spiral beyond your product limits. Alternatively you can drop / delete / quarantine the content and create a support burden there instead.
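    A minimal sketch of that obfuscation loop, assuming a tiny, hypothetical look-alike table (`LEET`) and hypothetical helpers `normalise` and `flagged`, none of which come from Microsoft's classifiers: undoing common character swaps and stripping punctuation catches "sh1t" and "s.h.i.t", misses "shiiit", and once the spaces are gone it starts flagging innocent text such as "This hit home".

```python
import re

# Hypothetical look-alike table for illustration; real obfuscation varies far more.
LEET = str.maketrans({"1": "i", "!": "i", "3": "e", "0": "o",
                      "@": "a", "$": "s", "5": "s"})


def normalise(text: str) -> str:
    """Lower-case, undo simple character swaps, then drop everything but a-z."""
    swapped = text.lower().translate(LEET)
    return re.sub(r"[^a-z]", "", swapped)


def flagged(text: str, term: str = "shit") -> bool:
    """Substring check on the normalised text (hypothetical helper)."""
    return term in normalise(text)


for sample in ["sh1t", "s.h.i.t", "sh!t", "shiiit", "This hit home"]:
    print(f"{sample!r:16} -> {normalise(sample)!r:16} flagged={flagged(sample)}")
```

    Every rule added to catch one variant widens what gets caught by accident, which is the fruitless loop described above.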

    1. Evil Auditor Silver badge
      Thumb Up

      Re: What fresh hell is this?

      Eastern European profanity, such as a mother caringly saying to her daughter: «jebo ti pas mater»

      I'll leave this to your preferred translation service...

    2. Anonymous Coward

      @Diogenes8080 - Re: What fresh hell is this?

      I'm a native of an Eastern European country and I can attest to that. Simply saying the words 'Your mother' to someone can be a serious insult. It's all in the context, the tone, the accent; even your facial expression can make the difference. And there are at least two or three variations of the word mother you can choose from in order to vary the intensity of the insult.

      Frankly, I find swearing in Western languages to be pretty mild and lacking imagination.

      1. veti Silver badge

        Re: @Diogenes8080 - What fresh hell is this?

        Well yes, saying "your mom" in American can be pretty insulting too.

        1. Michael Wojcik Silver badge

          Re: @Diogenes8080 - What fresh hell is this?

          Saying "hello" can be pretty insulting. It's all¹ in how you do it.

          ¹ Well, all except diction, by definition.

    3. John Sturdy
      Happy

      Profane by context

      A classicist friend once told me that every Ancient Greek word has three meanings: its literal meaning, the opposite meaning (for ironic use) and a smutty innuendo. I wonder if the filters try to remove *all* the words that could be smutty.

      The position of the lioness on the cheesegrater is still a bit of a mystery, but may well get through El Reg's filtering regime.

  9. Doctor Syntax Silver badge

    The headline and first line of the article make it clear that this is yet another attempt at this, even for Microsoft. Slow learners.

  10. Howard Sway Silver badge

    But is it AI?

    Are we talking about a simple blocklist, or will they have to train an AI to police the language?

    Because if they want to train an AI, all they have to do is turn the microphone on whenever anyone is using their software, and they'll be able to capture the full range of profanity in every language and dialect in record time.

  11. Conrad Longmore
    Alert

    Scunthorpe

    Scunthorpe. I'll leave it at that.

    1. imanidiot Silver badge

      Re: Scunthorpe

      I'll add De Cocksdorp.

      1. Huw D

        Re: Scunthorpe

        Fingringhoe

      2. Anonymous Coward

        Re: Scunthorpe

        Shitterton
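    The Scunthorpe problem in a nutshell, as a minimal sketch with a hypothetical three-word blocklist (`BLOCKLIST`, `naive_hits` and `word_boundary_hits` are illustrative names, not anything from Microsoft's classifiers): plain substring matching flags every place name above, while whole-word matching lets them through but still trips over a surname like the one martyn.hare mentions.

```python
import re

# Hypothetical three-word blocklist for illustration; not Microsoft's list.
BLOCKLIST = ["cunt", "cock", "shit"]


def naive_hits(text: str) -> list[str]:
    """Flag a blocked term wherever it appears, even inside a longer word."""
    lowered = text.lower()
    return [w for w in BLOCKLIST if w in lowered]


def word_boundary_hits(text: str) -> list[str]:
    """Flag a blocked term only when it stands alone as a whole word."""
    return [w for w in BLOCKLIST
            if re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)]


for sample in ["Scunthorpe", "De Cocksdorp", "Shitterton", "Regards, Dave Cock"]:
    print(f"{sample!r:22} naive={naive_hits(sample)} "
          f"word-boundary={word_boundary_hits(sample)}")
```

    Whole-word matching fixes the place names but not the surname; that case needs a context check, such as ignoring quoted From lines.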

  12. Anonymous Coward

    Could we please have a single checkbox in the MS Word options to turn off ALL of the annoying options.

    Something along the lines of "What I type is exactly what I want, don't correct it or reformat it or bollocks it up in any other way. Just leave it the fuck alone."

    1. Ken Moorhouse Silver badge

      RE: a single checkbox in the MS Word options to turn off ALL of the annoying options.

      There is a separate item on the Accessories menu for that. It's called Notepad.

    2. veti Silver badge

      Well, yes, it's called "autocorrect". You can absolutely turn it off entirely with a single checkbox.

      I did that for a time, but eventually I realised it really was nice to have something correcting my basic tyops, and autocorrect does a pretty impressive job at that. So instead I just went through all the options and chose exactly which ones I wanted to keep.

      1. Ken Moorhouse Silver badge
        Coat

        Re: and chose exactly which ones I wanted to keep.

        You forgot the one that corrected typos though.

        Edit: Sorry, forgot my coat again--->

      2. Michael Wojcik Silver badge

        At least for US English, the autocorrect list in Word is depressingly long, and you can only delete one entry at a time.

        Fortunately, it turns out to be relatively straightforward to automate the process of removing entries using Office OLE Automation, which you can drive from the command line if you use a PowerShell shell. I was able to delete them all with half an hour or so of research and experimentation (I don't use PowerShell often), and it would have been easy enough to leave some had I wanted to.

        I would have just left Autocorrect turned off, but I wanted it to convert "--" to an N-dash without screwing up hyphen-letter combinations as Word's "autoformat as I type" option to insert dashes does. (Tip for the Word developers: a single hyphen means a single hyphen.) Then I noticed that WinCompose gives me compose-hyphen-hyphen for that purpose, so I could have saved myself the trouble; but deleting all those autocorrect entries was so pleasing I didn't regret spending the time.

  13. Tron Silver badge

    Feck, Arse, Girls, Drink.

    Can you imagine proofreading an edition of 'Trainspotting' using this software?

    Turn off Auto- everything, because it's all shite.

    I suggested having user-definable levels of 'AI' in software in a recent design brief, and emphasised the need for users to be able to turn it off.

    As evidence I would point to Google Search. Type in a complex search request and it will use its third-rate chav AI to ignore most of your search terms and direct you to an article about a pop star whose first name matches just one of your search terms. Not having a 'Turn AI off' button in Google Search has made it almost as bad as Bing.

    1. CrackedNoggin Bronze badge

      Re: Feck, Arse, Girls, Drink.

      The definition of AI is a little too broad to pin it down. However, most searches seem to return multiple hits all containing the most "a priori" likely meaning, based on all searches by all users - therefore your annoying result. One might hope that at least mutually similar results would be suppressed, so a wider range of interpretations would be returned. Or perhaps a couple of check boxes next to each result - "less like this" and "more like this".

      Perhaps part of the problem is the service is driven by advertising revenue. If search were a service paid for by the users, we might see better development.

    2. imanidiot Silver badge

      Re: Feck, Arse, Girls, Drink.

      Very much this. Google has become pointless for some complex searches because it just ignores all those carefully crafted search terms in favour of its own interpretation of what it thinks you might have meant (had you been a boozed-up 4-year-old with ADHD).

    3. This post has been deleted by its author

    4. veti Silver badge

      Re: Feck, Arse, Girls, Drink.

      "That would be an ecumenical matter."

  14. Anonymous Coward

    Absolutely necessary for Office 365

    I absolutely understand the necessity for Office 365 to abolish swearing and worse.

    Here comes my anecdote: Last week my son was preparing a lengthy text complete with table of contents and bibliography.

    When trying to add a caption to a figure Word 365 on Mac decided to corrupt the document so that only 4 pages remained.

    No way to recover the text within Word. Pages and LibreOffice still displayed the complete text but could only display the bibliography, not work with it. Saving a copy from there let him carry on in Word, but without the bibliography sources.

    There is a Web version of Office 365 but the capabilities are even more limited than the Mac version of Word.

    TOCs have been requested for iPad since at least 2015. Excel on Windows supports barcode generation natively; on Mac it does not.

    So, yes, no swearing allowed is a good thing, and my son has learned several lessons.

    1. b0llchit Silver badge
      Coat

      Re: Absolutely necessary for Office 365

      ...and my son has learned several lessons

      Ehm, several,... make regular backups and never use any MS Office products?

      1. imanidiot Silver badge

        Re: Absolutely necessary for Office 365

        I'll add to that: "A backup isn't a backup until you've tried restoring it and found it to indeed contain your data". Because even if you DO save your documents Word has a nasty habit of refusing to load something it created 5 seconds before.

    2. Ken Moorhouse Silver badge

      Re: Here comes my anecdote

      A solicitor client of mine who used Word habitually used Save rather than Save As, and ended up with a 95-page lease comprising asterisks.

      WordPerfect, we need you! Table of Contents, Bibliographies were such a doddle.

      1. David 132 Silver badge

        Re: Here comes my anecdote

        VMS had, as I recall, an auto-versioning file system, which would also have solved your solicitor's problem.

        LEASE.DOC;1 LEASE.DOC;2

        LEASE.DOC;3 LEASE.DOC;4

        ...and so on.

        1. Michael Wojcik Silver badge

          Re: Here comes my anecdote

          Anything I care about goes into a source code control system (I prefer Subversion, but pick your poison) with a remote repository, even non-text-format files, and gets committed frequently. Then I always have multiple backups from various points on its timeline.

          I have conventional off-site backup mechanisms too, of course, but historical backups that are immediately accessible are invaluable.

          Some years back an academic friend and I mooted a document-editing system (I won't call it a word processor, since I think that whole concept is fatally flawed) that would have used LaTeX for formatting, with a relatively novice-friendly front end like a stripped-down LyX, and a storage backend that would be git using a custom diff that operated at the text-and-formatting-markup-token level rather than text lines and weighted replacements and moves higher than additions and deletions (an easy change to the standard MED algorithm), and frequently pushed to a remote repo. Then we'd provide a timeline slider so it would be easy for users to see how the document changed over time, and controls for reverting particular changes, etc. The idea was to keep essentially all non-trivial revisions of a document with useful visualizations and controls. Never built it, but I think it would be a vast improvement over the primitive 1970s Xerox PARC model most people are still stuck with.
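          A minimal sketch of the diff core described above, under stated assumptions: a hypothetical `tokenize` that splits LaTeX commands, braces, words and punctuation into tokens, and a hypothetical `weighted_edit_distance` using the textbook dynamic-programming edit-distance recurrence with a separate cost per operation. The comment gives no actual weights, so the 1.5 replacement cost (against 1.0 for an addition or deletion) is an assumption, and moves are not modelled at all.

```python
import re

# Hypothetical tokenizer: LaTeX commands (\emph), braces, words and single
# punctuation marks each become one token; whitespace is discarded.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[{}]|\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def weighted_edit_distance(a: list[str], b: list[str],
                           ins_cost: float = 1.0,
                           del_cost: float = 1.0,
                           sub_cost: float = 1.5) -> float:
    """Token-level minimum edit distance with per-operation weights.

    Standard dynamic-programming recurrence; moves are not modelled.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost           # delete every token of a[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_cost           # insert every token of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                          # delete a[i-1]
                dist[i][j - 1] + ins_cost,                          # insert b[j-1]
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),   # keep or replace
            )
    return dist[-1][-1]


old = tokenize(r"The \emph{quick} fox")
new = tokenize(r"The \textbf{quick} fox")
print(old)
print(weighted_edit_distance(old, new))  # 1.5
```

          Swapping `\emph` for `\textbf` costs one replacement (1.5) rather than a deletion plus an addition (2.0), so the history would record a formatting change instead of an unrelated delete/insert pair.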

  15. WolfFan Silver badge
    Childcatcher

    Oh, dear

    Many, many, MANY years ago (damn, has it really been that long? Now I feel old…) I used Eudora for email. I was then on a medieval history mailing list. (Several of my friends and relatives, all history nuts, were on the list as well.) One of my cousins also used Eudora, got a certain update before I did, and sent a note to the list filled with much more profanity than normal. You see, we were then discussing a certain Guillaume le Bâtard. And Eudora’s latest update included a feature called ‘Peppers’ which allegedly showed how ‘hot’ your email was and advised you to cool it down. Multiple uses of ‘Bastard’ could and did generate the max three peppers, indicating a very naughty post. Those of us who used Eudora made a game of seeing how many peppers we could get with as little effort as possible. My cousin tried to get three peppers on every post.

    I suspect that this MS feature may have similar results.

    1. Ken Moorhouse Silver badge

      Re: ‘Bastard’

      Just file it away.

      Being a history buff, I'm sure you already knew...

      https://www.wonkeedonkeetools.co.uk/files/how-did-bastard-files-get-their-name

  16. Anonymous Coward

    So that will mean most of the sysadmin emails after patch Tuesday are going to be somewhat 'different' then...

    "Dear Users,

    Another ******* mess with ****** Microsoft patches has caused a ****ton of ****** problems all over the ******* place. No ETA on when normal ******* service will return."

    1. David 132 Silver badge
      Happy

      "Brussels is the capital of *******"

  17. ibmalone
    Flame

    Seven languages you say?

    Wake me when I can change the default dictionary for new documents.

  18. vogon00

    Oh, great - more editing.

    Oh, FFS.

    I can't see the point of this MS 'written content filtering'. It's the human author's responsibility to ensure the content they write is sanitised to the correct level for their audience, not someone/something far, far away with a different (most likely wrong) understanding of the author's situation and context. There are so many ways that AI proofreading can screw this up. How do you train an ML model to handle nuanced multi-language documents with any degree of accuracy anyway?

    It's got to be a human decision, by the author (and possibly the editor) of whatever the written word is. There is already a scheme for doing this: simply tag things as 'NSFW' in the first paragraph. If people read past that warning, then they can't really take offence. It's a bit like getting a complaint about walking naked past your own window; who's to blame, you for accidentally doing it or them for looking? Smart people don't take offence without real cause.

    Yet another example of political in-correctness enforcement going too far, and being passed off as a useful innovation. I used to think Orwell's '1984' could never materialize.... now I'm not so sure.

  19. A. Coatsworth Silver badge
    Trollface

    So many missed opportunities

    "Threat, Targeted Harassment and Profanities Classifier" Pfffttt...

    Why not call it Targeting Incoming Threats System?

    Or Profanities and Harassment Universal Controller and Keeper

    Or Total Work Anti-Threat System

    Or...

    1. Jimmy2Cows Silver badge
      Childcatcher

      Try harder...

      Controls Limiting Insulting Text On Respectable Institutional Systems

      1. David 132 Silver badge
        Happy

        Re: Try harder...

        Unfortunately that acronym is already being used by the Committee for the Liberation and Integration of Terrifying Organisms and their Rehabilitation Into Society.

  20. Jimmy2Cows Silver badge

    Puritanical insanity and intolerance run amok

    Perhaps time to break out the Dickensian insults.

    Whoops. That'll probably trip the filters.

    Why oh why does MS continue to pander to this ridiculous minority that wants to tell everyone else how to live their lives?

    1. Anonymous Coward

      @Jimmy2Cows - Re: Puritanical insanity and intolerance run amok

      Maybe it's because nobody has the courage to stand up and fight? Once you've been tagged by these new-world commissars you no longer exist. Being wrong or right no longer matters; everyone will dissociate from you and forget you ever existed. We experienced this kind of situation during the years under the Communist regime (it was one of their favorite threats against those who did not comply), and I thought all this had gone into history's dustbin never to be seen again, at least not in the civilized Western world. In those dark days I learned to keep my mouth shut and forfeit my freedom of expression. Suddenly, that old skill comes in handy these days.

  21. Anonymous Coward

    Newspeak is coming.

    This is a model of thought control. You can see this gradually getting tweaked until it gets to the point where the computer reports you for thoughtcrime.

  22. Daniel von Asmuth
    Windows

    What took you so long?

    There has clearly been a need for a filter in Word against profanities like 'Microshaft' or 'Windoze'. (Microschoft and Windhoos in Dutch).

    1. Ken Moorhouse Silver badge

      Re: 'Microshaft' or 'Windoze'

      Sometimes they do acknowledge their inadequacies.

      Windows POS for example.
