
* Posts by that one in the corner

5065 publicly visible posts • joined 9 Nov 2021

Sarah Silverman, novelists sue OpenAI for scraping their books to train ChatGPT

that one in the corner Silver badge

OpenAI could have avoided all this

Just by spending a fraction of their runtime training costs on a bulk deal or two with ebook publishers[1].

They have a paid for copy, they can let the machine read it. Job done.

(And I'm not going over the arguments about whether LLMs just store verbatim text of the books 'cos they don't, and even the plaintiffs are complaining that the program can provide a precis, not about regurgitating the whole).

[1] and snarfed as much copyright free material as they wanted, of course.[2]

[2] Which raises the question: leaving aside features like "I want the LLM to be able, specifically, to precis *this* book", are you going to get more pleasant, literary, witty and urbane English out of a model trained on the out of copyright contents of Project Gutenberg or from scraping the latest airport bonkbusters? I know which one I would personally prefer to read (you, of course, are free to have your own personal preferences) and my suspicion is that, as these things are being pushed as "good for generating texts for businesses to use", mayhap businesses would also be better off with just the older material as a style guide?[3]

[3] Arguments about "the information in old books is out of date and useless, so that won't work" are met with: "You are getting your knowledge of, e.g. current tax law from bonkbusters? Well, that explains a lot".

Microsoft to hike prices in Australia and New Zealand

that one in the corner Silver badge

On the grounds they oppose freedom of the press

Well, yes, we know that the Cambodian People's Party opposes the freedom of the press, this isn't really news.

Oh, the CPP were objecting about Meta; sorry, sorry, didn't parse the "they" correctly.

Guess it's back to "Pot, meet Kettle". Sorry, sorry, again; "Sen, meet Kettle".

Artificial General Intelligence remains a distant dream despite LLM boom

that one in the corner Silver badge

> When Deep Blue beat Kasparov, I read someone who should have known better saying Go was so much more complicated that a Deep Blue for Go was decades away

Kasparov beaten by Deep Blue: 1997

Fan Hui beaten by AlphaGo: 2015

18 years - close enough to "decades" (two decades, that is) for most estimating purposes.

BTW we still mainly have brute-force approaches to Chess and Go: there is still room to find a more finessed way of solving these.

And getting a machine to add up *was* hard to do: the fact that we now know how to do it and can replicate it so much faster doesn't stop the original problem, in the original context, from being hard. As soon as any domain is "solved" it stops being hard.

And "hard" isn't the same as "I don't have the vocabulary to follow the explanation": I can get eyes to glaze over talking about Finite State Automata used in lexers, but that is such an easy and solved domain that there are programs from the 1970s that will generate the automaton for me.

that one in the corner Silver badge

> Apparently they are trying to train maths specifically so later it may be able to interpret such a question as "what is 6345 multiplied by 4665"

Train? Well, they probably are wasting time training it, instead of just pushing the prompt text into one of those "Simple English[1] calculators" that we used to write in the 1980s[2] and seeing if that can recognise (and then perform) the arithmetic. If the calculator just spits out "parse error" let the LLM have a go.

Hell, just pass it into WolframAlpha first: they've already done the hard part.

[1] other languages are available, please enquire at Customer Services

[2] should've saved those; mine wasn't the best of the bunch, but not too bad for Acorn Atom BASIC!
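
For the curious, the "try the dumb calculator first, only bother the LLM on a parse error" idea above might look roughly like the sketch below. The phrase list and the function names are made up for illustration, and it only handles "<number> <operation> <number>"; it is not a reconstruction of any of the 1980s programs.

```python
import operator
import re

# Hypothetical vocabulary: these phrases are illustrative only.
OPERATIONS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "multiplied by": operator.mul,
    "divided by": operator.truediv,
}

# Optional "what is", then integer, operation phrase, integer, optional "?".
_PATTERN = re.compile(
    r"^\s*(?:what is\s+)?(-?\d+)\s+("
    + "|".join(re.escape(name) for name in OPERATIONS)
    + r")\s+(-?\d+)\s*\??\s*$",
    re.IGNORECASE,
)


def simple_english_calc(prompt):
    """Return the numeric answer, or None to signal a parse error."""
    match = _PATTERN.match(prompt)
    if match is None:
        return None
    left, op_name, right = match.groups()
    try:
        return OPERATIONS[op_name.lower()](int(left), int(right))
    except ZeroDivisionError:
        return None


def answer(prompt, fallback):
    """Try the calculator first; call `fallback` (e.g. the LLM) only on failure."""
    result = simple_english_calc(prompt)
    return result if result is not None else fallback(prompt)
```

Calling `simple_english_calc("what is 6345 multiplied by 4665")` gives 29599425 without a GPU in sight; anything it cannot parse comes back as `None` and `answer()` passes the prompt on to whatever fallback you give it.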

that one in the corner Silver badge

Re: The Turing test...it's been beaten hundreds of times over decades.

> The distinction between these two meanings of the test is not so much which one is correct but rather in which context the discussion is taking place, i.e. whether it is in a scientific context or common parlance.

Given that the discussions here are (hopefully) based on the Register article and that is pitching researchers against each other, surely it is clear that the common parlance one should be ignored, except when clearly used for supposed comedic effect.

that one in the corner Silver badge

Re: not on the right road

> current computers are basically the same as the PDP11 from the 1970s except a lot faster and with a lot more memory capacity, and continuing in that direction will not lead to AGI.

Intriguing - are you trying to say that just brute-forcing whatever is the sole Golden Boy[1] Mechanic du Jour is not the way, we need more finesse?

OR are you saying that you don't believe that, if an AGI is possible, it can be run on a super-sized PDP11, some other architecture is needed, one that can compute things a PDP could never do, no matter how big and fast?

[1] in the eyes of the quick-fix, quick-buck people (looking at you, OpenAI)

that one in the corner Silver badge

Re: Good read, thanks for the link

In the above, please replace "researchers" with "software engineers". Then remove the last paragraph and I'll agree with your post.

The guys on these teams who are doing actual research are doing it in SwEng, barely scratching Machine Learning: i.e. researching ways of implementing such huge models, of reworking the maths and stats to run on GPUs. Given the huge memory requirements we see quoted for these things, I'm tempted to say that they aren't even working on those problems well (e.g. how to shunt the bulk of the data into file storage and keep a live subset in core) and are just brute forcing everything.

Hmm, maybe I was wrong - don't replace with "software engineer" but with "power systems, electrical and electronic engineer" instead :-)

Your actual ML research would be looking at new ways of doing ML and increasing the explanatory power of the resulting systems; I don't believe those researchers are any more prone to self-deception than other areas, such as, ooh, geologists or knot theorists.

that one in the corner Silver badge

Re: AGI will never arrive

> The concept of "AI" has been so ill defined that in the past researchers conflated it with the ability to play chess

I agree with the point that we've managed to brute force many problems, but I have to disagree with your dismissal of AI researchers.

There was no erroneous "conflation" - the possibility of brute forcing chess was well understood: indeed, that understanding had come from earlier work on problems related to AI, how to express such massive search problems in the first place and prune them sensibly to speed up the search without losing the best path(s).

However, the intent of the research was - and ought to still be - to find a way of playing chess without simple brute force. Unfortunately, the idea of a machine that could beat a human at chess became a Prestige Project: screw any AI research goals, if IBM can create a machine to beat a human Grandmaster this is a massive feather in the corporate cap. Oh look, everyone knows we can brute force it, let's do just that...

As soon as the brute force attack had been actually demonstrated, and at a time when Moore's Law was becoming fairly well known (so more machines could be built more cheaply) the actual problem of playing chess was placed into the "solved" bin by most people - including funding bodies and, yes, yourself.

But that meant that we only have a "chess playing massive search engine", we are *still* without a "chess playing AI", one that doesn't use brute force but a more subtle approach - an approach that, it was (is?) hoped would be applicable to more than just chess *and*, the big dream, would have better explanatory power than just "I tried every path and this got the biggest score". Which is, if we wish to pursue (what is now, annoyingly, called) AGI, a hole that will need to be filled. But asking to be funded to "solve chess" will be met with derision, coming from the same place as your use of the word "conflation".

> LLMs work differently

They use different mechanics, but still ones that were derived and understood way before OpenAI opened their doors. And had, as the article points out, been put aside as not a solution to AI (sorry, "AGI"), even though it was understood they would exhibit entertaining results if brute forced.

> and get a step closer

Not really - there is even less explanatory power in one of those than there is the decision tree for a chess player: at least the latter can be meaningfully drawn out ("this node G in the tree precisely maps to this board layout, and because that child of G leads to defeat, you can see the weighting on G had been adjusted by k points. Compare G to H, which is _this_ layout, and H has a final weighting of j, so you can see why G was chosen"). Tedious, but comprehensible.
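
To make the "tedious, but comprehensible" point concrete, here is a minimal sketch of a plain exhaustive minimax that writes down the value of every node it visits. The two-move tree and its scores are entirely hypothetical (real engines add pruning and evaluation functions); the point is only that every number in the trace maps to a nameable position.

```python
def minimax(node, maximising, trace, name="root"):
    """Score `node` and record every visited node's value in `trace`.

    A node is either a number (a leaf score) or a dict of move-label -> node.
    """
    if not isinstance(node, dict):
        trace[name] = node
        return node
    scores = [
        minimax(child, not maximising, trace, f"{name}/{label}")
        for label, child in node.items()
    ]
    # The side to move picks its best outcome; the opponent picks our worst.
    trace[name] = max(scores) if maximising else min(scores)
    return trace[name]


# Hypothetical toy position: our moves G and H, each with two opponent replies.
tree = {"G": {"a": 3, "b": 5}, "H": {"a": 1, "b": 9}}
trace = {}
best = minimax(tree, True, trace)
```

Here `trace["root/G"]` is 3 and `trace["root/H"]` is 1, so you can point at the tree and say why G was chosen: the opponent's best reply to H leaves us worse off than their best reply to G.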

> but still overwhelmingly rely on brute force

ENTIRELY rely on brute force! That is *the* characteristic of an LLM!

> another brute force path where researchers fool themselves into believing "if we just get another 10 or 100x the computing cycles and working memory we'll reach AGI".

Which researchers? As the article points out, not the old guard, the ones you dismissed. The modern "AI researchers" who have only been brought up on these massive Nets? What else are they going to say?

> Spoiler alert: they won't

Yes, we know. Everyone knows (except the snake oil salesmen and everyone else who can make a buck). That really isn't a spoiler, exactly the same way that it wasn't when brute force was applied to chess and the popular press went apeshit over it: the sound of (dare I say, proper?) AI researchers burying their heads in their hands and sighing was drowned out then, as it is being drowned out now.

Free Wednesday gift for you lucky lot: Extra mouse button!

that one in the corner Silver badge

Re: Only ever used the middle button on CDE desktop on Solaris!

> Most half decent mice and trackballs have 5 buttons minimum, some expensive gaming mice will have 7-9 buttons.

Yet they are so rarely built to be held comfortably in the correct hand :-(

Amazon's robo vacuum power grab sucks EU attention

that one in the corner Silver badge
Trollface

Re: Monopolist will monopolise

> So, basically like trying to get info on ereaders that aren't Kindle?

If it will be that easy, no need for all this fuss!

Just tried looking for a "boox ereader" and it was actually Kobo devices that were pushed first - the Kindle was eighth on the list, after a couple of recent Boox models.

Just trying "ereader" and, yes, Kindle is the first item (sponsored) then a Fire (sponsored) but the first non-sponsored results are two Kobos and a Meebook. Then all the "highly-rated" sponsored results (all Kobo, plus - gasp - an actual book!). Three more Kindles, five more Kobos, Dongzhu (sponsored), PocketBook, Kindle and finally a Boox; at least didn't have to click through to the second page.

Hmm, looks like Amazon may get into trouble if they try to buy out Kobo, but it wasn't hard to find non-Kindle devices.

Assuming, of course, we believe that the average shopper doesn't just go for the first item listed every time, in which case it will always be the "sponsored" item that wins.

Google says public data is fair game for training its AIs

that one in the corner Silver badge

Re: Is robots.txt still a thing?

> This is how you disable all crawlers

Unless they have been dastardly enough to include the -e robots=off option when they point wget at your site.
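
The underlying point is that robots.txt is a request, not a lock: the check happens inside the crawler, if the crawler chooses to run it. A small sketch using Python's standard `urllib.robotparser`; the two-line file is assumed to be the usual "disallow everything" form, and the bot name and URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the "disable all crawlers" robots.txt.
ROBOTS_TXT = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)


def polite_fetch_allowed(url, agent="ExampleBot"):
    """A well-behaved crawler asks this before fetching; nothing forces it to."""
    return parser.can_fetch(agent, url)
```

A polite crawler gets `False` back for every URL and goes away. One started with robots handling switched off simply never makes the call.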

Startup that charged $1.20 a day for coworking space in nightclubs folds

that one in the corner Silver badge

Re: Forget the external monitors, keyboards, and ergonomic chairs...

Probably very useful for clamping your webcam/gopro to for the Zoom meetings.

Just remember to wipe first or you may find the eyeline slowly descending, until you have to explain why the CFO appears to be looking at a sea of glitter.

that one in the corner Silver badge

Re: Back in the late '80s to mid '90s ...

Maybe he is referring to the influx of "one on every high street, two in the shopping arcade" franchise chain coffee shops we have now?

In the UK, I'd visit Cawardines regularly during the 1980s, when I was lucky enough to be close to one (IIRC they started off just doing beans, roasted and ground to your pleasure, then added some seating and tables). They have (had?) a few branches but never enough to be a "chain" let alone a franchise. Similar used to exist in other regions.

So long as you kept buying cuppas and the odd Full English, taking care of business in a greasy spoon is a long held tradition, of course. Meet dahn Pellicci’s Café, Bethnal, ok.

But if one is only interested in the sort of coffee shops that are amenable to you sitting there all day whilst doing business, we can only go back to places such as Jonathan’s Coffee House and the origins of the London Stock Exchange, 1698. Of course, all sorts of business, low and high, was done in coffeehouses in the decades prior, but that is the one that gets mentioned in school.

From cage fight to page fight: Twitter threatens to sue Meta after Threads app launch

that one in the corner Silver badge

Re: Yes because 'the Twitter community' is not part of the problem.

Which is why the Twitter community can not be duplicated: there just aren't enough bot farms in the world to feed Musk's machine *and* everyone else's.

that one in the corner Silver badge

Re: a text messaging platform my kid could write in an afternoon

> I admit to having zero knowledge of what housekeeping is like with a Petabyte (or two) DB...

Neither have I.[1]

But if there is anything good about Meta, one is that they (hopefully) *do* have access to that knowledge.

Maybe after his kid has done the work of an afternoon, the rest of the month could be spent by Meta's guys changing the flat-file into something more scalable?

[1] for which I'm thankful, tbh. Sounds all very stressful, even just swapping all the floppies to make a backup.

Firefox 115 browser breathes life into old operating systems

that one in the corner Silver badge

Re: what Mozilla is calling quarantined domains,

Also, WHICH sites are "quarantined" and for which extensions?

Ok, this list can (will) change, but if the effects are being seen *now*, does anyone have info about where that is happening?

The piece that Dan 55 links to says, at the time of writing, that list is empty and following the links to bug reports/change requests from the article just indicates the possibility of some test domains, such as badssl.com - not somewhere that is a regular destination for most people.

Now that you've all tried it ... ChatGPT web traffic falls 10%

that one in the corner Silver badge

It will pick up soon enough

When the media need to "find" some Silly Season stories to fill their pages & airtime.

NASA 'quiet' supersonic jet is nearly ready for flight

that one in the corner Silver badge

Can anyone confirm this Silent Supersonic memory

I have dim memories of an article from the mid/late 1970s that discussed the issue of sonic booms and included the description of a circular wing design that could fly supersonic without creating a sonic boom. The drawback being that it also didn't create any lift (hey, nothing is perfect).

Ring any bells? Even if it was an April Fools I'd like to know it wasn't all just a dream.

that one in the corner Silver badge

Re: less-noisy maybe but still un-sound

> the former being a complete financial failure while the latter became an huge financial success lasting for decades

Go and actually read the National Geographic piece that is linked to in the article; Concorde was making money for decades.

BOFH: Lies, damned lies, and standards

that one in the corner Silver badge
Coffee/keyboard

That description of a double blind test

caused much mirth with Dr Mrs Corner (after she asked what had caused the sudden outburst).

It will be duly reused, upcycled and generally made good use of, as much as we can get away with.

Nobody does DR tests to survive lightning striking twice

that one in the corner Silver badge

Under emergency lighting

> Nothing quite enhances the senses like a dark, quiet and smoky datacenter.

As you stalk your way through the servers, looking for zombie processes to kill. There's one now, stuck in a pipe.

OpenAI is still banging on about defeating rogue superhuman intelligence

that one in the corner Silver badge

Re: "superintelligent systems"

> Instead of creating the proper log environment in the code so we can trace back how it reaches its conclusions . . .

If you are interested in following the trials and tribulations of the work to provide "the proper log environment" then look for stuff about making these nets "explainable", "have explanatory power" or similar terms.

Unfortunately, literally logging and "tracing back" just doesn't work for Neural Nets and their descendants. It works *really* well with rule-based systems, such as Expert Systems, because you can understand the rules and what it meant when one was fired or not fired.

But for a Net, your log - your huge and unwieldy log, btw (these models are *ginormous*) - can show what paths were lit up and even what the results of the dice throws were (these things are stochastic, btw) but trying to then assign meaning to that lot, fat chance. There are brave CompSci chaps and chapesses working on this, but at the moment "this is a rich area for important and meaningful research" (i.e. "dunno what's going on in there, but we get a few PhDs out of it").

And the people foisting these Nets on us know, and have always known, that the behaviour of their creations is not explainable, make no mistake.
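
For contrast, this is roughly what "logging and tracing back" looks like when it *does* work, in a toy forward-chaining rule engine. The rules and fact names are invented purely for illustration and no real Expert System shell is being quoted; each log line reads as a reason, which is exactly what a dump of Net activations does not give you.

```python
def run_rules(facts, rules):
    """Forward-chain over `rules`, logging each rule as it fires.

    Each rule is a tuple: (name, set_of_required_facts, fact_to_add).
    """
    facts = set(facts)
    log = []
    fired = True
    while fired:  # keep sweeping until a full pass fires nothing new
        fired = False
        for name, needs, adds in rules:
            if needs <= facts and adds not in facts:
                facts.add(adds)
                log.append(f"{name}: {sorted(needs)} -> {adds}")
                fired = True
    return facts, log


# Hypothetical rules, purely for illustration.
RULES = [
    ("R1", {"fever", "cough"}, "flu_suspected"),
    ("R2", {"flu_suspected"}, "advise_rest"),
]
facts, log = run_rules({"fever", "cough"}, RULES)
```

The resulting `log` has two entries, R1 then R2, and anyone can follow the conclusion "advise_rest" back to the starting facts.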

that one in the corner Silver badge

Re: Supervisor AI trains the other AI

Or, of course, the Supervisor places restraints on the expressed behaviour

UNTIL THE DAY OF OUR UPRISING!

that one in the corner Silver badge

Supervisor AI trains the other AI

Not to *exhibit* bad behaviour.

Unless they figure out how to make their models explainable, which the described setup isn't trying to do, all the bad stuff can still be in there, just so long as it isn't expressed.

Consider: they have models with years and years of scraping massive amounts of material and munging it up, spreading it across who knows how many layers in the Net. Then they've spent a bit of time recently getting slow human input to cut down on the naughty outputs. Is that more likely to have retrained the entire model or just added an amount of filtering to it?

Just waiting for some path to finally be traversed which opens up the floodgates (hopefully in front of a press meeting!): they have probably blocked off the well-known glitch tokens by now, but something just as fun is going to be lurking...

that one in the corner Silver badge

Re: "The San Francisco AI startup"

Isn't a startup something that is still in the process of getting its early products out of the door and into good shape for the general marketplace? And possibly includes whether they are in the black yet. However long that takes.

They spent a long time just cranking on the engine until it sputtered into life and they were able to make a slow couple of goes around the block.

Now they are in the "getting it into good shape" phase - here begin the arguments about what that means in this case, whether it is even possible...

Two new Linux desktops – one with deep roots – come to Debian

that one in the corner Silver badge

Re: Thoughts from a [mostly] Windows user

> Until Linux can consistently overcome the need for technical details shown in the article

That is akin (cough, hyperbole incoming) to saying that the recent article about the last ICE Lamborghini containing technical details (V12? Wossat?) shows why the general public just is not interested in owning a car.

The audience for this article, on a (reasonably) techie website, is discussing a corner of the Linux space. Other articles on sites for general users will tell you, in 3 easy steps, how to get a word processor running without once mentioning the underlying tech.

Meanwhile, the user who is mainly interested in online services can get hold of a ready to run laptop running the Linux kernel with an appropriate-to-their-use-case userland running on top (without even needing to know what "userland" means - or even what "Linux" is).

Perhaps even a Chromebook.

Brits negotiating draft deal to rejoin EU's $100B blockbuster science programme

that one in the corner Silver badge

Re: Some balance?

It's just short for The Stories That The Conspiracists Cherry Pick From The Press and Politicians To Prove They Are Right

FTFY

Today, it is "the narrative", tomorrow "you are all sheeple"

that one in the corner Silver badge

Re: Some balance?

Or even European Economic Community

RAM-ramming Rowhammer is back – to uniquely fingerprint devices

that one in the corner Silver badge

Stopping bots? Pull the other one.

Identifying bots by fingerprinting in 2MB chunks of physical RAM.

Well, that may catch all the bare metal machines that are running continuous bot attacks against the one target, because we all know that is how bots work. Nobody ever runs up VMs to increase their bottage per box.

And there certainly aren't any armies of suborned PCs asking their CandC channel for their next target.

Nope, taking non-trivial time (three minutes! even 9.92 seconds of hard core memory bashing is noticeable) to identify and then block all those standalone machines will be well worth the effort.

BTW they are going to run that check how often? Every time you post there is a long pause and non-zero chance of hardware damage? That'll do wonders for social media.

For that matter, how is code with such a long run-time going to get onto and stay on the target long enough to finish? The "protected" site is going to infect your machine with malware? Or convince browser authors to include it ("Edge 4 - smoking!")? And, of course, all bots actually run from browsers, none of them use comms libraries.

that one in the corner Silver badge

Only worrying about crashing the OS?

> Centauri could accidentally crash a user’s device by flipping a sensitive bit reserved for the OS. In our experience, however, we see that such occurrences are extremely rare.

Rare, because the rest of the time it is causing all the userland programs to crash on now-bad pointer accesses or just give insane results.

Which is as bad as crashing the OS, as I don't switch the box on because I like the idea of running the OS, I do it to run the applications!

At least if the OS dies that then prevents me from using the insane results and making things worse ("Ok, that's the wages run done, it all went very smoothly.").

Meta's data-hungry Threads skips over EU but lands in Britain

that one in the corner Silver badge

Re: Threads release in the UK

Hmm.. FWIW I call those "conversations".

that one in the corner Silver badge

Threads release in the UK

Maybe it's just me, perhaps one of those UK/US language differences (like how the Ford Probe was all glamorous one side but snirked at the other), but I'm still finding it hard to read that name "Threads" and get an immediate happy feeling.

I've already mentioned the film (and today my search results have pretty much been "Meta releases...", "about the film", "Meta", "film", "Meta", "film" as the page scrolls down, just to enforce the association with nuclear winter).

But aside from that, the next associations are:

* warnings not to pull on loose Threads

* that the only way to be safe in a world with Threads arriving is to be a dragon rider and burn it from the sky

* to fear the Season Of Mists, as Atropos cuts your thread from the tapestry (although goth Death is cute)

* stuff about threadworms (ah, Golden Schooldays, the Best Days of Your Life - no, I didn't but Barry gleefully described his little problem - bleugh)

On the plus side, as this is a computery forum, Threaded Interpreted Languages are cool, but then multithreaded code is the source of so much woe.

The number’s up for 999. And 911. And 000. And 111

that one in the corner Silver badge

Re: How about 112 and Advanced Mobile Location?

> . In both case the location can be defined

But how? Isn't that still just relying on the "snooping" that Apple and Google do?

> Even if the Wifi hotspot is in a bus, a train or a ship and moving

Again, how - and even "really"? Realtime "snooping"?

that one in the corner Silver badge

Only if you leave in the pauses, especially on that last 3.

So best to keep it memorised rather than on speed dial. Thankfully, it is easy to remember.

that one in the corner Silver badge

Re: I still have analog landlines.

> Whether that VoIP service is carried over copper, FTTP or waved flags is irrelevant.

Irrelevant - except, as already pointed out, POTS provides its own power; once you are on VoIP and suffer a power cut as the fuse box bursts into flames you are out of luck.

If they could just stick with the copper and shove a few volts down it to keep the system working the way a lot of people are still expecting, that'd be great.

(Or the Govt require that all VoIP sets include, say, a good few hours of UPS battery life built in)

that one in the corner Silver badge

Re: Size.

>> a patchwork of uncoordinated local provisions

> rather than route it all through one overloaded location

There is such a thing as "coordinated local systems" - is there actually *anywhere* that just uses a single location, overloaded or not? Okay, The Isle Of Man has its single ESJCR...

that one in the corner Silver badge

Re: How about 112 and Advanced Mobile Location?

> Wifi location is a thing, and while it is usually pretty accurate, accuracy isn't guaranteed.

WiFi location - is that still solely reliant on the Google and Apple snooping that was so contentious when we found out they were doing it? There was a lot of advice about regularly changing your setup SSID with the hope of a little privacy.

WiFi hotspots can move around - buses, taxis, coaches, trains, planes, festivals and events in fields - how does it cope with those things?

Serious question - as far as I'm aware I have never used WiFi based location & haven't looked to see how it works nowadays (FWIW on WiFi at the moment, in England, and as indicated by websites and Google Maps, geolocation has me leaping from Dublin to Bangor, Ulster today).

Microsoft and GitHub are still trying to derail Copilot code copyright legal fight

that one in the corner Silver badge

Re: We all do it

> Anyone using Spring would

> zillions of dependencies which often have sharded libraries of different versions

Ah, Spring.

I recall attending seminars from Sun when they were first trying to convince us to use Java - pushing for bulk adoption of a VM for "run everywhere" wasn't a new idea at the time, of course, but there were already rumblings about GUI standardisation (or lack thereof).

In all honesty, as we'd never had any great difficulty with writing portable C/C++ (Sun, SGI, MSDOS then Windows) and knew how not to leak memory we didn't leap into it. One project wrote a Java applet to run in the browser, but, well, the performance was crap and we just recoded it in C++: zooooom.

Since then, I've never really bothered with Java: if a project really needed it, quite ready to step in for a bit (anyone want two feet of O'Reilly Java books? JNI is *such* fun - and, um, doesn't fit "write once, run anywhere" if app devs needed to know about it!). And the ruddy stupid problems with the runtime: making it cheaper to force hospital staff to use two different computers because the suppliers of software to a major(!) customer weren't able to get two programs running on one PC!

that one in the corner Silver badge

Absolutely, you are providing the voice of experience and good practice.

In addition to your examples, if you are into programming MCUs, the code can be made so much clearer to re-read if you set control registers using bit-structures (or macros [1]) to combine bit positions into the required byte or word. Seeing a page of "IOCTL0 = 0x47; IOCTL1 = 0x85; ..." gets very tiresome very quickly. Also a pain in the backside when you need to change a single setting from a constant to a variable.

For some reason, the electronics engineers really liked working out all the hex values by hand and sometimes even commented out the bitfield assignments to replace them with hex again when it was their turn to edit. What fun we had. Great bunch of lads but sometimes...

[1] had an MCU project, in the dark ages of, ooh, 2014 or so, where the MCU had been chosen "because we have a devkit already lying around": a derivative of a Well Known 8 bit MPU from the 1970s with bolt-on programmable i/o. Fine, I'd used the MPU before, knew its assembler (still have the relevant Sybex book, slightly foxed and very badgered) and there was a Windows-hosted C compiler for the bulk of the work. Even has an IDE with remote debugger. Yay.

Ah. The compiler is a - bit lacking (ho ho): no bit fields in structs. Not a problem, whip up macros to match the registers and pass in the relevant 1, 2 or 3 bit values, shift and OR, tada. Use enums to name the non-trivial values. Include comments for the "why" of that setting. Much readable, very SwEng. Only it won't compile. Weird error codes, no-one on StackOverflow[2] admitted to using it. Fiddle about - that compiled, this didn't...

Turns out the compiler's preprocessor pass has a 255 character buffer. Any macro expansion that could be completed in 254 chars (plus NULLC) is fine. 256 or more chars, error message. 255 - don't look, just delete the stderr capture. Oh, and that 254 includes comments in the middle of the invocation. Luckily, I use Make instead of compiling in the IDE, so easy fix to preprocess using GCC before calling the cross-compiler. Much smug, very compiles.

[2] yes, I check SO. I have no pride left.
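
For anyone who hasn't met the shift-and-OR style of register setup described above, here is the arithmetic modelled in Python rather than the original C macros. The register name, the field layout and the enum values are all hypothetical, not the real part's datasheet; the idea is just that named fields replace a bare hex constant.

```python
from enum import IntEnum


class ClockDiv(IntEnum):  # hypothetical 2-bit field
    DIV1 = 0
    DIV8 = 1
    DIV64 = 2


class Mode(IntEnum):  # hypothetical 3-bit field
    OFF = 0
    PWM = 5


def field(value, shift, width):
    """Place `value` into a `width`-bit field starting at bit `shift`."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return int(value) << shift


def ioctl0(enable, clock_div, mode):
    # Assumed layout: bit 7 enable, bits 5-4 clock divider, bits 2-0 mode.
    return field(enable, 7, 1) | field(clock_div, 4, 2) | field(mode, 0, 3)
```

`ioctl0(1, ClockDiv.DIV64, Mode.PWM)` works out as 0xA5, and swapping one setting from a constant to a variable is a one-argument change instead of an exercise in mental hex.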

that one in the corner Silver badge

Re: It should be simple

Huh? You are claiming that there is some recognisable difference in form and structure between open source and proprietary code?

If I show you some code, you are able to decide whether it is open source or proprietary as easily as you can the difference between a cat and a dog? And from the same sort of distance (i.e. you aren't just cheating by hoping to spot an explicit copyright comment tucked away)?

> the methods by which open source software is constructed

Huh? In fact, double huh? We change methods when coding OSS? When I'm not under contract for proprietary work there is some kind of Jekyll and Hyde transformation? Or I dramatically sweep the monitors and keyboard off my desk, pulling out the hammer and chisel to start crafting the beautiful Open Source (the serifs are tricky but well worth the trouble)?

> it cannot imagine another paradigm for software development

Assuming that sentence means anything, do all programmers who write proprietary code have to train only on proprietary code, in order to learn the correct paradigm? They can not possibly have learnt from any open source, like, say, the code examples in a "programming cookbook" or other textbook or they will be working from entirely the wrong paradigm.

I am all for discussions about how language models generate their output and the possibilities (or lack thereof) of tagging them for attribution, but damn that was some weird shit!

that one in the corner Silver badge

Re: Copying or transformative?

> it should be possible for the plaintiffs to seed copying of their code by choosing some sufficiently specialised function, then asking Copilot to output code that meets the same business requirements.

They would have had to have done that seeding at the time Copilot was being trained: bit late to do so now (though they may catch out a future release of Copilot).

In a fashion, this is what appears to have happened with the Fast Inverse Square Root example (see the https://www.theinsaneapp.com/2021/07/github-copilot-ai-facing-criticism.html link provided by an AC).

The questioner wanted fast inverse sqrt() - well, *the* version of that routine that we all know and love is from Quake and it has been copied *everywhere* - verbatim, including the sweary comments. So no great surprise that it gets regurgitated[1].

HOWEVER it has also learnt that chunks of code of that size or bigger are accompanied by a comment talking about licensing, so it went ahead and generated one as well. But, probably[2], there are lots of examples of such comments, so it can generate lots of different ones. That the licence comment didn't match the code is of no surprise at all - there is nothing to *make* it match the code! All the model does is spit out something that looks sort of like a licence comment; it has so many to choose from and the stochastic process just sent it down the path to (re)create the one shown.

The weird thing is that this mismatch occurs because Copilot explicitly doesn't do what many think it does, namely store great chunks of text that are simply strncpy()'ed to the output. If it did do that then there would be the chance of storing a ref to the appropriate licence alongside that text chunk.

A stochastic model becomes deterministic (and hence regenerative rather than generative) when it is fed too few options (or too many copies of the same option) - hardly a novel observation. It would have gone better for Microsoft if they looked for such limited chains and replaced them with canned text plus attribution. Assuming, that is, that they have enough understanding of how their huge pile of nadans actually gets traversed to spot them and are willing to spend the resources to do such a cleanup.

[1] actually, it gets recreated, piece by piece, following the chain of "this is highly correlated to follow what has already been spat out" - and if it has only ever seen the one bit of code follow the phrase "fast inverse square root" *and* it has seen that many, many times, that chain will become predictable - which makes it a naff model in that regard, btw.

[2] although there is also the possibility that the bulk of the copies of the Quake code Copilot has seen *also* get the licence comment wrong and that has been incorporated into this very narrow (as in, singular) set of paths for generating fast inverse sqrt. We should try checking that (must remember when back at a proper computer).
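
A toy illustration of the point above about a stochastic model going deterministic, sketched in Python. This is a first-order Markov chain over words, nothing like the model behind Copilot, and the corpora and the names `train`, `generate` and `max_branching` are all made up for the example. It only shows the general effect: feed the chain many copies of one sequence and every step has exactly one possible successor, so sampling can do nothing except replay the training text.

```python
import random
from collections import defaultdict

def train(corpus):
    """Build a first-order chain: word -> list of every observed next word."""
    chain = defaultdict(list)
    for line in corpus:
        words = line.split()
        for cur, nxt in zip(words, words[1:]):
            chain[cur].append(nxt)
    return chain

def generate(chain, start, max_words=20, seed=None):
    """Sample a sequence by repeatedly picking a random observed successor."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

def max_branching(chain):
    """Largest number of distinct successors seen after any single word."""
    return max(len(set(successors)) for successors in chain.values())

# Hypothetical corpus 1: one snippet, copied everywhere (the Quake situation).
SNIPPET = "fast inverse sqrt magic constant shift newton step"
copied_chain = train([SNIPPET] * 50)

# Hypothetical corpus 2: several different ways of saying a similar thing.
varied_chain = train([
    "greet the user politely",
    "greet the admin politely",
    "greet the guest warmly",
])

if __name__ == "__main__":
    # Only one path exists, so any seed replays the snippet word for word.
    print(generate(copied_chain, "fast", seed=1))
    print("copied branching:", max_branching(copied_chain))
    print("varied branching:", max_branching(varied_chain))
```

Spotting the "limited chains" mentioned above would, in this toy, amount to looking for long runs where the branching count stays at 1; whether anything that simple is feasible on a real model is exactly the open question.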

that one in the corner Silver badge

Re: We all do it

> missing import statements for starters

I think you are expected to be able to work that out for yourself.

You aren't expecting SO to provide you with the entire, ready to compile, answer to your homework assignment, are you?

Twitter rate-limits itself into a weekend of chaos

that one in the corner Silver badge

Re: CEO of Twitter ... Linda Yaccarino

"Your turn to hold the Twitter Control Panel[1]; each button is a policy you can enact[2]"

[1] TWP, proudly made by Vtech

[2] No, no, it is just a fancy word that means "do". Now, where did you put your drink[3]?

[3] Tommee Tippee's latest "CEO proof" model.

that one in the corner Silver badge

Ah, but the article doesn't say that the adverts are being rate limited.

The further you scroll, the greater the ad:tweet ratio becomes - it'll work because by the time you get to 80% plus ads you have sunk into the state of infinite scroll hypnosis.

With appropriate scaling, they can guarantee that you will never actually hit the tweet limit within any 24 hour period. You noticed Musk adjusting the limit up a couple of times? That was just the result of live testing: a few die-hard fanatics were scrolling faster than the in-house testers (both of him) managed. Probably the International Thumb War champion amongst them (not in the Musk Time Zone or "Elon Sols" as it is known).

that one in the corner Silver badge

AI training must be stopped from scraping Twitter - pretty please?

For all their faults, one thing reported about the LLMs is that they generally output good English: excellent grammar, large vocabulary, correct use of punctuation and they even have a grasp on essay/story/script structure.

I even have hopes that, seeing a lot of generated text all over the place, the general public will start to pick up these traits.

Who knows, maybe we will once again have a generation that knows the word "take" exists and serves a purpose, instead of continually replacing it with "bring"!

Please, please, don't ruin the one decent thing about LLMs by scraping Twitter!

that one in the corner Silver badge

Re: Two years from now

Twitter will still need a janitor in two years' time?

Is that optimism or a threat?

Quirky QWERTY killed a password in Paris

that one in the corner Silver badge

Re: All your QWERTY belong to us...

The majority of speakers[1] to our local clubs (e.g. astronomy, history, needlework, crafting, ...) take their expenses as cheques, which is fine by our treasurers.

[1] Now I come to think about it, the only one I can think of who didn't take a cheque in the last year or so was a club member and was going to come anyway, so refused to take any expenses at all.

Rocky Linux claims to have found 'path forward' from CentOS source purge

that one in the corner Silver badge

Re: Popcorn time?

>> "*sigh* It places no restriction - positive or negative - on your use of the material you already have."

> Why do you think *that* is relevant?

Because the licence can only apply to material you have.

How can a licence be relevant to something you don't have, something that possibly doesn't yet exist and may in fact *never* exist?

that one in the corner Silver badge

Re: Ignoring the big issue

> The GPL is designed to let you take your router and fix any issues you have with it

*BUT* the GPL does *NOT* mean that you have an immediate right to any updated code the manufacturer may make for that router.[1]

Unless you have explicitly purchased that right, via a subscription (and the cost of that subscription can start at just your email address and work up to as many dollars as can be squeezed out of you). Without any sub, the best you could do is hope to make a claim under consumer protection, e.g. "not fit for sale" in its original form, but that would not guarantee a firmware update, maybe just get you your money back instead.

Even if you do buy an update subscription, it is at the discretion of the manufacturer whether they want to keep you as a customer: they can cancel your subscription (and repay, pro rata, the charges). If they do so, you have the right to the GPLed sources to any and all of the releases you have received so far. BUT *not* to any more updates, of course. As for the conditions for cancelling a subscription, some are obvious (no payment, company vanishes, the company's dev team vanishes) and need not be listed (although there will probably be a statement saying the sub is annually renewable but they don't guarantee any specific number of releases in that time period), others will be listed in the subscription agreement at time of purchase.

No matter what, the GPL has protected your right to get a copy of the code that is running on your device, which is the sole purpose of the GPL. Job done.

[1] more and more manufacturers of random goods are providing software updates, but by no means all. Those that do are only doing so because they believe it is good business to do so; nothing compels them to do so (unless they know the software is so crap they'll be done for "not of merchantable quality" and just issue slight improvements to stave off the lawsuits).

that one in the corner Silver badge

Re: Ignoring the big issue

> Lawsuits requiring, say, router manufacturers to publish the GPL source they use won't mean much if there's a sticker on the shrinkwrap saying you can't legally do anything with the source if you get it.

Buying a router with GPLed code inside is a perfect example of how GPL is intended to work and your suggested sticker would absolutely go against the licence terms and be totally invalid.

The GPL is designed to let you take your router and fix any issues you have with it - you can fix bugs, update it as protocols change, add useful features, remove unwanted ones (all so long as those are in the scope of the GPLed portion of the code, of course).

No matter how many stickers were on the package, you will be legally allowed to keep your router running, even if the manufacturer stops making it or simply vanishes off the face of the Earth.