ChatGPT, how did you get here? It was a long journey through open source AI

When OpenAI released ChatGPT 3.5 in late November 2022, no one expected much from the new release. It was just a "research preview," explained Sandhini Agarwal, an AI Policy researcher at OpenAI. "We didn't want to oversell it as a big fundamental advance," added Liam Fedus, a scientist at the org. Ha! That was then. This is …

  1. Great Bu

    Open or Closed...

    I don't mind if it is open or closed source as long as it keeps doing my homework for me.......

    1. Sp1z

      Re: Open or Closed...

      Good luck when you get to the exam........

      1. Great Bu

        Re: Open or Closed...

        The only exams I get these days are on my prostate.....

        1. Anonymous Coward

          Re: Open or Closed...

          I hope you passed, or at least can pass when you need to.

          Old guy.

    2. NATTtrash

      Re: Open or Closed...

      "I don't mind if it is open or closed source as long as it keeps doing my homework for me......."

      Well, if your homework passed and your teacher stamps it off as OK, then your teacher is more in need of education than you are.

      We took ChatGPT for a ride, asking it some very specific questions requiring expert medical, regulatory, and therapeutic knowledge, prompted by the suggestions that "ChatGPT will write top notch articles and you can't see it did!" So, we asked it to write and discuss (Q&A) a potential peer reviewed article, with its statements backed by literature references.

      TLDR: Don't get me wrong. The language model is very polite and impressive. There is the genuine feeling that you are talking to a real person. But...

      We asked it to write a piece on an innovative medical treatment. Not only did it come back with wrong base facts and assumptions (and no "Talk with your physician" disclaimers), its data was also significantly dated (literature only ≤2012, if I remember correctly). When asked to give literature references for its statements, it made up references that were either wrong, so you could never find them (different DOI, wrong journal, wrong pages), or completely fictional: they do not exist and were stitched together from other literature references (!). Furthermore, it told us that pharma products and medical devices are governed and controlled by the same legislation; then, when corrected, that the different regulations and directives do the same thing; that some medical treatments were reimbursed in several countries (when they are not); and, most seriously, it advised doing "certain things" in the case of some serious medical issues. And that is of course worrying when we know that "people" seem to look up their ailments online by default, and act on the info they find there.

      When we pointed out the blatant mistakes and, yes, indeed, fiction/lying and BS, ChatGPT was again very polite. But if you don't know what the answers should be, well... It showed us that as a knowledge base it is utter rubbish, and even worse, misleading, selling BS as "the truth". Not really new to "The Net" in general I suppose, but worrying if you realise that this is what the MS and Google of this world want to answer your burning questions with...

      For me the takeaway was that the suggestions OpenAI gives on the entry page are perhaps the most valuable; I think I remember something like "Start a conversation with ChatGPT, something like: What would be the perfect present for a 10 year old?". Then again, after what we got back... Is it still OK to give a 10 year old a chemistry set? Or is my age now showing..?

      1. seldom

        Re: Open or Closed...

        Sounds like it was created for HR bods or CEOs

    3. Ken Moorhouse Silver badge

      Re: Open or Closed...

      Sounds like you are talking about boxes.

      A much better name for the product would have been Pandora's Box.

  2. unimaginative Bronze badge

    It sounds as if they are doing something wrong, but the entire point of using a BSD/MIT type license (rather than a GPL type license) is to allow proprietary redistribution.

    Yes, it would be great if it was open source, but they are not doing anything sneaky by keeping it proprietary.

    1. zuckzuckgo Silver badge

      Other organizations are still free to assemble the same open source components into a competitor, they just can't access whatever "special sauce" was added by OpenAI. And it's probably just mayonnaise.

    2. matski

      But they did and are doing the very wrong thing - they lied and misled everyone. They pretended they were doing "research" and claimed it was all for "science", getting all of that data for free, ignoring the licensing with no one opposing, and then all of a sudden sold themselves to Microsoft. If Microsoft had come out and said that it was going to collect all publicly available open source code to create a commercial product for generating code, there would have been massive protests, I can tell you that. People would have wanted to prevent MS from doing that and maybe even force it to release the software as open source. The sneaky, cheeky way this got monetized is something that really annoys me.

      Also, from all I heard, they took all the code available on GitHub - I didn't hear anything about MIT-licensed code only. They probably just took everything. And even the MIT license, like any other license, never allows you to take out the author. You need to attribute the author! And GPT is not doing that. To me this is breaking the license, and by using code from GPT in your own you might end up with legal issues.

  3. elsergiovolador Silver badge

    Americans would say

    Schmucks - all those developers giving out their time for free, missing out on important life events, family, self-development...

    Just to get those few stars on GitHub that nobody cares about or get a few more followers.

    Then maybe one day someone from FAANG will notice and schedule the dream interview.

    In the meantime big corporations have other ideas.

    Make billions and don't pay a penny back, and not a penny forth.

    1. Antipode77

      Re: Americans would say

      Government should tax business users of open source. Then put it back into supporting the open source eco system.

      We all would be better off.

  4. Pascal Monett Silver badge

    So, NotOpenAI is monetizing ?

    Vote with your wallet, people.

  5. matski

    What about borrowing (stealing) the code

    The truth is that OpenAI stole the licensed code and is keeping it inside itself as a part of its model. They just claim it was "public" so they could take it. This was allowed when it was doing "research" as a non-profit, but right now it's commercial software. And they are using this original code (that's tokenized and stored in some form) to generate answers and code without showing any license or even credit for the original creator. This is a massive legal issue - this product is flawed and products that use the code it produces can be later sued for infringement. It did so for every other piece of work - science papers, art, etc.

    1. Long John Silver

      Re: What about borrowing (stealing) the code?

      Your remarks illustrate the mess we get into when ideas, and when easily replicable digital code, are made into a pretence of property.

      Just look at what already goes on in the area of software copyright and software patent disputes. Consider too, the rotting rambling edifice of 'intellectual property' supposed rights across the board of industry and culture. A framework of 'rights' so extensive, so convoluted, and so prone to ambiguity and contradiction, that bright ordinary people understand little, and trained lawyers need to specialise in particular aspects. All of this arising according to the dictum of a false proposition giving rise to all propositions true and false: the origin of this mess being the false supposition of ideas being equatable to physical property.

      The only possible currency for valuing the worth of ideas rests on attribution and praise.

      1. matski

        Re: What about borrowing (stealing) the code?

        You seem to dismiss the idea of intellectual property. I understand you are trying to say that it would benefit humanity not to have any boundaries at all and just use whatever creation someone's made. Even if I partially agree with you - in that case GPT should also be free and available for download, and instead it is a closed commercial product that is benefiting from others' work while not sharing much of its own. They even stopped sharing information about the internal structure of the models created. That is hardly a fair move.

        However, I do not agree with you fully because there is a significant effort that one has to make to perform an act of original creation. It requires skill and years of mastering and synthesis of information from numerous sources (visual, art, human interaction, abstract ideas, pain, etc.) to build something unique. In that sense AI is not adding anything to this creation - just reassembling and regenerating what it already "knows" in a different way. And intellectual rights need protection because it's too easy to copy someone else's work. If they didn't exist people would not even want to create anything, as it might just get taken and reproduced by someone with more money, etc. This would not benefit humanity, as creativity would be crippled - it would not be economically viable.

        A good example I came across would be - would it be fair to get all of the books by Stephen King into an AI and then just generate a new book based on that and publish it by the name of Sarah Queen to directly compete with the author? Maybe even generate a better book. I think it would not be fair and if language model has been built that contains the books inside to generate new content - that would be derivative work from the original author. And original author should consent to that or at least be able to be credited and compensated.

        Same with those pictures, art and code - GPT contains those in its model without even crediting the authors or where it comes from.

        1. Long John Silver

          Re: What about borrowing (stealing) the code?

          You raised the anticipated question about means for rewarding creative activity.

          What I, and others, favour is a return to pre-Statute-of-Anne (1710) days and abandoning all that followed regarding copyright and patents. Setting aside various royal prerogatives in European nations of old to issue monopoly licences to favoured people, nobody dreamt of income for life (and beyond) accruing from a single mental exertion.

          Two categories of creative people existed. The first drawn from persons of means; these also mostly being people with basic education, and beyond. The other consisted of a wide category ranging from artisans through to scholars who gained income via divers channels from customers, employers, and patrons. Royal courts, noble families, the nouveau riche, universities, and institutions within the Church of Rome, were major sources of patronage.

          Leonardo da Vinci had a succession of patrons. He received no income from works after their completion satisfactory to patrons. However, his accomplishments became increasingly widely known. Works that please others acquire approbation. Resulting from that, reputation builds. Increasing reputation enables creative people greater choice among prospective projects. They gain influence on patrons as to what a new project could be. They command higher fees/salaries. They set aside money for old age.

          Our recently begun era of the Internet offers challenge and immense opportunity in the context of creative endeavour. The former being the inevitable collapse of effort to equate artefacts representable by digits with physical property and its ownership; simple mass disobedience is seeing to that at opposite ends of culture: streamed/recorded popular entertainment, and in academia.

          Opportunity flows from the chance of reaching out to more prospective admirers/patrons, and that more quickly, than ever before. Another factor being separation of 'content' (digitally represented) from individual instances of a substrate (e.g. print on paper). To publish a novel, recorded music, film, or computer program, no longer depends upon traditional resources or skills; whilst skilled aid may be required this will be on a fee, or collaborative, basis rather than a stake in pretence of a digital sequence possessing monetary worth. It's an intellectual free-for-all with anyone, should they choose, able to distribute any sequence, to distribute a modified sequence, or to produce a 'derivative' work; the key DISCIPLINE holding this together being the requirement to give proper attribution to the originator(s) of works one distributes, and for all works modifying or deriving from somebody else's original work. The only crime is false representation based on deliberately not acknowledging somebody else's work, and thereby feeding from unearned reputation.

          Incidentally, blockchains offer powerful means for recording sequences of attribution, these along with points of bifurcation; this along lines of citation indices.

          Money accrues from voluntary patronage and, perhaps, from sale of associated goods and services. Only patrons pay upfront; this in expectation of further good works from an individual/collective they admire. Patronage is not financial investment. It is cultural investment in hope of return, which cannot be expressed monetarily (anathema to Neo-liberals). This bears strong analogy with how academics subsist on various kinds of patronage (employment by an institution, and research grants). The world, via the Internet, is every talented person's oyster.

          IP rentier economics, i.e. selling access to, and rights to distribution of, ideas represented by digital sequences, is incompatible with market-capitalism because digital sequences, easily reproduced, lack scarcity. Supply/demand and price discovery cannot apply. There is a pseudo-market dependent on legally backed monopoly.

          The only true market is for competing creative talent along with associated technical skills. Reputation, aided by attribution, determines success.

          The above easily carries forth to consideration of the destructive effect on creativity of patents.

          (Released under the Creative Commons Attribution international licence 4.0)

          1. katrinab Silver badge

            Re: What about borrowing (stealing) the code?

            "Incidentally, blockchains offer powerful means for recording sequences of attribution, these along with points of bifurcation; this along lines of citation indices."

            But blockchain offers no verification of whether the original claim recorded on the chain is real or not.

          2. Michael Wojcik Silver badge

            Re: What about borrowing (stealing) the code?

            "Incidentally, blockchains offer powerful power-hungry means for recording sequences of attribution..."


            (Proof-of-work) blockchain is not a solution to anything. It's a miserable way to do an append-only ledger.

    2. MOH

      Re: What about borrowing (stealing) the code

      It's not just the code though. The training data too.

      The whole thing is essentially a giant plagiarism machine. Sure, it'll chop and change a bit, but ultimately all it's doing is regurgitating other people's content.

    3. elsergiovolador Silver badge

      Re: What about borrowing (stealing) the code

      "This is a massive legal issue - this product is flawed and products that use the code it produces can be later sued for infringement. It did so for every other piece of work - science papers, art, etc."

      It's not an issue, because they now have billions, so they can ensure any politician coming forward with anything that may challenge them, quickly changes their mind.

      Once the rich put their greedy paws on something, there is nothing you can do.

      You can scream at the top of your lungs that it is unfair or even illegal, you can spend your lifetime savings on pursuing this legally, you can even crowdfund a legal challenge, but you'll get nowhere.

      It's the world of the rich and they can do whatever they want.

      1. Size10

        Re: What about borrowing (stealing) the code

        Maybe we should set ChatGPT/derivative AI works the task of pursuing this for us?

  6. Long John Silver

    Where is Robin Hood?

    Eventually, various incarnations of Robin Hood will release corralled code into the wild. Once gone, as IP rentiers in general are learning, no amount of huffing and puffing, no number of erroneous cries of "Theft!", and no host of avaricious lawyers, can coax the code back into the corral.

    Scattered among academic and commercial centres across the globe, the code will be used, extended, and improved. Unless by then all Western universities have been converted into "for maximum profit" institutions, as demanded by Neo-liberal doctrine, academics shall seek kudos flowing from open publication of their efforts. Then there is the Far East and 'Global South' to consider: already they chafe at Western (mostly USA, but cheered on by 'lesser' nations) restrictions on use of ideas and trade of products and services derived therefrom.

    The days are numbered for behemoths such as Microsoft. The pace of change in the 21st century rapidly assigns them to anachronism. Fleet-footed small firms, co-operatives, and cottage industries shall dance between the legs of ailing dinosaurs.

    1. GraXXoR

      Re: Where is Robin Hood?

      I’ve come across your comments these last few days and can’t help thinking you got GPTchat to craft them for you. The language structure is incredibly stilted and right at the bottom of the uncanny valley.

  7. anderlan

    Everyone likes to trash the GPL.

    But the GPL keeps this type of thing from happening. Linux and its accoutrements wouldn't be what they are if they weren't mostly GPL.

  8. CatWithChainsaw

    "Not What I Had In Mind At All"

    And those were his last words before Skynet became sentient at 2:14am EDT on August 29 1997.

    I wonder how those who are angry about code being used in this way feel about the art models. In at least LAION's case, the artwork was collected for research purposes, but then suddenly once there was money to be made, yoink.

  9. Nifty Silver badge

    Publish the source code and it'll take the bad actors a year to work up their own 'jailbroken' and custom versions. Using TikTok style catnip algorithms this AI could...

    Keep GPT's code proprietary and that process could take 18 months longer.

    1. Michael Wojcik Silver badge

      There are plenty of LLMs already available. Just take LLaMa; it's far more efficient anyway. If you think you really need orders of magnitude more parameters, scaling the model up is just a matter of finding the hardware resources. There are plenty of corpora available to feed your shoggoth.

      While I sympathize with the article's tone, the argument's over table scraps. Here's the thing. Building an LLM is easy. Training it is easy, if expensive in hardware costs. LLaMa has shown you can easily filch another model's "personality" by fine-tuning against it, so you can make your own model take on the surface appearance of GPT-4 if you really want to. That means there's no competitive advantage in the model itself, aside from the hardware costs of running the thing, and once someone's made a Really Really Big model you can in effect steal much of its "feel" using a model that's merely Really Big.

      So why are Alphabet and Microsoft and Meta¹ fighting over this? Because the scarcity, and thus the value, is in context. Alphabet owns a ton of user context, in search history and the web index itself and Gmail and Google Docs. Microsoft owns a ton of user context in 365 and GitHub and Bing's web index. Meta owns a ton of user history in cat pics and racist rants, which isn't as rich with business data and the like but is good for manipulating people, so it's pretty valuable. That is what makes their control over LLMs valuable. And that's what other people don't have, and why the big LLM owners will crush the zillions of startups.

      LLMs are going to shift even more attention and power to the big players, because while the typical Reg commentator might hold his or her nose, most people will happily stroll into the information abattoir for the convenience. Even a large proportion of techies; we're already hearing all this noise about how LLMs make programmers so much more productive, and how it's the future of software development – as if writing new code weren't the single least important thing a good software developer does. (If you write enough code for an LLM to significantly improve your productivity, you're almost certainly Doing It Wrong.)

      ¹ There are other players, but the Chinese ones are effectively in a different market, and I don't think any of the smaller ones have a chance.

  10. JDX Gold badge

    "There you have it. Another depressing open source tale."

    No, not really. I think you'd struggle to find any proprietary application that doesn't use some open-source. And that's perfectly fine and normal. I am very glad the days are largely gone where people were clamouring that anyone using any OS should have to OS their entire code-base. That's simply not the point of FOSS.

    It is disingenuous to the original project ethos and they should probably just say that and rebrand, but there's absolutely nothing depressing about building a profitable product on the shoulders of FOSS. Quite the opposite in fact, it is enormously empowering.
