Presumably the source code it generates will provide the full attribution required for the copyrighted source code it was trained on, and thus generated from?
Meta has released yet another sort-of-open machine learning model, this time tuned for generating software source code. Code Llama is a family of large language models – hence the occasional capitalization "LLaMA" – based on the Llama 2 model released in July. It has been fine-tuned and trained to dispense and discuss source …
A quote from "Why Developers Are Flocking to LLaMA, Meta’s Open Source LLM", in "The New Stack":
LLaMA is much more adaptable for developers. This is potentially very disruptive to the current leaders in LLM, such as OpenAI and Google. Indeed, as revealed by a leaked internal Google memo this week, the big players are already concerned:
“Being able to personalize a language model in a few hours on consumer hardware is a big deal, particularly for aspirations that involve incorporating new and diverse knowledge in near real-time.”
... there are various techniques for fine-tuning LLMs, such as hard tuning, soft tuning, prefix tuning, and adapter methods. He explained that the adapter method is attractive because it allows training of the whole LLM, while keeping the rest of the transformer frozen — which results in smaller parameters and faster training time.
"... the smaller method, where I [Raschka@Lightning AI] only have these intermediate layers like LoRA, it takes only one to three hours instead of 18 hours on the same data set, basically. So it’s an advantage because you have smaller parameters."
This is a significant development and contribution, despite not being completely open.
However, it should be noted that the restriction on using Llama output as part of training is a significant obstacle to developing new learning techniques. Specifically, it disallows learning by "trial and error", which is of course how humans learn: by trying, then feeling either the pain of error or the joy of success. We know this is vital, and we can see how stupid current LLMs are when they blithely make really dumb mistakes without "feeling" any "shame". To be clear: using Llama output is a prerequisite for any kind of "trial" activity.
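For the sake of concreteness, here is a rough Python sketch of what such a "trial" loop would look like. The functions are hypothetical stand-ins, not real Llama or library APIs; the point is only that every step after the model produces an attempt consumes Llama output, which is exactly what the licence forbids feeding into the training of other models.

# Hypothetical stand-ins only: generate(), passes_tests() and fine_tune()
# are not real Llama or library APIs; they just mark where model output flows.

def generate(model, prompt):
    # Stand-in for sampling a code completion from the model.
    return "candidate solution for: " + prompt

def passes_tests(candidate):
    # Stand-in for running the attempt against a test suite ("pain" or "joy").
    return len(candidate) % 2 == 0

def fine_tune(model, examples):
    # Stand-in for further training on (prompt, successful attempt) pairs.
    return model

def trial_and_error(model, prompts, rounds=3):
    for _ in range(rounds):
        successes = []
        for prompt in prompts:
            attempt = generate(model, prompt)      # Llama output: the "trial"
            if passes_tests(attempt):              # the feedback: error or success
                successes.append((prompt, attempt))
        # This step trains on Llama output, which the licence disallows
        # when the target is another model.
        model = fine_tune(model, successes)
    return model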
It would even prohibit teaching Llama by the "Socratic method" - something anyone who took Philosophy 101 could see the folly of.
Yes, by all means, let's employ more and more tools which spoon-feed developers answers (correct or not) rather than forcing them to consult authoritative sources and perhaps learn something.
I have a CS degree (plus some graduate study in the field, as part of my other degrees), and more than three years in the industry; but a great deal of what I know arrived serendipitously. Some of that was from obvious sources like reading CS journals and papers, but quite a lot of it was stumbling across APIs and other technical information while browsing documentation, or looking at other people's code, or reading newsgroups like comp.lang.c and sci.crypt, or trial and error.
LLMs are making us – well, many of us – stupid. Easier programming is often not better for software development.
Tools such as compilers mostly improve software quality because they abstract away details that are usually rote and irrelevant to quality. That's not broadly true of things like algorithm and API choices, where understanding the available alternatives can make a big difference.