And this is why corporations should not control AI output.
They will only let you see what they want you to see.
Microsoft is updating the Bing AI chatbot service to prevent users generating fake film posters containing Disney's logo over fears of copyright infringement. People on social media began sharing their AI-generated creations of dogs as Disney characters mimicking its film posters online. The trend may have caught the …
So, tell me:
If you have to control exactly what data you can feed it, double-check its responses with an automated system, and put in safeguards so it doesn't wander off topic, hallucinate, or talk about inappropriate things...
At what point is all of this just fancy heuristics ("human-written rules") and nothing more? It's the same way that Tesla still has to maintain a human-managed blocklist of locations where the AI goes totally wrong, where they just say "Look, override and drive straight here" (which appears to be a danger in itself!).
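To caricature that stack (a sketch of the idea only - every name, rule and value below is made up, not anyone's actual system): by the time the denylist, the override table and the automated response checker are in place, the model is the last thing consulted and the first thing distrusted.

BLOCKED_TOPICS = {"disney", "film poster"}          # curated denylist
OVERRIDES = {"junction 47": "drive straight on"}    # human-managed fix-up list

def passes_check(answer):
    """Stand-in for the automated system that double-checks responses."""
    return "mickey" not in answer.lower()

def guarded(model, prompt):
    p = prompt.lower()
    for place, fix in OVERRIDES.items():             # hard override beats the model
        if place in p:
            return fix
    if any(topic in p for topic in BLOCKED_TOPICS):  # denylist beats the model
        return "Sorry, I can't help with that."
    answer = model(prompt)                           # finally, ask the model...
    return answer if passes_check(answer) else "Sorry, I can't help with that."

print(guarded(lambda p: "A dog dressed as Mickey.", "paint me a film star"))
# -> "Sorry, I can't help with that." - the rules, not the model, decided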
And how did it recreate anything Disney unless it was trained on Disney materials? It's not as if someone sat and described in intricate detail what Mickey Mouse looked like and it created its own interpretation (which would be valid clean-room reverse engineering for most things anyway). It knew. It had seen it. And that's copyright infringement unless the imagery was licensed to be part of their training database. So who authorised that, and what else is in there?
(I mean, I know the answers to all the above, it's pretty much rhetorical).
"And that's copyright infringement unless the imagery was licenced to be part of their training database."
I believe this question is still bouncing around the courts and lobbyists, based on the process being one of pattern extraction; that's not to say it's definitely infringement-free, rather that it's not a question that has been answered definitively. There might be a secondary question about the source if Disney suspected the content was also pirated, but it would be pretty hard to make that claim with movie posters, which are generally released for free distribution.
Lots of things are publicly available and publicly visible but you are not allowed to copy them: bank notes are the most obvious example, but artwork such as the Mona Lisa or some crappy film poster equally qualifies.
In a lot of cases you can get away with "interpretations" and "derivative" works as long as you don't take the piss and try to pass off your work as original.
Sorry for being unclear: the whole publicly-released part was WRT potential allegations of piracy, as an additional but distinct claim about the type of infringement occurring.
I believe the core question is whether a digital record (image, video, audio, text) that is legally obtained for the purposes of viewing, but which enjoys general copyright protections, can also be used to generate information in a database through pattern extraction (simplified, but, I believe, sufficient to describe 'AI' image, text and sound generators in general) without the permission or compensation of the original copyright holder.
There are a lot of opinions on the matter but, as far as I'm aware, no test cases that pertain exclusively to this core question have reached a conclusion, and no specific laws have been passed. Sadly, given the amount of money at stake on both sides, I can't foresee a likely outcome that primarily benefits the average consumer or small-time content creator. Both those in the business of screwing over consumers in the name of copyright and those in the business of hoovering up content for resale have managed to exploit the technical ignorance of politicians time and time again.
I don't discount the possibility of a sharp legal mind crafting a reasonable law but I would suggest that the majority of confidently given, simple rationales for any position are based upon not understanding the question and/or not caring about the full consequences.
Copyright law is long-established and very clear:
If you don't have a licence to use an image, then you don't have a right to use an image.
Same for books, audio, movies, software, paintings, etc.
The only exceptions are for fair use (hint: feeding the world into an AI isn't fair use), review, satire, etc.
It's very, very clear what's going to happen, because there is no law to the contrary - if you didn't have the explicit up-front authorised rights from the creators (or their licensees) to suck that content into an AI model, then you weren't legally allowed to do so.
I had a hunt round and I would say this article gives a reasonably good surface-level introduction to the concept, with some legal precedents. It also explains precisely why the law is anything but 'established':
https://www.photo-mark.com/notes/how-did-transformative-use-become-fair/
Hopefully, this is a good starting point on your journey to better understand this area of the law.
It's a hardware/software combo delivering to the public the AI tech that they want and need the most.
It's a keyboard with an extra key - The F-Off key - that turns all the AI off.
If anyone out there has $200m to invest, I'm open to offers, bribes and honey traps.
Ah, but how will you determine that an AI is speaking to you? It's all very well and good now, when AI things tend to be labelled as such, but what about when it's not completely obvious?
Will your software use an AI to determine the AIness of what you're talking to?
All attempts to "guardrail" (or whatever is the term in vogue) AI are doomed to failure if it *is* AI.
No ifs, buts or maybes.
And if you can't work out why then (a) stop calling yourself an AI guru and (b) give me a ring and a wedge of cash and I will explain why.
But is it wrong? Given the publicly available data, 'an average woman of Afghanistan' is basically what it produced. The result is going to be a merging of the images out there, and 90% of them are going to be the iconic image, so it is no surprise that the output is very similar to the original - though impressively facing towards the 'camera', which puts this in a different league to a simple copy. What about the other, lesser influences on this image? Where do they stand? When it comes to copyright, this is a minefield.
Yeah, but that's poor "AI", misunderstanding the context of "average". The request, to a human, would be an "average Afghan girl", not an "average of all the Afghan girl images you've ever seen". Based on the first contextualisation, the face might bear some similarities, but would more likely look like a very different person. In the second case, the image will trend towards the most commonly seen photo if, as you say, 90% of the images used as the basis for the new one are that same image.
What that and the whole Disney thing demonstrate to me is that those AI companies claiming the training data is not used in the final model are lying. They say the AI is trained on the technique and skills or whatever, and the actual data is discarded. A human, in most cases, doesn't have a perfect memory and so will mostly end up creating a "fair use" image based on various memories, combinations and imagination. A computer, on the other hand, can't help but have a perfect memory and no imagination, and is being trained not to use that perfect memory, with varying degrees of failure.
No AI in existence is able to infer or reason. Pretending that it's just a particularly poor example of AI is misunderstanding how these things work.
This thing just averages stuff out, based on superstition ("last time I did this random thing, I was rewarded, so this random thing must be 'better'"). I would say something about pigeons pecking targets or football fans wearing their lucky socks, if you get those analogies.
The result from these things is the original training data, statistically modified based on those superstitions (which are largely random).
There's no way that thing generated that image statistically without the original image being in the training set and dominating the statistics for keywords like 'Afghan' and 'girl' together.
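To put rough numbers on that (a toy sketch in plain NumPy - nothing like a real image model, with figures invented to match the 90% estimate above): if nine-tenths of the samples behind a caption are near-copies of one photo, any estimator that minimises average error for that caption lands essentially on that photo.

import numpy as np

rng = np.random.default_rng(0)

iconic = rng.random(64)              # stand-in feature vector for the famous photo
others = rng.random((10, 64))        # ten genuinely different photos

# 90 near-duplicates of the iconic shot (crops, scans, reposts) plus the 10 others
dataset = np.vstack([iconic + rng.normal(0, 0.01, (90, 64)), others])

# the least-squares "best guess" for this caption is simply the mean
model_output = dataset.mean(axis=0)

print(np.linalg.norm(model_output - iconic))           # ~0.2: almost the original
print(np.linalg.norm(model_output - others.mean(0)))   # ~2.2: nowhere near the rest

Swap the mean for something fancier and the pull towards the dominant sample doesn't go away; it just gets harder to see.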
My year 10s have just handed in assignments on famous rockets.
They are unusually literate - spider senses are tingling.
Feed a para to ChatGPT and I can find out where it has been copied from. The source website has white text on a black background ... amazing coincidence, so does the work the student submitted.
I have also had a "Sure, I can help you with that" cut and pasted into an assignment.
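For what it's worth, that kind of check comes down to long runs of shared words, which almost never match by accident. A minimal sketch (function names and sample sentences are mine, not any real tool's):

def ngrams(text, n=8):
    """All runs of n consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fraction(submission, source, n=8):
    """Fraction of the submission's word 8-grams that also appear in the source."""
    sub = ngrams(submission, n)
    return len(sub & ngrams(source, n)) / len(sub) if sub else 0.0

source = "The Saturn V remains the most powerful rocket ever flown, standing 110 metres tall."
student = "As we know, the Saturn V remains the most powerful rocket ever flown, standing 110 metres tall."

print(shared_fraction(student, source))   # 0.7 - eight-word runs are copying, not coincidence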
This is a genuine question and not intended to be snark.
Based on the premise that you are a professional teacher, and given that the genie of generative LLMs is out of the bottle and we can't just stuff it back in, are there any coherent plans from professional teachers to deal with this issue long term?
I mean, clearly from your comment these submissions were fairly obvious, but eventually these models are going to become more complex and branch across various platforms, including offline and portable models, so I doubt cheating in this manner will be so obvious, or so easily checked, in the future.
Moving to skills testing instead of knowledge testing may be a component of this transition, but is this practical for all subjects? Are there subjects where rote learning is the only practical method? Can the teaching profession make the shift from the firmly entrenched rote-learning models to a new model that makes it worthless for students to use LLMs to do their work?
I am genuinely interested in this, because it could be one of the biggest shake-ups in how kids are taught in school in many years, government interference notwithstanding.
I think the solution is for the teacher to give an assignment and then go through each submission with each student, asking a few questions here and there.
While it won't prove that the assignment wasn't written by an LLM, it will prove that the student has knowledge of the topic.
So therefore, the assignment has achieved its purpose.
If you want to follow the advice of a relative of mine, who holds multiple PhDs in education and spent 40+ years specialising in the career of teaching, then the answer is:
Don't bother with homework.
In class, how are they getting GPT answers? They're not. Devices are not allowed.
At home, they are getting those answers elsewhere because THEY HATE HOMEWORK. Everyone does. It is also of limited value educationally, precisely because everyone hates it: it's completely unguided and unassisted; it's often nothing more than "more of the same" of what you were just doing in lesson; the environment is unsuitable; the atmosphere is generally not one focused on learning; the marking sucks up far too much time and achieves little; and the progress in the subject attributable to homework is zero to minimal.
Even the difference between independent (private) and state schools shows this. State schools give you homework on what you just did. Independent schools often call it prep work: it's what you'll be doing NEXT LESSON, and it's there to get you up to speed so you come into the lesson with questions and some knowledge, without wasting valuable teaching time.
Even that, however, is of limited utility: in almost any study, once you eliminate other factors in the average schoolchild's life, "homework" or "prep work" makes almost no difference to educational outcomes.
How about this: If you want to know if a child knows something, ask them independently at a point where they can't run off to a computer to look it up.
As time goes by, using homework to judge the status, progress or ability of a child is getting less and less useful, and things like GPT models are a MASSIVE wake-up call that you can't - and, many argue, never have been able to - mark those kinds of unguided assessments and make any useful use of the results.
So long as you don't let them answer the questions with GPT in-lesson or in-exam, school education doesn't really need to change at all. All you need to do is stop setting the pointless unguided homework that has no real evidence to justify it anyway.
The same guy taught a year in a private school recently. He was threatened with the sack because they accused him of cheating - because EVERY ONE of his students got an A. Every. Single. One. He was utterly offended, grabbed the nearest teacher (who hated him), had them fetch a higher-level exam paper from the stores, and asked his entire class to resit the exam with only the other teachers as invigilators; he had no part in the resit's paper choice, questions, invigilation, sitting or marking. They all passed again. All with A's again. Without the guy anywhere near them whatsoever.
When they tried to apologise, he resigned in disgust at the accusation anyway; he just wanted to prove his innocence first.
When someone like that says that homework is a crock, and has in fact filed dissertations on the subject, it's about time educators started to listen.
P.S. the same guy home-schooled my daughter in ALL SUBJECTS throughout COVID, and she now attends school in a foreign country / language. Last week her teacher called her out for not understanding the language, and demanded to see all her notes because she assumed she must be cheating off her peers. She handed in more notes than anyone else, passed all tests, and replied to her teacher in fluent Spanish when confronted. She is so far ahead of her peers it's not even comparable.
Many, many years ago, when I was still at school, extra homework was frequently used as a punishment.
That pretty much tells you what the real purpose of homework is.
None of this AI cheating is really new. Homework has always been pretty much useless as a teaching aid.
There have always been ways of copying others' work, and only the most blatant copying was ever caught.
I've even seen 'helpful' parents effectively doing their kids' homework for them in order to boost their grades, if it forms part of any assessment.
The incident has revitalized arguments that AI text-to-image models violate copyright, since they were not only trained on protected IP or trademarks but faithfully generate the content too.
Not just that, Microsoft's response almost feels like an admission of guilt.
I'd love to know what concessions Disney managed to choke out of MS. Because I suspect they're rather bigger than MS would be comfortable admitting in public.