English as a disadvantage.
"... increase the competitiveness of students who are non-native English speakers"
From now on, claim you don't know English and let the robots do your work.
This tip brought to you by ChatGPT.
OpenAI's ChatGPT outperforms the average university student in computer science, according to researchers, but interestingly, not in math. The large language model's performance also beats the average student in other subjects, including political studies, engineering, and psychology, a paper published in Scientific Reports …
User
Can you make 147 using only 396, 200, 16 and 519?
ChatGPT
Sure, you can make 147 using the numbers 396, 200, 16, and 519 with the following mathematical expression:
(396 - 200) + (16 * 519) = 147
User
No, it doesn't
ChatGPT
I apologize for the mistake. You're right, the expression I provided doesn't result in 147. Here's the correct way to make 147 using the numbers 396, 200, 16, and 519:
(519 - 396) + 200 + 16 = 147
So it's been bugging me.
396 - 200 is 196
Square root of 196 is 14
Square root of 16 is 4
519 + 14 + 4 is 537
396 + 4 is 400
Square root of 400 is 20
200 / 20 is 10
537 - 400 is 137
137 + 10 is 147.
Or in one expression: 147 = 519 + SQRT(396 - 200) + SQRT(16) - (396 + SQRT(16)) + (200 / SQRT(396 + SQRT(16)))
This can be simplified further to: 147 = 519 + SQRT(396 - 200) - 396 + (200 / SQRT(396 + SQRT(16)))
Which probably breaks the rules, as it uses square roots and the number 396 three times.
I have no doubt there is a more elegant solution, but that's the best I could come up with.
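For anyone who wants to double-check the working, a few lines of Python (with math.sqrt standing in for SQRT) confirm that the simplified expression really does come to 147:

```python
# Sanity check of the expression above.
import math

value = (519
         + math.sqrt(396 - 200)                    # sqrt(196) = 14
         - 396
         + 200 / math.sqrt(396 + math.sqrt(16)))   # 200 / sqrt(400) = 10

print(value)  # 147.0
```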
I wonder whether the 'results' of ChatGPT versus human students say more about the subject than about the marks. But I guess maths is maybe still a bit too precise for AI (although arithmetic should not be a problem).
After all, ChatGPT has an extensive collection of books it has 'read' from which it can choose items for an answer, whereas students (and probably some professors) have not read nearly as many texts, or remembered them as accurately as a computer can. In maths you cannot just 'make stuff up' and hope that if it looks ok you will get marks. Well, OK, when I was marking undergraduate examples and tests, I think some of them did just that, but I didn't give them any marks for it.
Looking at it another way, it may be that assessment methods are deficient, in that they may not actually test students' understanding of the subject, only the ability to regurgitate from books.
IMO, the test would be whether the knowledge can be applied to a real problem-solving exercise.
As a footnote, I was a bit surprised that ChatGPT didn't do so well in economics - a lot of that subject is bullshit, so I would have thought it an ideal playground for ChatGPT. (Disclaimer: my first degree was economics.)
That is certainly optimal, and it becomes more common as courses get more difficult, but it is sometimes hard to find a new problem that students who haven't learned much yet can solve and that has never been considered by anyone on the internet. The basics of education will unavoidably be things which are known and discussed by lots of people, and students will need to get past those levels in order to get to the places where they can be set more complex problems that haven't already been solved.
It's also worth noting the economics (sorry, Rikki) of classroom instruction and evaluation. Low-level courses tend to have large class sizes, often in the hundreds of students for some disciplines. Evaluating that many students quickly becomes very expensive; it's not feasible unless you use methods that permit quick, preferably automated evaluation. By definition, such responses will have quite low information entropy, and so LLMs will tend to perform well on them.
Also, at least in US higher education, introductory courses are asked to serve multiple competing aims. They are supposed to introduce students to a subject area so they'll be prepared for more advanced study. They're generally asked to perform as general-education courses, i.e. to help students "catch up" to the base level expected of undergraduates. Administrators want them to serve as gatekeepers, discouraging students who think they might be interested in a discipline, but turn out not to be, from continuing. Students use them for easy (or easier) credit hours to fulfill requirements. Departments staff them with the cheapest faculty – fixed-term, contingent, and graduate-student workers – and run many sections, to make their budgets. Students take different approaches to the courses depending on which of these aims align with their own motives, and many are content to simply squeak through.
These programs don't remember what they "read". They store statistical information about the relationships between words across the totality of their inputs.
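To make that concrete (and only as a deliberately crude illustration, not a description of how a transformer actually works internally): even a toy bigram counter stores relationships between words rather than the text itself, and that is the flavour of thing being "remembered".

```python
# Toy illustration: a bigram table holds statistics about which word follows which,
# not a retrievable copy of the source text. Real LLMs learn far richer relationships
# in learned vector representations, but the "statistics, not memory" point is the same.
from collections import Counter, defaultdict

corpus = "two households both alike in dignity in fair verona where we lay our scene".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

# "What usually comes after 'in'?" - a statistical relationship, not a memory of the page.
print(following["in"].most_common())   # [('dignity', 1), ('fair', 1)]
```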
What would happen if you ask, "What is the text of Act I, Scene I of Shakespeare's Romeo and Juliet"?
Not curious enough to make my own account...
Except they aren't remembering the text, merely some information _about_ the text. That's worlds different. If you want to argue that they function as a lossy compression scheme, that's a useful argument, but it's the lossiness that belies the claim of "remember".
Due to the large size and dimensionality of the parameter space in contemporary LLMs, and the large context windows, it is often possible to elicit large passages of text verbatim from a source. It's not hard to find gradients in parameter space in these models to recover that sort of thing, particularly if you keep the temperature low enough to avoid introducing noise or annealing the output into some "better" (in objective-function terms) result.
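For anyone who hasn't met the temperature knob being referred to: sampling temperature just rescales the model's output scores before a token is picked, so a low temperature concentrates nearly all the probability on the single most likely continuation, which is what makes near-verbatim regurgitation easier to elicit. A minimal sketch, with made-up numbers standing in for a real model's outputs:

```python
# Temperature rescales logits before sampling: low T sharpens the distribution
# towards the most likely token, high T flattens it.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                         # made-up scores for three candidate tokens

print(softmax_with_temperature(logits, 1.0))     # ~[0.63, 0.23, 0.14]
print(softmax_with_temperature(logits, 0.2))     # ~[0.99, 0.007, 0.0005]
```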
I think it says about as much about the subject as the program, since what it says is whether the solutions to problems asked at that level are likely enough to have already existed on the internet. At low levels, the answer is yes for most subjects. A computer science question asked of relatively early students like "Write quicksort on arrays of doubles in C++" is a question where someone has written that code and published it on a page with most or all of those keywords on it, thus it's relatively easy for a program that has mashed up a lot of websites to find and reconstitute something close to it. A similarly difficult history essay is likely to be on a topic that has already been discussed by some people online, maybe even in these forums, and the bot can copy from that. A political course that focuses on modern events means that the bot can mash up some newspaper articles and make an essay that approximates an acceptable response. A mathematical problem, made from randomly chosen numbers, is probably not written on a website since the sites will have been written with instructions on solving such problems in general, not that one in particular. Hence, the bot can't solve it, since it hasn't been built to reason or to understand, just to repeat what the internet generally says in a format that looks like natural language.
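To underline the point about randomly chosen numbers: a puzzle like the 147 one above is a trivial job for a program actually built to search for an answer, even though a text predictor fumbles it. A brute-force sketch along these lines (basic four operations, each number used at most once, left-to-right combinations only, so a full search over bracketings would need a little more code) will settle whether an exact solution of that restricted form exists:

```python
# Brute-force search over left-to-right +, -, *, / combinations of the given numbers,
# each used at most once. A text predictor needs to have seen the answer written down
# somewhere; a search just enumerates the possibilities.
from itertools import permutations, product
from operator import add, sub, mul, truediv

NUMBERS = [396, 200, 16, 519]
TARGET = 147
OPS = {'+': add, '-': sub, '*': mul, '/': truediv}

def search():
    solutions = set()
    for r in range(2, len(NUMBERS) + 1):
        for nums in permutations(NUMBERS, r):
            for ops in product(OPS, repeat=r - 1):
                value, expr = nums[0], str(nums[0])
                for op, n in zip(ops, nums[1:]):
                    value = OPS[op](value, n)
                    expr = f"({expr} {op} {n})"
                if abs(value - TARGET) < 1e-9:
                    solutions.add(expr)
    return solutions

print(search() or "no left-to-right solution with these numbers")
```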
That's been a major component (double negatives etc.) of the CISSP exam for years, and a pass still means bugger all in terms of actual capacity to deliver infosec.
If you want a real test of knowledge (i.e. the capacity to actually use the information gained to solve a real problem rather than just to regurgitate it), a properly informed party (one that can already use the information) has to get the candidate to explain why they gave their specific answer. The big problem with LLMs is that they're not and cannot be informed, as they have no concept of meaning or anything else. They're just meaning-blind generators of most probable token sequences based on their training set. Oh, how often must we repeat this until it sinks in?
Humans also only analyze correlations between tokens: you think your sensory neurons feed you direct access to unvarnished truth? LLMs are just one step further removed than humans are; neither of us experiences unmediated reality.
If you have any causal account of meaning arising from correlations in reality and sensory perception, that account would also justify meaning in LLMs.
No, we analyze correlations between a lot of things, not just tokens. That's why we are more capable than LLMs. We can make correlations between actions and consequences, intentions and results, processes and products. An LLM can only make correlations between one chunk of text and another chunk of text. We can also experiment on our own to gain new correlations, without having to have a new training run feed them to us from someone's exogenous information. Oh, and we have the ability to generalize, which it mostly cannot; this is why a lot of children can solve any mathematical equation using operations they know, and any computer can do it extremely quickly, but GPT cannot solve a simple equation if it hasn't already memorized the answer.
we analyze correlations between a lot of things, not just tokens
Care to cite any evidence from, say, methodologically-sound neuroscience research, or even psychology, to support this claim, for some useful definition of "token"?
As I've noted before, I'm not particularly impressed by LLMs. I'm even less impressed by handwaving appeals to res cogitans to try to claim human cognition is magically different.
"I'm even less impressed by handwaving appeals to res cogitans to try to claim human cognition is magically different."
It doesn't impress me either, and I've argued against it on multiple occasions. I question whether you correctly parsed my comment, as I didn't say that or anything like it in this case. I don't even have to argue by neuroscience here. My argument is purely technical and concerns what we are capable of doing as independent organisms and what an LLM is capable of doing.
At a very basic level, we are capable of taking an action and seeing what happens. We can drop an item and notice that it falls, and from this arrive at a guess that every item will fall when dropped. A bit more experimentation and we can start to guess that items will fall faster if they're more compact, and start to create ideas about air resistance. An LLM is not able to drop any items or watch what happens, so any information it has about what objects do has to be told to it. It is able to answer questions about kinematics only because someone has written them down and it is repeating what someone else said.
Similarly with my discussion of whether LLMs reason (they don't). This does not mean that a program is incapable of reason. I think that, if we got a computer powerful enough and a program designed to make more general logical connections, it would be capable of making generalizations the way we do (and we definitely do, as it's one of the core parts of education). LLMs, however, are not written to do that. Their only purpose is to write, and they have not been set up with logical constructs that are more advanced than writing likely passages of text. I am not saying that human brains are more powerful than computers in some way. I'm saying that human brains do things that this program was never intended to do, which is why our brains will be capable of things that this program never will. Give me a new type of program doing different things in a different way, and that could all change.
If you hook an LLM up to an action harness, where a certain output is interpreted as an action on reality, executed, and the result (as measured by a sensor) is injected back into the token stream, then the LLM is in fact capable of picking up quite involved correlations in that loop.
That's the whole point of the plugin architecture.
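For anyone who wants the shape of that loop spelled out: a minimal sketch, with entirely hypothetical function names (generate() and read_sensor() are placeholders, not OpenAI's actual plugin API). The harness watches the model's output for an action request, performs it against the world, and appends the measured result to the context the model sees next:

```python
# Sketch of an "action harness": model output -> action -> sensor reading -> injected
# back into the token stream. All names here are stand-ins, not a real API.

def generate(context: str) -> str:
    """Placeholder for an LLM call returning the next chunk of text."""
    raise NotImplementedError

def read_sensor(action: str) -> str:
    """Placeholder for executing an action in the world and measuring the outcome."""
    raise NotImplementedError

def run(prompt: str, max_steps: int = 10) -> str:
    context = prompt
    for _ in range(max_steps):
        output = generate(context)
        context += output
        if output.startswith("ACTION:"):
            # Perform the action, then inject the observation into the context so that
            # later predictions are conditioned on a real-world measurement.
            observation = read_sensor(output.removeprefix("ACTION:").strip())
            context += f"\nOBSERVATION: {observation}\n"
        else:
            break   # the model produced a final answer instead of an action request
    return context
```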
ChatGPT has neither "intelligence" nor "understanding", just the mass statistics of the global idiot network to support its "answers". It has little to no hope of EVER being accurate about much of anything, because it statistically summarizes away the very details that accuracy requires. The details get lost in the statistical averaging, summaries, and noise.
...and from what we keep hearing about "training data", there's little if any curating going on; it seems to be quantity over quality. Curating the training data, even to the extent of limiting the scraping to "reputable" sources, seems to be too much cost and effort, despite the claimed billions being spent on it.
.. the mass statistics of the global idiot network to support its "answers".
One thing I've noticed from playing around with ChatGPT is that its responses are reasonably correct grammatically, although the content is often dubious.
If its/it's or their/there are used correctly, I am sadly now more suspicious that the answer is AI-generated.
Until now a creation arguably purely human.
I was trying to think of a decent humanities question to give ChatGPT a real workout :)
Something like "Compare and contrast an aspect of Shakespeare's depiction of the relationships portrayed in both 'Taming of the Shrew' and 'As You Like It', relating your answer where possible to recent performances."
I suspect you would get a great deal more twaddle from human students than from the AI, although I do not think it would be an easy question to answer, but probably an interesting one.
I was labouring under the mistaken impression that ChatGPT etc. incorporated theorem provers and some sort of reasoning system, but the poor showing in mathematics probably means they are only elaborate neural networks, or perceptrons on steroids.
There is a general consensus between educators and students that the use of ChatGPT in school work should be acknowledged, and that it will increase the competitiveness of students who are non-native English speakers.
If one were to replace 'the use of ChatGPT' with 'cheating', one would have pretty much the same logic.
Is this statement intended to apply to such students studying in English-speaking nations, or is it an implicit claim of the global supremacy of written English in communicating science, technology etc.? If the latter, I suspect the French might beg to differ. ☺
Also curious as to which activities this increased competitiveness is intended to apply: college admission, assessment (examinations etc.), employment? Or indeed how any of this ultimately relates to capability and competence.
Certainly interesting times in which we live, unfortunately.