I’ve been asked, in various roles¹, to give my opinion on the challenges posed by Large Language Models (LLMs)², also known as “stochastic parrots” (Bender, Gebru, McMillan-Major, & Shmitchell, 2021), for assessing academic writing assignments. The concern seems to be that students can legitimately use these systems, and that we would then be unable to assess their ability to write essays.
My opinion, in brief, is that LLMs cannot legitimately be used to write academic essays.
Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.
(Bender et al., 2021, p. 617)
Good academic writing education minimally teaches students not to plagiarise and to use good and unbiased literature review and citational practices. These criteria are part of the scientific integrity of scientific writing.
LLMs, by design³, produce texts based on ideas generated by others without the user knowing what the exact sources were. Moreover, automatically generated texts reproduce biases that exist in the texts used to train the LLMs (incl. citational exclusions and inequities, and hegemonic views) (Bender et al., 2021; Birhane & Guest, 2021; Hofstra et al., 2020; Teich et al., 2022). Hence, the use of LLMs for writing academic texts would further promote those biases, rather than challenging them as good scholars would want to do (Birhane & Guest, 2021).
(…) Black women’s writings are systemically omitted from syllabi (…) Both soft and hard power within academia is afforded disproportionately to white people, especially men, and to those who are aligned with the current hegemony.
(Birhane & Guest, 2021, p. 62)
No serious scholar or scientist in their right mind would want LLMs to produce their texts; hence, no student pursuing an academic education should want to do so either.
It has become mainstream in AI to distract from real issues by concern trolling about fictional scenarios (e.g. Birhane & van Dijk, 2020). A concern about how to assess academic texts “written” using LLMs sets an analogous trap. My advice to teachers and examiners is not to fall into it. Automated plagiarism causes harm when used: to oneself, to others, and to the entire scientific enterprise. That is the real issue.
It is our responsibility to teach students the real issue.
- Iris van Rooij
Edit: After completing this blogpost I saw that Emily Bender said it more succinctly in this video: “I don’t see the point of writing essays as producing the form of an essay. If students are turning to some fallback (…) GPT or another human, the problem was upstream.” — Watch the full interview below.
- Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21) (pp. 610–623). Association for Computing Machinery.
- Birhane, A., & Guest, O. (2021). Towards decolonising computational sciences. Kvinder, Køn & Forskning, 29(2), 60–73.
- Birhane, A., & van Dijk, J. (2020). Robot rights? Let’s talk about human welfare instead. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 207–213).
- Hofstra, B., Kulkarni, V. V., Galvez, S. M.-N., He, B., Jurafsky, D., & McFarland, D. A. (2020). The diversity-innovation paradox in science. Proceedings of the National Academy of Sciences, 117(17), 9284–9291.
- Teich, E. G., Kim, J. Z., Lynn, C. W., et al. (2022). Citation inequity and gendered citation practices in contemporary physics. Nature Physics, 18, 1161–1170.
¹ One such role is as chair of the Examination Board of a Bachelor’s and Master’s programme in Artificial Intelligence. I have also been asked to speak on the topic for a general audience (which so far I have declined).
² ChatGPT is the latest hype, but there are other such systems, and surely more variations will be developed in the future.
³ LLMs are trained on texts generated by other humans/authors (e.g., by scraping content from the internet). Hence, the generated texts will inevitably plagiarise, without the user even having a way of tracking how. Also, given how they are constructed, these systems will reproduce (citational) biases and inequities that exist in the literature.