How to respond to the affordances and challenges of generative AI is a pressing issue that many learning technologists and open education practitioners are grappling with right now, and I’ve been wanting to write a blog post about the interface between AI, large language models and the Commons for some time. This isn’t that post. I’ve been so caught up with other work that I’ve barely scratched the surface of the articles on my rapidly expanding reading list. Instead, these are some short, sketchy notes about the different ethical layers we need to consider when engaging with AI. This post is partly inspired by technology ethics educator Casey Fiesler, who has warned education institutions of the risk of what she refers to as ethical debt.
“What’s accruing here is not just technical debt, but ethical debt. Just as technical debt can result from limited testing during the development process, ethical debt results from not considering possible negative consequences or societal harms. And with ethical debt in particular, the people who incur it are rarely the people who pay for it in the end.”
~ Casey Fiesler, The Conversation
Apologies for glossing over the complexity of these issues; I just wanted to get something down in writing while it’s fresh in my mind.
Ethics of large language models and Common Crawl data sets
Most generative AI tools use data sets scraped from the web and made available for research and commercial development. Some of the organisations creating these data sets are non-profits, others are commercial companies, and the relationship between the two is not always transparent. Most of these data sets scrape content directly from the web regardless of ownership, copyright, licensing and consent, which has led to legitimate concerns about all kinds of rights violations. While some companies claim to employ these data sets under the terms of fair use, questions have been raised about using such data for explicitly commercial purposes. Some open advocates have said that while they have no objection to these data sets being used for research purposes, they are very concerned about commercial use. Content creators have also objected to their creative works being used to train commercial applications without their knowledge or consent. As a result, a number of copyright infringement lawsuits have been raised by artists, creators, cultural heritage organisations and copyright holders.
There are more specific issues relating to these data sets and Creative Commons licensed content. All CC licenses include an attribution clause: in order to use a CC licensed work you must attribute the creator. LLMs and other large data sets are unable to fulfil this crucial attribution requirement, so they ride roughshod over one of the foundational principles of Creative Commons.
LLMs and Common Crawl data sets are out there in the world now. The genie is very much out of the bottle and there’s not a great deal we can do to put it back, even if we wanted to. It’s also debatable what, if anything, content creators, organisations and archives can do to prevent their works being swept up by web scraping in the future.
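One partial, far-from-watertight measure worth knowing about: some large-scale crawlers do publish a user-agent string and claim to honour robots.txt, Common Crawl’s CCBot being the best-known example. A site owner who wants to opt out of that particular pipeline can add a couple of lines to their robots.txt (a sketch only — it relies entirely on the crawler choosing to comply, and does nothing about data already collected or about scrapers that ignore the protocol):

```
# Opt out of Common Crawl's crawler, whose archives are widely
# used to build LLM training data sets. Other AI crawlers publish
# their own user-agent strings and need their own entries.
User-agent: CCBot
Disallow: /

# Everyone else: crawl as normal.
User-agent: *
Allow: /
```

This is a request, not an enforcement mechanism, which is rather the point: the burden falls on individual creators and archives to opt out, after the fact, from a default of being scraped.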
Ethics of content moderation and data filtering
Because these data sets are scraped wholesale from the web, they inevitably include all kinds of offensive, degrading and discriminatory content. In order to ensure that this content does not influence the outputs of generative AI tools and damage their commercial potential, these data sets must be filtered and moderated. Because AI tools are not smart enough to filter out this content automatically, the majority of content moderation is done by humans, often from the global majority, working under exploitative and extractive conditions. In May, content moderators in Africa who provide services for Meta, OpenAI and others voted to establish the first African Content Moderators Union, to challenge low pay and exploitative working conditions in the industry.
Most UK universities have a commitment to ending modern slavery and to upholding the terms of the Modern Slavery Act. For example, the University of Edinburgh’s Modern Slavery Statement says that it is “committed to protecting and respecting human rights and have a zero-tolerance approach to slavery and human trafficking in all its forms.” It is unclear how commitments such as these relate to content workers who often labour under conditions that are exploitative and degrading at best, and a form of modern slavery at worst.
Ethics of anthropomorphising AI
The language used to describe generative AI tools often humanises and anthropomorphises them, either deliberately or subconsciously. They are ascribed human characteristics and abilities, such as intelligence and the ability to dream. One of the most striking examples is the term “hallucinating”. When ChatGPT makes up non-existent references to back up erroneous “facts”, this is often described as “hallucinating”. This propensity has led to confusion among some users when they have attempted to find these fictional references. Many commenters have pointed out that these tools are incapable of hallucinating, they’re just getting shit wrong, and that the use of such humanising language purposefully disguises and obfuscates the limitations of these systems.
“Hallucinate is the term that architects and boosters of generative AI have settled on to characterize responses served up by chatbots that are wholly manufactured, or flat-out wrong.”
~ Naomi Klein, The Guardian
Ethics of algorithmic bias
Algorithmic bias is a well known and well documented phenomenon (cf. Safiya U. Noble’s Algorithms of Oppression) and generative AI tools are far from immune to it. Valid arguments have been made about the bias of the “intelligence” these tools claim to generate. Because the majority of AI applications are produced in the global north, they invariably replicate a particularly white, male, Western world view, with all the inherent biases that entails. Diverse they are not. Wayne Holmes has noted that AI ignores minority opinions and marginalised perspectives, perpetuating a Silicon Valley perspective and world outlook. Clearly there are considerable ethical issues for education institutions that have a mission to be diverse and inclusive using tools that engender harmful biases and replicate real world inequalities.
“I don’t want to say I’m sure. I’m sure it will lift up the standard of living for everybody, and, honestly, if the choice is lift up the standard of living for everybody but keep inequality, I would still take that.”
~ Sam Altman, OpenAI CEO.
Much has been written about the dangers of AI, often by the very individuals who are responsible for creating these tools. Some claim that generative AI will end education as we know it, while others prophesy that AI will end humanity altogether. There is no doubt that this catastrophising helps to feed the hype cycle and drive traffic to these tools and applications. However, Timnit Gebru and others have pointed out that by focusing attention on some nebulous future catastrophe, the founding fathers of AI are purposefully distracting us from the current real world harms caused by the industry they have created, including reproducing systems of oppression, worker exploitation, and massive data theft.
“The harms from so-called AI are real and present and follow from the acts of people and corporations deploying automated systems. Regulatory efforts should focus on transparency, accountability and preventing exploitative labor practices.”
~ Statement from the listed authors of Stochastic Parrots
Nirit Weiss-Blatt’s (@DrTechlash) “Taxonomy of AI Panic Facilitators” is a visualization of leading AI Doomers (X-risk open letters, media interviews & OpEds). Some AI experts enable them, while others oppose them. The gender dynamics are fucked up. It says a lot about the panic itself.
Not really a conclusion
Clearly there are many ethical issues that education institutions must take into consideration if they are to use generative AI tools in ways that are not harmful. However, this doesn’t mean that there is no place for AI in education, far from it. Many AI tools are already being used in education, often with beneficial results; captioning systems are just one example that springs to mind. I also think that generative AI can potentially be used as an exemplar to teach complex and nuanced issues relating to the creation and consumption of information, knowledge equity, the nature of creativity, and post-humanism. Whether this potential outweighs the ethical issues remains to be seen.
A few references
AI has social consequences, but who pays the price? Tech companies’ problem with ‘ethical debt’ ~ Casey Fiesler, The Conversation
Statement from the listed authors of Stochastic Parrots on the “AI pause” letter ~ Timnit Gebru (DAIR), Emily M. Bender (University of Washington), Angelina McMillan-Major (University of Washington), Margaret Mitchell (Hugging Face)
Open letter to News Media and Policy Makers re: Tech Experts from the Global Majority ~ @safiyanoble (Algorithms of Oppression), @timnitGebru (ex Ethical Artificial Intelligence Team), @dalitdiva, @nighatdad, @arzugeybulla, @Nanjala1, @joana_varon
AI machines aren’t ‘hallucinating’. But their makers are ~ Naomi Klein, The Guardian
Just Because ChatBots Can’t Think Doesn’t Mean They Can’t Lie ~ Maria Bustillos, The Nation
Artificial Intelligence and Open Education: A Critical Studies Approach ~ Dr Wayne Holmes, UCL
‘What should the limits be?’ The father of ChatGPT on whether AI will save humanity – or destroy it ~ Sam Altman interview, The Guardian