AI Carpentry? Helping learners make better choices with genAI.
In two recent community discussion sessions, we explored what mental model of machine learning/deep learning we could teach to learners already familiar with the basics of programming, to help them safely and sensibly implement these methods in their software and data analysis. Some great lessons on these topics already exist in The Carpentries Incubator and Lab, and I am excited about the prospect of adapting them into new official Carpentries workshops, either within an existing lesson program or under a completely new banner (e.g., “AI Carpentry”).
More sessions are planned this month to explore what might be taught in a workshop about the use of generative AI (genAI) tools such as large language models for coding. Details of those sessions can be found at the end of this post.
What does The Carpentries have to say about genAI for coding?
GenAI tools are increasingly widespread, marketed as assistants to or replacements for humans in almost every cognitive task. This has prompted a great deal of discussion, including within The Carpentries, where we hosted a series of community discussions summarised in subsequent blog posts. (Find all blog posts with the “Artificial Intelligence” tag.)
One conclusion drawn from those initial discussions, which remains true, is that workshops teaching the basics of coding and data analysis to novice learners are still needed. Using genAI effectively to produce code and – even more importantly – critically evaluating what it produces both require that the user first has a grasp of the fundamentals of coding. Consequently, changes to the existing curriculum for most Carpentries workshops have been limited to guidance on whether and how to use genAI while learning the basics of the lesson topic.
Evolving context
However, in recent months I have observed increasing discussion of “vibe coding” (where a human relies on genAI to produce programs with little to no time spent inspecting the source code themselves), and of the use of “agents” for coding (where genAI is implemented in a loop, receiving feedback such as error messages and making changes to reach a desired result without human involvement).
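To make the "agent" pattern concrete: the essence is a generate–run–feed-back loop, in which the tool's error output becomes the next prompt. A minimal sketch of that loop is below; the `fake_model` function is a hypothetical stand-in stub for a real genAI call, not an actual API.

```python
import subprocess
import sys

def run_snippet(code):
    """Execute a code snippet in a subprocess; return (ok, combined output)."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=10)
    return result.returncode == 0, result.stdout + result.stderr

def fake_model(prompt, feedback=None):
    """Stand-in for a genAI call (hypothetical). Returns broken code on the
    first attempt, then a corrected version once it 'sees' an error message."""
    if feedback is None:
        return "print(total)"  # NameError: 'total' is never defined
    return "total = 1 + 2\nprint(total)"

def agent_loop(prompt, max_attempts=3):
    """Generate code, run it, and feed error messages back until it succeeds."""
    feedback = None
    for attempt in range(max_attempts):
        code = fake_model(prompt, feedback)
        ok, output = run_snippet(code)
        if ok:
            return code, output
        feedback = output  # error messages become the next round's feedback
    raise RuntimeError("agent failed to converge")

code, output = agent_loop("sum two numbers and print the result")
print(output.strip())
```

The point of the sketch is that no human inspects the intermediate code: the loop terminates on "it runs", which is a much weaker criterion than "it is correct".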
The target audience of our workshops is early career researchers and other professionals who need to use computational methods to work reproducibly and robustly with large volumes of data. Amidst the promotion of the aforementioned genAI approaches in particular (and the pervasive attention given to genAI more generally), I worry that people who are typical members of that audience may no longer see the relevance of what we teach or feel sufficiently motivated to join a workshop. I fear a growing misconception that good practices for reproducibility and robustness of analysis are no longer necessary or important. Regardless of whether we are right that these skills are still needed, Instructors cannot make an impact on people who are not in the room when they teach.
Meanwhile, those same early-career researchers and data scientists are under just as much pressure as they always have been to get results and publish them. The temptation to take shortcuts when analysing data, accessing information, and developing software is real! All of this likely adds up to increasing adoption of genAI to produce code and perform analyses that the user is not capable of evaluating for correctness, or maintaining in the medium to long term. How long until we start to see research papers and PhD theses retracted due to errors in analysis introduced by genAI? How would researchers feel about admitting to relying on genAI to craft or understand results within a paper, a conference talk, or a poster, or about hearing that others have done the same?
How can The Carpentries help?
I have grown increasingly (and reluctantly) convinced that The Carpentries, with an established, global network of computer-literate educators and a strong commitment to promoting efficient, open, and reproducible research practices, is well placed to offer guidance and encourage better decision-making on this topic.
Many ethical concerns about the use of genAI remain, as outlined in a previous blog post. My reluctance is based on those, as well as the perception that we remain in a hype bubble around genAI. But the inescapable reality is that many people are using genAI for coding and, especially for simple tasks where high-quality resources have been used to train models, these tools can produce scripts that work – often on the first try. Whether or not the bubble bursts soon, the technology itself is here to stay.
How can The Carpentries best reach people at risk of making mistakes with these tools: integrating incorrect code or fabricated results into their analyses, de-skilling themselves, and/or producing software that cannot be maintained or sustained in the long term? How might we help these learners to make informed decisions about whether and when to use genAI? What part can The Carpentries play in advancing progress towards what UCL Professors Alison Littlejohn and James Hetherington describe as a data-empowered society?
Be part of the conversation
Many questions remain. What is it most essential to teach, and to whom? Where and how should we draw the boundaries between what is pragmatic to teach and what contradicts our core values? How might we ensure that workshops on this topic are globally accessible when resource divisions are widening? How can we develop and maintain lessons on a topic that is changing so rapidly?
In the hope of finding some answers, the Curriculum Team will host two more community discussions in August. The community has a lot of expertise and a wide range of perspectives, and I am grateful for the honesty and openness that members have already shown towards this complicated topic. In the next sessions I hope we can continue those discussions and kick off a collaborative effort to develop curriculum and pilot new workshops.
If you have experience, expertise, and perspectives to share in a nuanced discussion on this topic, please join the sessions.
Session 1
- Tuesday, 12 August 2025
- 11:00 UTC
Session 2
- Tuesday, 12 August 2025
- 19:00 UTC
Thanks for reading to the end! If you have any questions or concerns about the issues discussed in this post, please get in touch with the Curriculum Team by email or feel free to reach out to me directly on Slack. Similarly, if you could not join the previous or upcoming sessions but want to get involved with the curriculum development effort, please get in touch!