Code reviews in academia

1. Research software isn't reviewed

Code reviewing is a standard practice among professional software developers. Generally, this means code is subject to human feedback, complementary to machine feedback such as output from compilers, automatic linters and test runners. Code reviews are most useful for aspects of software quality for which feedback cannot be automated, such as software design or code readability. In these areas, human feedback is crucial to ensure that the code is written in a way that facilitates its study and reuse by other developers. In a commercial context, this could mean ensuring that the software remains easy to maintain, update and extend. Code reviewing is therefore ultimately about software sustainability.

Code reviews are also a key component of open source software development. On platforms such as GitHub or GitLab, contributors to open source projects propose and discuss their contributions publicly. At the core of this development model is the simple idea that contributed code isn't set in stone, and that several rounds of feedback from other project contributors will be necessary for it to be ready to be integrated into the project's codebase.

Meanwhile, in academia, research code is almost never reviewed. Despite the ever-increasing reliance of modern research on software, software quality remains a widely overlooked topic, if not a completely ignored one. This is because research software is rarely the end product, as opposed to the scientific results (e.g. figures and tables) that it generates. On top of that, the highly competitive, publication rate-based funding model does little to reward researchers who are ready to dedicate time to writing good software. In fact, spending too much effort on software development is often considered harmful to a researcher's career development, as it is detrimental to their publication rate.

2. Researchers are developers

In 2021, the situation is slowly evolving, as it has been over the past few years. The "reproducibility crisis" opened many people's eyes (and continues to do so) to the fact that a significant number of research findings, in a wide range of disciplines, cannot be reproduced by an independent team of researchers. A common cause of non-reproducibility is software issues: for instance, the code being unavailable, or left in an unusable state after the main paper was published. Although some journals and funding bodies nowadays require the associated code to be made available at the time of publication, the vast majority of research software is still written by temporary staff (graduate students or post-doctoral researchers) without formal training in software engineering, often operating under high pressure. As a result, an increasing number of researchers experience the deep frustration of having to build upon poorly written, undocumented and untested software.

One upshot is an increasing awareness that cutting corners on software quality is counterproductive, as it actually slows down research in the long run… as well as in the short run. In most cases, unfortunately, the gap in software engineering knowledge is simply too big for researchers to fill on their own, especially for those who are isolated or who work in research groups where a suitable software engineering culture isn't established, where mentors cannot be found and where feedback isn't available.

A survey conducted at the University of Oxford in 2018 shows that, of 375 scientists, 75% declared they could do with expert software engineering help. The first training course organised by the Oxford Research Software Engineering group for researchers at Oxford, focusing on version control, was fully booked in about 20 minutes (~ 30 seats), despite very limited advertising. This illustrates that a large majority of researchers are not happy with the way research software is developed, and that there is a broad consensus that "there must be a better way of doing this". Training courses are part of the solution, but cannot be the whole of it. The research community must embrace software development as part and parcel of the research effort, as opposed to viewing it as a foreign activity at which researchers can only ever be amateurs. Researchers are software developers.

3. Code reviewing is not bug-hunting

Consequently, it should be common practice for researchers to "talk code". Currently, the opposite is true: most research software is written in isolation, almost hidden from others' gaze, for fear that the author's (self-assessed) low programming skills will be exposed. On the one hand, and to put it mildly, it is true that a large portion of research software could be written in a better way. On the other hand, researchers tend to be too harsh on themselves, forgetting (or not being aware) that writing quality code is an iterative process that greatly benefits from other people's input – especially when the original author has limited software engineering experience.

This is a call for code reviewing in academia. Instead of hiding analysis scripts, hoping nobody will notice they are unreadable to anyone outside the project, these scripts should be reviewed by other researchers on a regular basis, throughout their development. Done regularly, code reviewing doesn't come with a large overhead: an hour every week, for instance, spent sitting with a colleague to review the latest modifications. In this way, participating in code reviews should join the list of the many activities that make up an academic's day. There is a lot to be gained from reviewing a colleague's code: new ways of writing software, programming tricks or elegant designs, but also a refresher on things to avoid, such as bad naming or monster functions.
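To make this concrete, here is a contrived sketch in Python of the kind of readability feedback such a review might produce; the functions, names and numbers are purely illustrative and not taken from any real project.

    # Before review: correct, but hard to follow for anyone outside the project.
    def proc(d, t):
        out = []
        for x in d:
            if x[2] > t:
                out.append((x[0], x[2] / 1000))
        return out

    # After review: same behaviour, but descriptive names, a docstring and a
    # named constant make the intent clear to readers unfamiliar with the project.
    MILLISECONDS_PER_SECOND = 1000

    def filter_slow_trials(trials, threshold_ms):
        """Return (trial_id, duration_s) for trials slower than threshold_ms.

        Each trial is a (trial_id, condition, duration_ms) tuple.
        """
        return [
            (trial_id, duration_ms / MILLISECONDS_PER_SECOND)
            for trial_id, _condition, duration_ms in trials
            if duration_ms > threshold_ms
        ]

Neither version is wrong; the point is that the second one can be read, reused and extended by a colleague who did not write it.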

It is worth stressing that beginner programmers can make very good reviewers, and therefore reviewing isn't limited to researchers with substantial programming experience. Generally, the purpose of code reviewing – at least in academia – is not to catch bugs or programming mistakes. To some extent, machines can already help with this. Rather, the purpose of a code review is to flag up those things that make the code difficult to read, debug, reuse and extend. From a software sustainability and research reproducibility point of view, it is crucial that even researchers with limited experience in software, or simply limited experience in that specific programming language, are able to understand the program. Beginner programmers are therefore well placed to review research software and should feel encouraged to do so. And of course, reviewing "real-life" research code makes for a fantastic learning opportunity for them.

4. Distributing knowledge and experience

In academia, programming and software engineering skills are both scarce and unevenly distributed. This is true at the level of both individuals and academic disciplines. In this context, another purpose of code reviewing is knowledge transfer. At the scale of a research group, regular code reviews between all members will ensure that members with the least software experience benefit from the knowledge of more seasoned members of the group, learn from them, and in turn become mentors for others. If several members are involved in a common project, regular code reviews spread knowledge of the codebase across the team. This means more independent teammates, as well as more efficient and fruitful teamwork. But above all, it prevents the codebase from falling apart as some members of the team move on to work on other projects. With regular code reviews, there will always be someone in the group who is able to mentor new recruits.

Going beyond research groups, code reviewing has the potential to spread software engineering practices across disciplines. As many computational approaches are common to several fields, a biologist may be well suited to provide insightful feedback on the readability and modularity of code written by a physicist, and vice versa. Interdisciplinary code reviews enable knowledge transfer at a larger scale, enriching different research communities with one another's experience and practices. This is one of the core motivations behind the Oxford Code Review Network, an initiative at the University of Oxford that promotes and facilitates code reviews between researchers across the university. Code reviews involving researchers from different fields are particularly important because a small number of communities (particle physics, for instance) concentrate much of the programming and software engineering expertise, and democratising interdisciplinary code reviews would let the rest of the research community benefit from it.

In a world where research software is recognised as a key component of research, code reviews would certainly be a widespread practice among academics, just as they are among commercial software developers. At the same time, democratising code reviewing in academia would help drive the culture change required for software development to be considered a key part of a researcher's activities, and for software sustainability to become a core concern of quality research. Short, regular code reviews provide an opportunity for researchers to "talk code", both teaching and learning from colleagues. Although these meetings can go relatively unnoticed in a researcher's daily schedule, they greatly improve the readability, debuggability and reusability of research software.

Why is that important? Because poor software quality cripples research, in both the short and long term. By contrast, quality software accelerates research: it makes it more efficient, more open, more collaborative and more reproducible. Code reviews also spread programming and software engineering experience, contributing to the education of future generations of researchers and making research teams more resilient to change. And with an increasing number of research fields relying heavily on software development, code reviews enable knowledge transfer across disciplines. This can lead to the adoption of new practices inspired by other fields of research, but also to transdisciplinary research collaborations… what's not to like?