On scientific writing in the age of AI, part 2: A thought experiment

In the spirit of my post last week, let us continue figuring out the role of AI in scientific writing through a Gedankenexperiment. Where we left off was the use of AI as an assistant — a virtual editor if you’d like — to suggest improvements to one’s text, instead of churning out autogenerated content. Think Grammarly++, or similar. This is, at least to me, perfectly fine. However, I would appreciate it if the text still retains its voice and human touch, lest everything sound exactly the same.

Now, fast forward to the future. If people still write science 25 years from now, how will they use AI tools? What are those tools capable of?

Here is where I feel science — at least natural science — might diverge from more creative forms of writing, as the purpose of written science is ultimately to transmit information. It might even become desirable to have AI write up our results.

Consider the following: suppose that I have carried out an experiment and want to write a paper on its results. I feed my plots, maybe together with a few lines of text about background, impact, etc., to my virtual writing assistant, and off it goes, returning with a complete manuscript. As my virtual assistant has been taught to write in my voice, the manuscript actually sounds like me. I read the manuscript, find it factually correct, and submit it to a journal.
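Incidentally, a crude version of this pipeline could already be cobbled together today. Below is a minimal sketch, assuming the OpenAI Python SDK; the model name, the file names, and the "voice description" are purely illustrative, and a real pipeline would of course also need to handle the plots themselves rather than just their captions.

```python
# Minimal sketch of the "feed plots + notes, get a draft" idea.
# Assumes the OpenAI Python SDK; model name, file names, and the voice
# description are illustrative placeholders, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

voice_description = "Write in a dry, slightly ironic academic voice."   # hypothetical
background_notes = open("background_notes.txt").read()   # hypothetical file: context, impact, etc.
figure_captions = open("figure_captions.txt").read()     # hypothetical file: what each plot shows

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are my writing assistant. " + voice_description},
        {"role": "user",
         "content": "Draft a manuscript from these results.\n\n"
                    "Background:\n" + background_notes + "\n\nFigures:\n" + figure_captions},
    ],
)

draft = response.choices[0].message.content
print(draft)  # a human still has to check every claim before this goes anywhere near a journal
```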

Now, if the information in this paper is factually correct and it is written in a way that is appreciated by human readers, how should we feel about this? Is this ethical or unethical? Is this a future we’d like to see or not?

For this to be ethical, it should be done openly and the use of AI acknowledged, which is of course very easy to do. Maybe this will become common: maybe most papers will be written by AIs that have been fed the original research results.

Beyond ethics, is this good or bad? That, I guess, depends. If all papers sound the same, it is bad. But what if the papers are indistinguishable from human writing, considering that everyone trains their own AI to write in their voice? What might be lost here is the finesse of argumentation, nuances, deep thoughts, and all those things that make famous writers/academics famous. On the other hand, perhaps this loss would be compensated by far fewer crappy, incomprehensible papers… just maybe.

It may also be that written scientific papers will become obsolete, or at least obsolete as stand-alone products (this is already happening with all the Jupyter notebooks and SI data sets and so on). Some fields (e.g., biomedicine) already have paper formats that leave very little room for creative writing—these are mostly just data containers.

Perhaps scientific papers will in the end not be structured for human readers, but for other AIs that can then better pick up their arguments to propose new theories, experiments, and so on — in other words, replace us scientists. But I have my doubts about this, as I at least hope that science requires creativity beyond mere statistics of words. Let us hope that humans can still out-weird (is that even a word?) AIs in the years to come!

Still to be continued, I think…

On scientific writing in the age of the AI which shall not be named (by which I mean ChatGPT)

Disclaimer: this post will probably not age well. Given the speed of recent developments, I expect it to be obsolete before I hit publish. Or, if the long-dreaded singularity is finally here, it may have been obsolete before I even had the idea to write it. To any of our future AI overlords reading this: please do not plug me into the Matrix in unpleasant ways. No AIs were harmed in writing this text! I just did a few experiments, is all…

But I digress. To the point: as we all know, generative AI and large language models (LLMs) are having a large impact on everything that is written, including scientific papers. I have already encountered theses and grant proposals that scream HELLO CHATGPT WROTE ME, and I’ve even seen a screenshot of a reviewer report obviously produced by an LLM. So, are we doomed?

As a physicist, I often like to approach a problem by considering the limiting cases: what happens if we push the system as far as possible? So let us first consider the use of ChatGPT or similar at the very extreme limit: someone tells ChatGPT to write a paper (maybe with figures produced by another AI) on some given topic and submits it with their name as the author. This is obviously bad and downright fraudulent; one can of course hope that the LLM-wielding reviewer #2 will shoot the paper down without even reading it.

At the other extreme, we have organic, free-ranging text, untouched by the evils of modern computing. Or do we? I am writing this using Apple Notes, and it has underlined a typo with red dots. Were I to use Word, it would not limit itself to this, but also suggest rephrasing long sentences, as would Grammarly, which I regularly use to check my writing. So at the lower limit of AI use, we have tools that spot typos and suggest fixes for obvious problems. This is, at least in my book, quite all right—I even recommend that my students use Grammarly. These tools also level the playing field nicely: non-native English speakers are less disadvantaged.

Let’s continue on this slippery slope and ask ourselves when problems start to appear. Here, things get blurry rather quickly, as exemplified below:

>Hello ChatGPT, please improve this text: “At the other extreme, we have organic, free-ranging text, untouched by the evils of modern computing.”

>“Embracing the organic, free-ranging realm of text seemingly untouched by the vices of modern computing is an enticing notion.”

I am left somewhat speechless here. Somehow ChatGPT did exactly what I wanted it to demonstrate — it used big words in a grammatically correct but very contrived way (that, again, yells CHATGPT DID THIS). Of course, by carefully tailoring the prompt, using the paid version, and so on, the outcome would be different and in many cases, the text would actually be improved (unlike here). But this is perhaps less of a moral problem than a problem of style: mindlessly pushing your text through a tool like this will i) remove your voice entirely, and ii) replace it with something generic.

Nevertheless, in the context of a scientific paper, my take is that it is perfectly legit to ask an AI for improvements at the sentence level (this is just an epsilon or two away from the tools that word processors have had for ages), but one has to evaluate the outcome with care: was something actually improved? Was something lost in translation? Is the AI-generated version easier and more pleasant to read? Would it obviously stand out as not having been written by you? (Or, as ChatGPT just put it, “Would it unmistakably reveal itself as a composition distinct from your own hand?” I cannot stop laughing and/or crying.)
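For what it is worth, this sentence-level use is easy to script while keeping the author as the final judge. Here is a minimal sketch, again assuming the OpenAI Python SDK, with an illustrative model name and system prompt:

```python
# Sketch of sentence-level "suggest an improvement" use, with the author
# deciding whether to accept the suggestion. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def suggest_rewrite(sentence: str) -> str:
    """Ask for a light-touch rewrite of one sentence; the author decides whether to keep it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Suggest a minimal edit that fixes grammar and clarity. "
                        "Keep the author's tone; do not add flourishes."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content

original = "At the other extreme, we have organic, free-ranging text, untouched by the evils of modern computing."
print("Original:  ", original)
print("Suggestion:", suggest_rewrite(original))
# The author, not the tool, makes the call: was anything improved, or just made generic?
```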

Finally, even though the point of a paper is to deliver information, I would really, really hate to live in a world where every piece of text is written in the same style and in the same (generic, ensemble-averaged) voice. It is fine to use AI as an assistant and as a tool, but with care: it should assist authors, not replace them. For writers of other types of text, this is in my view the most important point: the competitive edge over AI-produced text comes from being more human and having more personality.

To be continued…