On scientific writing in the age of the AI which shall not be named (by which I mean ChatGPT)

Disclaimer: this post will probably not age well. Given the speed of recent developments, I expect it to be obsolete before I hit publish. Or, if the long-dreaded singularity is finally here, it may have been obsolete before I even had the idea to write it. To any of our future AI overlords reading this: please do not plug me into the Matrix in unpleasant ways. No AIs were harmed in writing this text! I just did a few experiments, is all…

But I digress. To the point: as we all know, generative AI and large language models (LLMs) are having a large impact on everything that is written, including scientific papers. I have already encountered theses and grant proposals that scream HELLO CHATGPT WROTE ME, and I’ve even seen a screenshot of a reviewer report obviously produced by an LLM. So, are we doomed?

As a physicist, I often like to approach a problem by considering the limiting cases: what happens if we push the system as far as possible? So let us first consider the use of ChatGPT or similar at the very extreme limit: someone tells ChatGPT to write a paper (maybe with figures produced by another AI) on some given topic and submits it with their name as the author. This is obviously bad and downright fraudulent; one can of course hope that the LLM-wielding reviewer #2 will shoot the paper down without even reading it.

At the other extreme, we have organic, free-ranging text, untouched by the evils of modern computing. Or do we? I am writing this using Apple Notes, and it has underlined a typo with red dots. Were I to use Word, it would not stop there but would also suggest rephrasing long sentences, as would Grammarly, which I regularly use to check my writing. So at the lower limit of AI use, we have tools that spot typos and suggest fixes for obvious problems. This is, at least in my book, quite all right: I even recommend Grammarly to my students. These tools level the playing field nicely, as non-native English speakers are less disadvantaged.

Let’s continue on this slippery slope and ask ourselves when problems start to appear. Here, things get blurry rather quickly, as exemplified below:

>Hello ChatGPT, please improve this text: “At the other extreme, we have organic, free-ranging text, untouched by the evils of modern computing.”

>“Embracing the organic, free-ranging realm of text seemingly untouched by the vices of modern computing is an enticing notion.”

I am left somewhat speechless here. Somehow ChatGPT did exactly what I wanted to demonstrate: it used big words in a grammatically correct but very contrived way (that, again, yells CHATGPT DID THIS). Of course, with a carefully tailored prompt, the paid version, and so on, the outcome would be different, and in many cases the text would actually be improved (unlike here). But this is perhaps less a moral problem than a problem of style: mindlessly pushing your text through a tool like this will i) remove your voice entirely and ii) replace it with something generic.

Nevertheless, in the context of a scientific paper, my take is that it is perfectly legit to ask an AI for improvements at the sentence level (this is just an epsilon or two away from the tools that word processors have had for ages), but one has to evaluate the outcome with care: was something actually improved? Was something lost in translation? Is the AI-generated version easier and more pleasant to read? Would it obviously stand out as not having been written by you? (Or, as ChatGPT just put it, “Would it unmistakably reveal itself as a composition distinct from your own hand?” I cannot stop laughing and/or crying.)

Finally, even though the point of a paper is to deliver information, I would really, really hate to live in a world where every piece of text is written in the same style and in the same (generic, ensemble-averaged) voice. It is fine to use AI as an assistant and a tool, but with care: it should assist authors, not replace them. For writers of other types of text, this is in my view the most important issue: to keep a competitive edge over AI-produced text, be more human and have more personality.

To be continued…

Slides for my CCS warm-up presentation

The young researchers in Complex Systems Society (yrCSS) invited me to talk about scientific writing in Palma de Mallorca on October 15, 2022. It was really great to speak to an active & interested audience!

Here are the slides — I hope you find them helpful!

There is a video recording of the whole talk as well, available on YouTube. Go check it out.

Science — stories or pure data?

In his recent post, Petter Holme presents an entertaining inner dialogue about whether one should market one’s scientific output or not. Much of this centers on the concept of stories, and the debate over whether we should publish papers with story-like narratives or just plain data has been going on for a while.

Being an advocate of papers-as-stories, let me add another point of view to the mix.

I feel that there are two dimensions here. The first one is the axis from facts to fiction, and being scientists, we all know where we should place ourselves here. The second dimension is about pure data versus understanding/insight, and it is this dimension that in my view necessitates some storytelling.

Let me explain my reasoning by starting from pure data. Suppose I have carried out an experiment/done some simulations/analyzed a bunch of data I found on the Internet. Now, if I wanted my output to be pure data, I could just release the numbers as tables or graphs or whatever, and maybe an explanation of how the experiments or simulations were carried out. Pure data, no story.

However, my pure data would probably make sense to few people, if any. To take a step towards meaning, I should at least explain which research question the experiment/simulations/analysis project was designed to answer. I might also feel compelled to explain how the data answer this question, i.e., to give the numbers some meaning.

Notice the elements of a story sneaking in? There is a question, there is an answer.

But even after these additions, only an expert reader would be able to see the meaning in what I have done. For anyone else, more would be needed — why should this question be asked? What is the context for the question? And why should one care about the results?

Add these elements, and we have arrived at the typical structure of a scientific paper that begins with an introduction and ends with a discussion. We have also strayed pretty far from pure data, and are now firmly in the realm of stories. First, we introduce the world and the characters that inhabit it, then we create tension with an open question, and release this tension with an answer.

But such stories of science are not works of fiction; they are told with facts. This, to me, is why papers should be stories — stories provide clarity, understanding, and meaning. They help the reader to connect the dots. Of course, one can and should release pure data too: numbers, results, code, everything. But these only get their meaning through stories.

Podcast interview on writing

I was recently interviewed by Daniel Shea for his podcast Scholarly Communications — you can listen to the interview here: https://newbooksnetwork.com/how-to-write-a-scientific-paper

We discussed my writing book and writing in general. This was a very enjoyable discussion & Daniel had plenty of good points and new perspectives that I could immediately agree with — do have a listen, highly recommended!

How to Write an Excellent Master’s Thesis

I was asked to give a talk on how to write a Master’s Thesis at our department’s Comms & Coffee event this morning; here are the slides.

This talk is an adapted version of my paper-writing system (no, I haven’t written a book about writing Master’s theses, at least not yet). You’ll notice that companies & businesses are mentioned—Aalto is a technical university, so many MSc theses are in fact done as interns/trainees in companies.

I hope the slides are useful. Feel free to share with your students!

Do you want to study complex systems? M.Sc. admissions open until 3.1.2020!

Are you looking for a master’s programme? Admission is now open for our Life Science Technologies master’s programme at Aalto University, Finland, Europe; there is a Complex Systems major within the programme, and I am the professor responsible for it. And I am looking for talented and motivated students from all parts of the world!

What’s in the major? A lot of interesting and cool things: network science, data science, machine learning, nonlinear dynamics, to mention a few! Here’s why networks are the thing. And if you want to know more about what complex systems are, just have a look at previous posts in this blog, e.g. on mobile-telephone calls, ants, and the immune system.

Here is a complete list of courses in the complex systems major for this winter (only minor changes coming in 2020).

There is a lot of freedom in designing your own curriculum: there are many courses to choose from, including courses by other Life Science Technologies majors. This makes it possible to mix and match: want a combination of machine learning and complex networks? Check. Want to be a network neuroscientist? Check. Want to get a broad training in data science? Check.

Note: even though the programme is called Life Science Technologies, you can almost completely avoid anything that begins with “bio” if you so wish. As an example, I have students who focus on social networks, computational social science, or public transport networks and who, I believe, haven’t taken any biomedical courses. But if bio is your thing instead, there are plenty of opportunities here too!

One more thing: the doctoral track. If you are talented and your grades are excellent, you can apply to the doctoral track, where your final target is not the master’s degree but a PhD; your studies are tailored towards that goal, and you’ll get to spend time as an intern in our research groups, with the aim of publishing the first journal article(s) of your thesis before you even get the master’s degree.

So, what are you waiting for? Apply here! The deadline is on Jan 3rd, 2020.

Slides for my keynote at Complex Networks 2019

I gave a keynote talk at the Complex Networks 2019 conference in Lisbon—here are the slides, if you are interested.

If you are interested in temporal networks in general, here are some pointers:

Postdoc Wanted — Network Science, Public Transport Networks, Cities, etc

We are looking for a postdoc (2 years) to work on the intersection of complex systems/networks, transport engineering, human mobility, the science of cities, and data science.

This position is related to an ongoing collaboration between my group and Prof. Milos Mladenovic’s (Twitter: milosplanner) transport engineering group (both at Aalto University, Helsinki area, Finland).

We want to bridge the gap between network science and transport engineering, including city planning and public transport network planning; for our earlier joint works, see, e.g.,

What we can offer:

  • Access to unique data: e.g., details of all trips from Kutsuplus, famous for being the world’s first on-demand public transport service; vehicle-level geocoordinate trajectories for public transport in the Helsinki region; aggregated mobile-phone flow data; and more coming in.
  • True multidisciplinarity with real-life application potential: in addition to the two teams from different domains (networks & transport), we interact with on-demand transport companies, the Helsinki Region Transport Authority, etc.
  • Access to heavy-duty computational resources (our Triton cluster, etc.)
  • Access to lots of in-house expertise on networks, data science, and transport studies
  • Lively environment: Aalto University with a campus ~10 km from the centre of Helsinki with its own subway station (great public transport connectivity!)
  • Decent salary: >3keur/month, which is really quite OK in the Helsinki area (despite taxes + costs of living being a bit higher than in most countries)
  • Darkness of winter, compensated by almost round-the-clock sunlight in the summer!

What we expect:

  • Expertise in network science/complex systems/data science
  • Some level of expertise in cities, transport, spatial networks, geodata, etc
  • PhD in a field relevant to the above
  • Skills in Python or willingness to learn them fairly quickly (packages such as gtfspy will help you get started)
  • Interest in the topic!

The call is open until 20th of December; the applications will be processed and (Skype) interviews with shortlisted candidates will be conducted in January 2020.

Please email a single combined PDF document containing 1) a cover letter, 2) your CV and publication list, and 3) contact details for two references to jari.saramaki@aalto.fi, with “Mobility postdoc” in the subject line.

How to write a press release that journalists want to publish?

[This post has been co-written with sci comms coordinator Anu Haapala].

If you happen to come across research results that are worth sharing with the general public through online publications or traditional newspapers, you’ll usually need to approach these outlets with a press release. As the inner workings of press releases are notoriously difficult for scientists to grasp (what, you have to present things in the wrong order??) and as no-one actually teaches scientists how to write them, I have teamed up with our specialist Anu to provide some help.

Of course, if you live in fairyland, your university has a scientific communications team that simply reads your paper, understands its content and implications better than you, and compresses all this into a readable, exciting press release that instantly makes you a media superstar. But if you live in the same world as the rest of us, you might actually have to work a bit with said comms team, as your results might not be as comprehensible to non-specialists as you think. It also helps to understand what the comms people are trying to achieve: what is their output? And, more often than not, you might unfortunately even need to write the press release yourself because there are not enough comms people around… so how should you do it?

There are two key things to understand here: 1) the intended audience and 2) the structure of the press release.

Let’s begin with the audience. In fact, a press release has two audiences: the first is the journalists who act as gatekeepers, and the second is their audience, the general public or its subsets such as tech-savvy readers or wannabe astronomers or similar. The gatekeeper role of the journalists comes from their need to serve their own audience: they only publish your story if they think it is of interest to their audience.

This has direct consequences for the form and structure of a scientific press release. First, the press release has to be written in a form that journalists are used to seeing and can make the best use of (which is very different from scientific writing!), and second, its language and content should be comprehensible to laypersons.

The way journalists would write any news story – and the way you should write your press release too – is to put the most important thing first, followed by other things in decreasing order of importance. This inverted-pyramid structure has historical reasons: there is limited space in a newspaper, and shortening a story is easier if the editor can just chop off a few last paragraphs without doing much damage to the central point. At the same time, this structure makes the story more readable: the readers do not need to wonder what the point of the story is if this point is the first thing that they encounter. In other words, they see immediately what all the fuss is about and whether they want to read more about it.

The problem is that we scientists are really not used to writing this way: it almost physically hurts us to give away the main result immediately, in the very first sentence, without lengthy motivation or background or methods or anything to prep the reader with. But no pain, no gain: this is what you should do. Always begin with the main result, formulated in plain language that even your grandmother who never went to high school can understand. This is difficult, we know, so coming up with the proper words might take quite some time. But it’s worth it.

After introducing the main result, you should tell why the results matter and what follows from them, again in plain language and using only words that your audience can understand. What is now possible? What new and wonderful things can now be achieved? How has your result made the world a better place? And after this, you can continue to add in paragraphs in decreasing order of importance (to your target audience!). These paragraphs can add further details to your result, talk about the setting where it was obtained (your research group, an international collaboration…), sketch some future directions, and so on. It is probably safe to leave methods last, unless they contain something that would be especially interesting to your target audience (of non-scientists, remember!).

Journalists are used to killing their darlings, though, and you should, too. This means that you should critically evaluate each paragraph you write. If any of them seems unnecessary or trivial to anyone outside your own research community, don’t hesitate to press the delete button. News desks receive dozens of press releases each day, which means that journalists are ready to give their precious time only to a selected few. The shorter and snappier your press release is, the more likely journalists are to read your release through and publish it as such.

Leaving blanks in the right places can even encourage them to grab their phones and call you with follow-up questions. For this very reason, always remember to include your phone number and email address at the end of the press release. Journalists want to contact you, the specialist, directly and right now instead of trying to catch you through your university comms for days. (Believe us, they can hardly imagine anything more frustrating than an interviewee who is playing hide and seek!) So when you send out a press release, make sure you do it at a time when you are actually available to pick up your phone and discuss your research, even if just for five minutes.

However, quotes should not be the first thing you cut from your press release. Good press releases contain an element of human interest in the form of quotes, things that you or a colleague of yours say about your results or research. “We had never thought about X until we figured out that Y”, says N.N., a postdoctoral scientist. “Then, the solution practically presented itself, and we knew how to do Z.” Quotes are an easy way to build bridges from one topic to another in the storyline of your press release: it might even be easier to use a quote than to write something up as a full paragraph (see the example above). In addition, humans (your readers) are always interested in other humans, so quotes make your press release more appealing.

Please remember that your press release is NOT a scientific publication: it does not need to tell everything (like the details of your methods). That’s what your original paper is for. You should leave out things that are too difficult (or too boring) for the intended audience. You may need to invent analogies or to simplify your results a lot: as long as you are truthful, this is perfectly fine! The only thing to avoid is over-generalizing or exaggerating your work (despite some sci comms folks and some journalists craving sexy headlines): make everything simpler, but keep it real. Also, send your release out in a format that is easy to copy, paste, and edit. Most comms teams use centralized press release services, but if you cannot access one, send out a simple email message! This is much better than hiding your release in an attachment: here, creating a nice-looking PDF will only slow you down.

Finally, timing counts too. Remember your first target audience: the gatekeeper journalists. Journalists want NEWS, they want things that happen right now, and they want news before their competitors! This means that you should send out your press release so that as soon as the result is out (some journals have press embargoes), they can run their story. A week or a month later won’t do; it’s very hard to make a journalist interested in a result that was published weeks ago. So as soon as you know your publication date, contact your sci comms people and start preparing your press release.