How to choose a research question?

In mathematics, the art of proposing a question must be held of higher value than solving it. –Georg Cantor

It can be said with complete confidence that any scientist of any age who wants to make important discoveries must study important problems. Dull or piffling problems yield dull or piffling answers. It is not enough that a problem should be ‘interesting’—almost any problem is interesting if it is studied in sufficient depth. —P. B. Medawar

Choosing which problems to work on is perhaps the hardest and most crucial part of science. It is also an invisible and underrated part. In university, we are taught to solve problems that someone has designed for us to solve. I’ve yet to see an exam where the task is to invent a problem rather than solve one! But without problems, there are no solutions either. Problems come first, and solutions are only meaningful when the problems themselves are meaningful.

Despite this, solutions and the clever methods we develop to achieve them tend to take center stage when we write up our research. While we typically provide a post hoc justification for our research question in the introduction, convincing the reader why it is important, we practically never document how we actually chose that problem over a large number of others.

How did we search for the problem? How did we identify it? Was it already circulating in the literature, known to others as well? Did it come to us suddenly in a flash of inspiration — this would be a great problem to study? Did we have an intuitive feeling that if we begin to chip away at this problem, something useful will emerge? Or did we accidentally stumble upon some results, only later realizing what problem they actually solve? There are several ways of arriving at a research problem.

While having a lot of experience in one’s field is of course useful for identifying great problems, not knowing everything can also be an advantage. Creativity thrives when unexpected connections are made, and knowing too much can lead to tunnel vision. This is why it might be a good idea to switch fields once or twice during your career… Of course, if you know nothing, it’s not possible to invent meaningful problems, and if you know too little, you might come up with problems that others have already solved. So know your literature, but don’t be afraid to set your own research goals instead of relying solely on what everyone else thinks is important.

When doing a Ph.D., start developing the skill of inventing meaningful scientific problems from day one! This investment pays compound interest: the better the problems you invent, the more doors your results open, leading to even better questions.

Often, a good research problem is like a seed: plant it in fertile soil, tend to it well, and more great problems will sprout.

Let’s now suppose that you have a list of candidate problems—research questions that you might well like to study. Which one should you pick? What does a great research problem look like? First, let’s impose some real-world constraints.

Of course, the problem should be important, and its solution should make a difference. But it also needs to be solvable—impossible problems are not worth it, especially if you’re trying to finish a Ph.D. Still, problems may look deceptively easy, and it’s probably safe to say that the large majority of problems are more difficult than they first appear.

Since there is a finite time in which the Ph.D. has to be completed, it might be wise to mitigate risk by balancing your more ambitious endeavors with some “safe” problems that guarantee results. It is also good to have a plan B for getting something publishable out of problems that are too hard or too slow to completely crack. Then there is always material to write about, even if the more grandiose undertakings fail or take one to several lifetimes to complete.

Therefore, “finding a cure for cancer” or “developing an artificial brain” do not qualify as great research problems for your Ph.D. Rather, they are long-term targets of entire fields of science. However, “figuring out the role of pathway X in preventing our immune cells from attacking tumours” or “understanding the role of criticality in the predictive power of liquid-state machines” could be steps in the right direction.

Confession: I made these up sort of randomly (though not entirely randomly). Yet both are focused and concrete enough that one could actually start attacking them from some experimental or theoretical angle.

Thus far, we’ve covered the easy part—problems should be concrete and solvable, and impossible problems are better left alone. But not all concrete, solvable problems are worth solving. The remaining elephant in the room is importance. What makes a problem important? And what does that even mean?

Of course, your results might yield obvious and direct benefits—perhaps contributing to a new medical treatment or laying the groundwork for future advancements in artificial intelligence. However, most of the time, “importance” is a more elusive concept. It’s usually easier to assess the significance of a result after the fact: if a scientific article is highly cited, it likely had a substantial impact on other scientists’ thinking.

When we move from concrete outputs like articles to the more abstract and hazy realm of ideas, one way to visualize science is as an ever-growing network where ideas give birth to new ideas. An impactful idea is one that sparks many others downstream, either directly as offspring or indirectly through a chain of intermediate concepts. This latent process is what generates the aforementioned citations: ideas spawning ideas, and, in a Darwinian sense, great ideas giving rise to more ideas.

Unlike in biology, however, an idea can have far more than two parents, and its fitness isn’t always immediately apparent—there can be a long delay before its importance is recognized. And unfortunately, sometimes the building blocks that had to be put in place before a major new idea could emerge are forgotten. Everyone knows about Einstein’s theory of relativity, but few are aware of all the earlier efforts that went into developing ways to synchronize clocks across countries and continents using electric cables. Yet Einstein was undoubtedly familiar with this work, and it must have influenced his way of thinking.

In any case, even if assessing the importance of a research question is not trivial, it is worth asking why someone would cite your results, say, 10 years down the line. Is the question fundamental enough so that if you solve it, others will build on your results? Try to see the question as part of a bigger tapestry.

The above picture of science as a flow of ideas being born, merging, and mutating is also helpful for reviewing the literature, both for coming up with research questions and understanding what has already been done in the general vicinity of a question that you have chosen to address. Science is a network—to make discoveries, follow the links of that network!

Identify impactful and highly cited papers and try to figure out why they are important. Then, use Google Scholar or some other tool to find out who cites them; just look at the abstracts of the citing papers to get the big picture and then dive into the details of those pieces of work that sound relevant to you. This is, in my view, more useful than trying to read all the literature in detail, in some random order. Try to see the forest first, and then focus on those trees that you find important. If you feel that some trees or entire forests are missing, you have a research question!

Baboon Song — a Temporal-Network Composition

Here’s something playful for you — Baboon Song, a track I composed using temporal networks:

I took the Baboons’ interactions dataset from SocioPatterns, turned some baboon interaction timelines into MIDI files, chopped the files into bits, and constructed loops by feeding them into various synths in Ableton Live. I built the skeleton of the song in Ableton, added more instruments, and mixed the track in Logic Pro X (a tool I know better than Live).

The repeating rhythmic patterns are interaction timelines corresponding to a day in some baboon’s life — the pattern that starts the song is between two baboons, and the bell-like pattern that enters next is a whole temporal ego-net of another baboon. There are whispers of other ego-nets in the L and R channels… On top of the baboon sounds, I added a bit of synth bass, a few rhythmic elements, and some slightly cheesy harmony at the end of the track that I felt just had to be there, because of sunrise over the savannah or something.

Enjoy!!

PS If there is a production nerd among you who wonders — rightly so — what the beautiful, spacious reverb that everything swims in is, it’s just the default preset of Valhalla’s VintageVerb. The best reverb that there is. Rant over.

PPS Inspired by Llyr (https://open.spotify.com/album/4al4RrIK5oMUd95Gvjlyiu)

The Sound of Temporal Networks

I recently gave a talk at the Complexity, Aesthetics, and Sonification workshop in Bielefeld, Germany, organized by Thilo Gross, Maximilian Schich, and Cristián Huepe. A really great workshop with lots of different points of view from art to science!

For the talk, I did a bit of exploration in representing temporal networks with sounds. As those who have dabbled with temporal networks know, visualizing them is very difficult, as they live in time instead of space. But so do sounds. Let’s hear what temporal networks sound like, then!

So what was that? That was one month’s worth of data on students’ phone calls from the Copenhagen experiment, compressed into 13 seconds. I took 10 random students and assigned each their own random pitch so that a sound is played every time the student makes a call. I then turned the time series into MIDI, which was fed into one of the synthesizers of Apple’s Logic Pro X.
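If you’d like to try this yourself, the conversion takes only a few lines of Python. Below is a minimal sketch using the mido library; the calls list of (timestamp, caller) pairs, the pitch range, and the compression factor are placeholder assumptions, not the exact script behind the clip above.

```python
import random
import mido

def calls_to_midi(calls, out_path="calls.mid", n_students=10,
                  compress_to_seconds=13.0, note_len_ticks=60):
    """Give each sampled student a random pitch and play a short note
    for every call they make, compressing the whole period into seconds."""
    students = random.sample(sorted({caller for _, caller in calls}), n_students)
    pitch = {s: random.randint(48, 84) for s in students}  # one random pitch per student

    t0, t1 = min(t for t, _ in calls), max(t for t, _ in calls)
    ticks_per_second = 960            # 480 ticks/beat at the default 120 bpm
    scale = compress_to_seconds * ticks_per_second / (t1 - t0)

    # Collect note_on/note_off events with absolute tick times, then sort them.
    events = []
    for t, caller in calls:
        if caller in pitch:
            start = int((t - t0) * scale)
            events.append((start, "note_on", pitch[caller]))
            events.append((start + note_len_ticks, "note_off", pitch[caller]))
    events.sort()

    # MIDI stores delta times between consecutive messages, so convert on the fly.
    mid = mido.MidiFile(ticks_per_beat=480)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    prev = 0
    for tick, kind, note in events:
        track.append(mido.Message(kind, note=note, velocity=90, time=tick - prev))
        prev = tick
    mid.save(out_path)
```

The resulting .mid file can then be dropped onto any software synth, just like above.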

For such a simple and straightforward exercise, there’s a surprising amount of information in the sonification. If you are into temporal networks, you can hear several familiar patterns: there is a daily cycle, weekdays are different from weekends, and there’s also burstiness.

Let’s continue listening to these data. The Copenhagen data set contains metadata on text messages as well, so let’s pick one of the students and listen to their egonet — everyone they call or text will get their own pitch, so that, e.g., one friend is always C (on some octave). Then we’ll feed the calls into a sampler with piano sounds and the texts into another with sampled upright bass.

Quite jazzy, isn’t it? And, again, one can pick up a lot of information here. The daily cycle and the burstiness are still there — and there are even some repeated patterns, parts of temporal motifs. There is also a finding that had escaped my attention earlier — at around the middle of the timeline, there is a cluster of notes being played on the piano, as the student makes a large number of calls in a short period of time. This pattern is, in fact, present in several other students’ timelines at the very same time.

Now let’s have a bit of fun with probing the network with random walkers. I use greedy walkers — a random walker is placed on a node (student), and when the student makes a phone call, the walker moves on to the student being called, and so on. Every newly visited student gets their own pitch that is one semitone higher; when the pitch goes down during the process, this means that the walker is visiting nodes that were already visited. Let’s hear one walk, starting from a random node:

The walker explores a larger subnetwork around the starting point, sometimes backtracking, before finally escaping. Now let’s hear another walk:

Quite different, right? This walker has become stuck in a neighbourhood of a few students who keep calling only one another, and it cannot escape. So the social neighbourhoods of these two students are quite different indeed!
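For the curious, here is roughly how such a greedy walk turns into a melody. This is a toy sketch, not the actual code behind the clips; it assumes the calls come as (timestamp, caller, callee) tuples, and all names are made up.

```python
def greedy_walk_melody(calls, start_node, base_pitch=48):
    """Follow a greedy walker along time-ordered calls and return its pitch sequence.

    Each newly visited student gets a pitch one semitone above the previous new one;
    revisited students reuse their old pitch, so downward steps mean backtracking."""
    pitch_of = {start_node: base_pitch}
    current = start_node
    melody = [base_pitch]

    for t, caller, callee in sorted(calls):   # events in time order
        if caller != current:                 # the walker moves only when its
            continue                          # current student makes a call
        current = callee
        if current not in pitch_of:           # first visit: one semitone higher
            pitch_of[current] = max(pitch_of.values()) + 1
        melody.append(pitch_of[current])
    return melody
```

The resulting pitch list can then be written out as MIDI exactly as in the earlier snippet.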

Finally, for something entirely different — the sound of criticality. This is simulated (by my student Sara Laurila): what we have is the SIS (Susceptible-Infectious-Susceptible) model on an N=50,000-node network, parametrized exactly at criticality — on the boundary between two phases, where in one, all activity dies out, and in the other, there is persistent activity. (In the model, nodes are S until they come into contact with an I node, at which point they become I and can infect others; eventually they revert to S, only to become I again at some point in the future. This excitement (I) thus propagates through the network.)
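If you want to generate this kind of event stream yourself, a toy discrete-time SIS simulation is enough. The sketch below is not Sara’s actual code: the networkx graph, rates, and sizes are placeholders, chosen so that beta times the mean degree divided by mu is roughly one, i.e., close to the epidemic threshold of a random graph. It records, for a handful of randomly chosen sentinel nodes, the times at which they flip to I, and those times are what get mapped to notes.

```python
import random
import networkx as nx

def sis_sentinel_events(G, beta, mu, n_steps=2000, n_sentinels=20, seed_frac=0.01):
    """Run discrete-time SIS on G; return {sentinel: [times it turned I]}."""
    infected = set(random.sample(list(G.nodes), int(seed_frac * G.number_of_nodes())))
    sentinels = random.sample(list(G.nodes), n_sentinels)
    events = {s: [] for s in sentinels}

    for t in range(n_steps):
        newly_infected, recovered = set(), set()
        for u in infected:
            for v in G.neighbors(u):              # S neighbours may become I
                if v not in infected and random.random() < beta:
                    newly_infected.add(v)
            if random.random() < mu:              # I nodes revert to S
                recovered.add(u)
        infected = (infected | newly_infected) - recovered
        for s in sentinels:
            if s in newly_infected:
                events[s].append(t)               # this is the "note" for sentinel s
    return events

# Placeholder network, much smaller than the real one; beta*<k>/mu = 1, i.e. near criticality.
G = nx.erdos_renyi_graph(5000, 10 / 5000)         # mean degree ~10
note_times = sis_sentinel_events(G, beta=0.01, mu=0.1)
```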

In the sonification below, I again use a random sample of sentinel nodes, each assigned their own random pitch. The nodes make a sound whenever they turn I, i.e., whenever the wave of excitation hits them. Here’s what criticality sounds like:

Here’s the same but with drum sounds instead. Sounds like Zappa, but without intention or direction, as a drummer friend of mine remarked.

And finally, criticality from the point of view of one single sentinel node:

Apply Now to Our Master’s Programme in Life Science Technologies!

Our popular Master’s programme in Life Science Technologies at Aalto University, Finland has a major in Complex Systems! The major includes a lot of network science taught by top scientists in the field — yours truly, Mikko Kivelä, and Petter Holme. You’ll also learn some Python programming, data science, machine learning, and nonlinear dynamics — or, if you wish, you can choose a more maths-heavy subset of courses, or combine your studies with, e.g., human neuroscience.

This major has very tight connections to research, and many students have continued toward their doctoral degrees after receiving their M.Sc. Another very popular and successful career path has been that of an industrial data scientist or consultant, e.g., in the health industry. There is a lot of demand for these in the Finnish job market, so a Master’s degree in Life Science Technologies is a great investment in your future.

The application period is open only until 2 Jan 2024, 3:00 PM GMT+2, so be quick & apply now!

Season’s Grant-Writing Tips, Part 2/2

A very, very AI-generated image where money falls down like snow.

In the first part of this grant-writing mini-series, we learned the fundamental secret of grant-writing (and, in fact, any writing): everything revolves around the reader. The only purpose of a grant proposal is to make it easy for the reviewer to recommend funding.

Let’s break that statement down. For the reviewer to recommend funding, she has to feel that what you aim to do is important, novel, and feasible, and that you are exactly the right person/team to do this. In more touchy-feely terms, the reviewer has to like the proposal. And you.

As we discussed in the previous post, this is much more likely to happen if the proposal doesn’t make the reviewer work too hard: it should be focused, clearly written, and provide clear answers to the questions the reviewer must address.

To help with the above, we’ll now address writing at the level of paragraphs and sentences, borrowing some tricks from professional copywriters who craft advertising text. These techniques not only involve gently manipulating the reader—all writing is about manipulating the reader!—but also aim to ensure that the text flows. An ad where the reader gets lost or bored is a failed ad.

Let’s begin at the beginning because it is the most important place. In any writing, the first sentence and the first few words have enormous power—”Call me Ishmael”—and you should tap into this power. This is because they prime the reader’s mind for what is to come. They also set the general mood. Begin your proposal with a few strong sentences that almost win the grant! These sentences should summarize your plan and its impact: why is it important to do the things you plan to do? Why are you in a unique position to do this? If your grant is funded, how will the world become a much better place?

This mini-summary serves a dual purpose in priming the reader. Firstly, on an emotional level, the reviewer should feel excited – “This sounds like a great proposal!” If you achieve this, the reviewer will have a positive bias from the very beginning. However, with a weak or muddled beginning, you’ll need to work hard to win them over. Secondly, it is much easier for the reviewer to follow the text when they know where it is going — easier in terms of both comprehension and how reading the text feels (these two are, in fact, the same).

There is another place of power: endings. The power of endings is different from that of beginnings: whereas beginnings prime the reader, the endings are what the reader remembers. This is because between paragraphs and between sections there is a break in reading, where the stream of input to the reader’s brain temporarily ceases. This leaves more space for whatever the last input was to echo around in the reader’s head.

Saving important bits to the end is a common copywriter trick—ever seen an ad with “click here to buy” in the middle?

However, this trick works best for short sections and well-written text. If you lose your readers along the way, they won’t reach the end. Remember the overworked, sleep-deprived reviewer from the last post? She might be tempted to just skim, you know. To mitigate this risk, write short paragraphs ensuring that the reader makes it through to their end—and write them well. For section endings, a strong recap sentence — perhaps as a separate paragraph—can do wonders. “In summary, my research can be expected to have an enormous impact, because…”

We’ve now covered beginnings and endings. What is left is how to get from the former to the latter. Here, a copywriter’s trick is to understand that while the sentences must deliver information — including enough details of your research plan to judge its feasibility, etc — their task is also to propel the reader forward. In ad copy, the primary task of every sentence is to make the reader read the next!

This means that the sentences should seamlessly flow into one another, which is a general sign of good writing regardless of the genre. This is particularly important for information-dense grant proposals: information is much, much easier to absorb through a narrative than when it is presented as disconnected bits and pieces. The narrative is what keeps the reader going: as humans, we’ve enjoyed stories since the dawn of man, singing around campfires.

For a grant, the narrative is particularly important for sections prone to being dense, taxing, and boring—imagine the sleep-deprived reviewer having to wade through 25 poorly written state-of-the-art sections! This is especially crucial if the section is at the proposal’s beginning, as state-of-the-art sections often are. So next time when writing one, consider the reviewer, and instead of just listing references, write a story of how your field of science has evolved to the point where you can both ask and answer your research question.

Finally, as I mentioned in the previous post, there is one spot in the proposal where you can be slightly difficult to understand on purpose, in particular, if the reviewer is not really in your (sub)field and your proposal involves theory/maths/data analysis/similar.

This is in the methods section, or whatever the section where you describe what you are going to do is called. Whereas the research question and its importance should be written with absolute clarity so that everyone can understand them, here you can show off a bit. The point is to give the impression that you really know your stuff. Even though your proposal should generally be as free of jargon as humanly possible, it doesn’t hurt to have one strategically placed sentence where you flex your claws, show that you can devour your field’s most complicated concepts for breakfast, and instill a bit of fear and awe in the reviewer. Then you can be all nice again, and wait for the gifts to arrive.

I wish you merry grant-writing!

Season’s Grant-Writing Tips, Part 1/2

Grant money falling like snow (a very, very AI-generated image, by craiyon.com)

It is grant-writing season here in snowy Finland, and to keep away from the actual work, I thought I’d write a couple of posts on grant-writing tips. Today we’ll be all nice, but in the next episode, we’ll get a bit naughty because that might in the end bring us more gifts. Ho ho ho.

Let’s start at the very beginning. When writing a grant, the most important thing for you to understand is what is going through the heads of your target audience—the reviewers. You are writing the grant to persuade them to recommend you for funding. Your one and only task is to make this as easy as possible for them.

This simple rule — to make it as easy as possible for the reviewers to recommend funding the proposal — gives rise to many corollaries.

To arrive at those, consider the situation that the reviewers find themselves in. It is very rare to get a single proposal to review that is spot on in the reviewer’s own subfield. What is more common is that there is a large pile of proposals on the reviewer’s desk, they are almost but not entirely off-topic, the deadline was last week, the reviewer has barely slept because the kids are sick, and even the coffee has gotten cold.

In this situation, the reviewer will be very, very grateful if you make her task easier.

This means, among other things, that a) the proposal must be easy to understand, even to a non-expert, b) the proposal’s value and level of ambition must be immediately visible, c) the proposal must contain direct answers to the questions that the reviewer has to answer, and d) the proposal must not contain any more stuff than is necessary to convince the reviewer.

The first corollary requires that you’ve actually given your research plan enough thought so that you can understand it yourself—in other words, you must know what you are doing. It helps a lot to have a clear focus: it is a common beginner’s mistake to try to squeeze all your ideas into one proposal, which then reads like a confusing superposition of several muddled research plans. Focus on a single topic and your best idea to avoid confusing the reviewers because otherwise, they won’t know which of your parallel plans they should be rating. Confused people are rarely happy people, and only happy people give top ratings!

Being easy to understand also means well-written: reading a good grant proposal shouldn’t feel taxing. Avoid jargon and complicated sentences; always err on the side of simplicity. Also, your proposal should not read like lecture notes because the proposal is not about teaching the reviewers. Nothing is as annoying as being lectured to if you only want to get your reviews done!

The proposal should contain enough information to convince the reviewers of how and why you plan to do what you plan to do, but no more than that. Again, think of the poor reviewer who has 20 proposals on her desk: do you think that she is happy to try to become an expert in 20 new topics by reading about a metric ton of intricate details under heavy time pressure, with cold coffee and cranky kids demanding attention? I don’t think so.

That being said, there is one spot in the proposal where you can be a bit difficult to understand on purpose, but let’s leave that for the next part of this series.

Being easy to understand also means no bulls*it: no fluff or fancy-sounding, big words that mean nothing. For god’s sake, no ChatGPT-produced text because it is full of the above, unless you really, really know how to use it. Write the text yourself. Write concisely, simply, and powerfully. Write like you mean it.

The second corollary demands that you make your case clear directly and very early on. Here, my suggestion is to start with a summary paragraph that is almost enough to win the grant for you. More about this later.

The third corollary — the proposal must contain direct answers to the questions that the reviewers have to answer — is hugely important as well. This requires you to do a bit of reconnaissance: the reviewer guidelines and/or review forms of many grant agencies are public. Get them. Study them. Learn them by heart. Find out what specific questions the reviewers are asked, and make sure that your text contains copy-pasteable answers to each, preferably well-highlighted (in italics, say), so that in a hurry, the reviewers can recycle your text in their statement. Make sure that your answers are winners and that it is easy for the reviewer to give them full points.

Lastly, clarity and readability are often in direct conflict with the amount of stuff in a proposal. Again, a common beginner’s mistake is to cram in as much text as possible, fiddling with the margins or font sizes and using stamp-sized figures, etc. In contrast, the pros choose what elements to include and then focus on those, leaving enough white space and room to breathe. Don’t make the reviewers choke on the amount of stuff they have to ingest! Focus on what matters. Quality instead of quantity.

That’s all for today. In the next episode, we’ll put on our black hats and talk about some Jedi mind tricks, stolen from the evil folks who write ad copy that makes you buy stuff that you don’t need. Stay tuned!

Are you new around here?

Notebooks and a pencil

As there has recently been a surge of visitors coming from Moodles and other learning platforms, I thought I’d say hi — hello there!! — to everyone who is new to this blog, and provide some guidance in the form of a table of contents of sorts.

So, where have you landed? This is a blog by me, where me = Jari Saramäki, an interdisciplinary physicist and a professor at Aalto University, Finland, dabbling in network science and other complexities, and a big fan of lucid writing. Also, a bass guitar player, because someone has to be.

The blog contains things that students have found useful (which may be why you are here), in particular, advice on how to write scientific papers and how to develop your scientific writing skills:

Welcome again, and I hope you’ll find something in this blog that is either useful or entertaining, or both!

The abstract as a tool for better thinking

Having recently spent considerable time writing abstracts for some papers-in-the-making, I thought I’d share another post on the topic, even though it has been heavily featured on this blog before.

As you may already know, I advocate for writing the abstract before the rest of the paper, contrary to what is advised by some writing guides, e.g., this one (thanks Riitta H for the tip). Why?

To me, writing the abstract is, first and foremost, an exercise in thinking, to the extent that the written abstract itself can feel almost like a byproduct.

This exercise is all about clearly understanding what the paper is about: what the research question being asked is, why it is being asked, what the outcome is, and why someone should be interested in it.

While most of these questions may have been answered when the research was designed – e.g., you don’t build an expensive experimental setup without knowing why and what for – this is not always the case. Sometimes the data lead in unexpected directions, rendering the initial question obsolete. More often than not, your perspective shifts along the way: the initial question becomes something larger or morphs into something else. But what exactly?

To figure this out, you’ll need to give the abstract a go before even considering the rest of the paper. So, how to write the abstract of a research paper? As those who have read my book or attended my writing lectures know, the abstract template that I recommend is the same as the one used by Nature. Not because it’s Nature, but because it does exactly what it should: it forces you to think clearly.

In plain language, the abstract template goes like this (sorry, Nature, for this abuse):

  1. There is an important phenomenon/topic/something.
  2. But within it, there are unknowns that need to be sorted out for achieving X.
  3. In particular, we don’t know Y, because of something that was missing until now.
  4. Here we solve the problem of Y using a clever method/experimental design/something.
  5. We discover Z, which is surprising for some reasons.
  6. Knowing Z advances our scientific field like this.
  7. More broadly, understanding Z makes the world a better place in this way.

This template helps you refine your story and the point of your paper and serves as an acid test: if you cannot write the abstract, you are not ready to write the paper. It also ruthlessly exposes any gaps in your thinking, which is excellent because it’s a template, not Reviewer #2 who gleefully rejects your paper from the journal and taunts you in the process.

Writing the abstract first using the above template helps you improve your paper on your own before it is even written (which is optimal, isn’t it?).

In fact, I often try to formulate a mock abstract that follows the template during the very early stages of a research project, often well before the final results materialize. I find that this helps to understand where the project is going, and what might still be required. If I feel confused [narrator’s voice: which happens very frequently], the template sometimes shows the way.

Slides for my NetPLACE@NetSci2023 talk

It was a great pleasure to give a short keynote on writing in Vienna (in a hall with the above text on the wall)! My slides for the talk can be accessed here.

The whole NetSci conference was excellent and it was great to meet many friends and colleagues after so many years. A great many thanks to the organizers!

On scientific writing in the age of AI, part 2: A thought experiment

In the spirit of my post last week, let us continue figuring out the role of AI in scientific writing through a Gedankenexperiment. Where we left off was the use of AI as an assistant — a virtual editor if you’d like — to suggest improvements to one’s text, instead of churning out autogenerated content. Think Grammarly++, or similar. This is, at least to me, perfectly fine. However, I would appreciate it if the text still retains its voice and human touch, lest everything sound exactly the same.

Now, fast forward to the future. If people still write science 25 years from now, how will they use AI tools? What are those tools capable of?

Here is where I feel science — at least natural science — might diverge from more creative forms of writing, as the purpose of written science is ultimately to transmit information. It might even become desirable to have AI write up our results.

Consider the following: suppose that I have carried out an experiment and want to write a paper on its results. I feed my plots, maybe together with a few lines of text about background, impact, etc, to my virtual writing assistant, and off it goes, returning with a complete manuscript. As my virtual assistant has been taught to write in my voice, the manuscript actually sounds like me. I read the manuscript and find that it is factually correct, and submit it to a journal.

Now, if the information in this paper is factually correct and it is written in a way that is appreciated by human readers, how should we feel about this? Is this ethical or unethical? Is this a future we’d like to see or not?

For this to be ethical, it should be done openly and the use of AI acknowledged. Which is of course very easy to do. Maybe this will be common: maybe most papers will be written by AIs that have been fed with original research results.

Beyond ethics, is this good or bad? That, I guess, depends. If all papers sound the same, it is bad. But what if the papers are indistinguishable from human writing, considering that everyone trains their own AI to write in their voice? What might be lost here is the finesse of argumentation, nuances, deep thoughts, and all those things that make famous writers/academics famous. On the other hand, perhaps this loss would be compensated by far fewer crappy, incomprehensible papers… just maybe.

It may also be that written scientific papers will become obsolete, or at least obsolete as stand-alone products (this is already happening with all the Jupyter notebooks and SI data sets and so on). There are also already now paper formats in some fields (e.g., biomedicine) that leave very little room for creative writing—these are mostly just data containers.

Perhaps scientific papers will in the end not be structured for human readers, but for other AIs that can then better pick up their arguments to propose new theories, experiments, and so on — in other words, replace us, scientists. But I have my doubts about this, as I at least hope that science requires creativity that is beyond mere statistics of words. Let us hope that humans can still out-weird AIs in the years to come (is that even a word?)!

To be still continued, I think…