Paper Writing for PhD Students, Part 5: Figures



[This post continues the PhD student paper-writing series from here]

At this point, we have covered establishing the focus of your paper: you should already have a clear vision of what your paper is about, and the essence of this vision should be encapsulated in its abstract. You should also have the necessary ingredients at hand: the results to be presented in your paper together with ideas for schematic diagrams, organised into film-script categories according to their function and role in the story (Setup, Confrontation, Resolution, Epilogue).

The next step is to expand the storyline laid out by the abstract, and to outline the different sections of your paper. This begins with choosing what sections make up your paper. Depending on your target journal, you may need to follow strict guidelines (the commonly used Introduction–Methods–Results–Discussion structure, for instance) or come up with a structure of your own. Even for short letter-format papers that may or may not have subheadings, it pays off to have a clear idea of what goes where. Usually, this is not a difficult task: all papers begin with an introduction and end with a discussion, even if these span just a few paragraphs, and the results are sandwiched in between. Methods may be explained before the results, or after the discussion as an appendix of sorts (as in Nature and other glossy magazines).

What is more involved is choosing the order of presentation within the section structure. Here, a solid, tried-and-tested approach is to begin with the figures and their order of appearance. If you have followed the approach of this blog, your figures already come with handy labels (Setup, Confrontation, Resolution, Epilogue) and therefore you already have a good overall idea of their order. If your paper has to follow the standard structure, schematic diagrams and result figures will generally be placed in the Results section, so that figures of the Setup category come first and those of the Epilogue come last; schematic diagrams of the Setup category are an exception, as they may belong to the Methods or even the Introduction. In any case, you’ve already done much of the work before you have even begun outlining the main text: the categories of the figures mostly determine their order. What remains to be done is to choose, within each section, the order in which the results and schematic diagrams are shown: which figure leads to the next?

The order of figures should tell a clear story, so that each builds on the previous ones. You can use a multi-panel figure to tell a self-contained part of the story, a miniature story arc. You can combine, for example, a schematic diagram that explains your experiment (Setup), some basic statistics of your data (Setup), and a result plot or two that contain an unexpected finding (Confrontation). This mini-story multi-panel figure is a technique that I often employ in letter-format papers where the story has to move fast and get to the point quickly; it already brings the story close to its Resolution and the key result.

When you have chosen the figures, write a draft version of the caption for each figure. You may need to revise the captions later; at this stage you may still be unsure about issues like notation and nomenclature, so don’t pay too much attention to details yet. However, try to write the captions so that they are self-contained enough for a hasty reader to understand most of your paper just by glancing through the figures. After all, this is exactly what a great many readers do; this is what the editor of your journal does too, before deciding whether your paper is worth a closer look or deserves to be rejected outright. How long and how self-contained the captions should be depends, again, on your journal; in some of the letter-format top-tier journals, captions tend to be very long, while in the lesser journals where we mere mortals publish, figures are discussed at length in the main text and therefore the captions can be shorter. In any case, please make sure that your caption tells the reader what they should learn from looking at the figure. A caption that only states that here we see Y plotted as a function of X is not enough; it is redundant if you have remembered to label your axes in the figure already. Always tell the reader what the message of the figure is.

Because much of the story will be told by your figures, let us talk about figure quality for a while. Figures are tremendously important; those who only skim through the paper won’t see much else. Figures make the first impression and first impressions matter. Clear, high-quality figures with a professional look signal that a lot of effort has been put into the paper, and the reader is more likely to trust its contents. Amateurish-looking figures with a colour scheme that looks like PowerPoint in the 1990s leave the reader wondering whether the results are of the same dubious quality.

So do make sure that your figures look good. How to do this? First, learn the ropes of whatever program you use to generate your figures, whether it is a Python or R library, or a stand-alone piece of software (like Gnuplot, which has been around since the dawn of man; it will probably outlast even cockroaches once mankind is no more). In particular, learn how to change fonts, how to increase or decrease font sizes, and how to use proper LaTeX-type fonts wherever appropriate. Learn how to choose and manipulate colours, colour schemes, symbols, and shadings. Learn how to produce figures of chosen dimensions, so that you can later assemble them into multi-panel plots of your choice and combine them with schematic diagrams. Learn to match figure sizes to your target journal’s column width; not having to scale the figures takes some guesswork out of choosing font sizes (see below).
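A small piece of arithmetic helps here. Journals often state their column widths in millimetres, while many plotting libraries (matplotlib, for instance) take figure sizes in inches. The sketch below converts one into the other; the 86 mm width and the 0.75 height-to-width ratio are example values only, so do check the actual numbers in your journal’s author guidelines.

```python
MM_PER_INCH = 25.4

def figsize_for_column(column_width_mm, aspect=0.75):
    """Return (width, height) in inches for a figure that fills one column.

    column_width_mm: column width taken from the journal's author guidelines.
    aspect: height-to-width ratio; 0.75 is an arbitrary example value.
    """
    width_in = column_width_mm / MM_PER_INCH
    return width_in, width_in * aspect

# Hypothetical example: a single column 86 mm wide.
w, h = figsize_for_column(86)
print(f"{w:.2f} x {h:.2f} in")  # 3.39 x 2.54 in
```

In matplotlib you would pass the result as `plt.subplots(figsize=(w, h))` and save to a vector format with `fig.savefig("figure.pdf")`; the figure then goes into the paper at 100% scale, and the font sizes you set are the font sizes the reader sees.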

Second, do learn to use vector graphics software to post-process your figures (and do learn the difference between bitmap images, which are made of pixels, and freely scalable vector images, which are made of lines, arcs, and Bézier curves). At the time of writing, the industry standard (design industry, that is) would be Adobe Illustrator; there are many free alternatives such as Inkscape. With a vector graphics editor, it is easy to assemble multi-panel figures that contain schematic diagrams (drawn with the same editor) and result figures saved in vector formats (PDF, SVG). You can also add text, arrows, indicators, and so on, as well as retouch your result plots, changing line widths, colours of symbols, or their overall appearance. Often, this is much faster than trying to get everything right when producing the plots.

A few words on layout: always align things. Nothing spells “I am being careless” more clearly than subplots and schematics that are not neatly lined up, and aligning them takes just a few seconds. Use white space properly: leave enough of it so that things can breathe, but not so much that the figures look barren.

Discussing data visualisation at length is beyond the scope of this blog post, but here are a few remarks. Pay attention to your colour schemes. For plot symbols, there are much nicer and much more informative schemes than the pure-RGB red, green, and blue symbols that some programs use as default; on top of that, your reader might be colour blind and have a hard time distinguishing red from green. Always use different symbols AND colours for different curves for maximal clarity. If you want a personalised colour scheme, google for colour scheme generators (you have already learned how to set hexadecimal colour values in your program, right?). For heat maps and similar, pay attention to the neutrality of the colour map you use: make sure that it doesn’t artificially highlight some part of your range of values. In all cases, use colours consistently throughout your figures. If red and blue are categorical indicators of, say, two different data sets in a graph, do not use a heat map where red and blue indicate high and low values: reserve red and blue for the two data sets, and always use them this way. Likewise, if you use a colour map with a gradient from low to high values, reserve its colours for this purpose alone.
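One way to enforce both rules (different symbols AND colours, used consistently everywhere) is to define the mapping from data sets to styles once and reuse it in every figure. Here is a minimal sketch using the Okabe-Ito colour-blind-safe palette; the data set names are hypothetical, and the marker codes are matplotlib’s marker strings.

```python
# Okabe-Ito colour-blind-safe palette (black omitted), as hexadecimal values.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]
# matplotlib marker codes: circle, square, triangle, diamond, ...
MARKERS = ["o", "s", "^", "D", "v", "P", "X"]

def style_map(dataset_names):
    """Give each data set a fixed (colour, marker) pair to reuse in every figure."""
    if len(dataset_names) > len(OKABE_ITO):
        raise ValueError("more data sets than distinct colours")
    return {name: {"color": c, "marker": m}
            for name, c, m in zip(dataset_names, OKABE_ITO, MARKERS)}

def hex_to_rgb(hex_colour):
    """Convert '#RRGGBB' into an (R, G, B) tuple of integers 0..255."""
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

styles = style_map(["control", "treatment"])
print(styles["control"])                         # {'color': '#E69F00', 'marker': 'o'}
print(hex_to_rgb(styles["treatment"]["color"]))  # (86, 180, 233)
```

With matplotlib, `ax.plot(x, y, **styles["control"])` then draws the control data with the same colour and symbol in every plot. For heat maps, matplotlib’s perceptually uniform colour maps such as `viridis` or `cividis` are a reasonable neutral default.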

Then, labels and fonts. First, always label your axes. This is self-evident, but I still have to explicitly mention it; even though forgetting to label the axes of a plot should feel roughly like forgetting to get dressed when leaving for work in the morning, it still happens. So, I repeat: label your axes, period. At all stages of your work, even if the plot is just a draft for your eyes. And when labelling, please do make sure that the fonts you use are large enough when the figure is scaled to its intended size; if you have chosen the plot’s dimensions so that no scaling is required, use 10 or 12 pt. Not paying attention to font size is a very common beginner’s problem, and there are even many published papers where a magnifying glass is needed to understand what is going on in the figures. I suspect this has to do with the defaults of the commonly used software packages; default font sizes are almost always tiny. I’ve rarely (if ever) seen plots with annoyingly large fonts, so if in doubt, double your font size.
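If you cannot avoid scaling, the arithmetic is simple: fonts shrink by the same factor as the figure. A small sketch, with an 8-inch drawing width and a 3.4-inch column as example values:

```python
def printed_font_size(font_pt, drawn_width, printed_width):
    """Font size (pt) the reader sees after the figure is scaled to print width.

    drawn_width and printed_width must be in the same unit (e.g. inches).
    """
    return font_pt * printed_width / drawn_width

def font_size_to_draw(target_pt, drawn_width, printed_width):
    """Font size (pt) to set so that labels end up at target_pt after scaling."""
    return target_pt * drawn_width / printed_width

# Labels set at 10 pt on an 8-inch-wide figure, shrunk into a 3.4-inch column:
print(round(printed_font_size(10, 8.0, 3.4), 2))  # 4.25
# To end up with 10 pt labels after the same shrinking, draw them at:
print(round(font_size_to_draw(10, 8.0, 3.4), 1))  # 23.5
```

Tiny 4 pt labels are exactly the magnifying-glass problem above, which is one more argument for drawing figures at their final size in the first place.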


Figure 1: Do avoid these common problems!

Finally, a few words about “having an eye for design”. While coming up with beautiful and impressive figures seems to come more easily for some, every student can learn to produce good-looking visuals. I’ve many times heard someone say “I cannot draw, and therefore my figures look ugly”, but, as with any skill, it just takes time and patience; you do not need to go to art school to learn the essentials. Just as learning how to look at things is the key to learning to draw well, the key to producing great-looking figures is knowing what they should look like, instead of stumbling around blindly. This is best learned by imitation. So, next time, take one of your plots that you are not entirely satisfied with, and look up a similar figure in some journal article that you like. Look at the two figures side by side, and try to spot the differences in composition, colours, fonts, line widths, and so on. Then modify your figure and keep on modifying it until you are satisfied with the outcome. Next time, you might not even need a reference figure.

PS I am contemplating expanding these blog posts into a full book or ebook. If this sounds like a good idea, let me know, e.g., by commenting below.

Paper Writing for PhD Students, Part 4: Theatrical Cut, Or How To Konmari Your Paper


This post continues the paper-writing self-help series for PhD students directly from where the previous post ended.

Once you have decided the key point of your paper and have settled on its main conclusion, the next step is to choose what results go in. This choice should be made with care: now that you have a point to make, plan your paper so that everything else supports this point. The rest should go! The best papers are often quite minimalistic: they drive their point home with essential ingredients only. Papers that contain tons of unrelated results are difficult to comprehend, because the reader is left wondering where to focus her attention. Clutter reduces clarity. Always konmari your paper! Keep what makes it happy and discard everything else.

Continuing with the film industry analogy, the process of going through your results and deciding what to keep resembles the editing of a Hollywood movie. After the movie has been shot, the director and the editor start working with an abundance of raw materials that are to be sculpted into the final product, the theatrical cut. The goal is to assemble the film from the shots and scenes that best support the storyline, cutting out footage that is not essential and that doesn’t have the emotional impact that the director desires.

Your paper is your theatrical cut. Use only the elements it needs, and leave out the rest.

Cutting out material and deciding not to use some of your results may feel difficult and painful – you spent a WEEK on that plot! But, believe me, it is for the best. If you want your work to have impact, it has to be read and understood, which is greatly hindered if there is too much unimportant or unrelated information to absorb. Clutter draws attention away from the point that you want to make, and leaves the reader exhausted.

Perhaps it is because of the pain caused by discarding perfectly-good-yet-unimportant results that most journals nowadays allow for an extended, five-hour-long director’s cut in the form of a Supplementary Information document with an unrestricted page count. You can dump all those raw materials that didn’t make it to the theatrical release into the SI, so that they can safely be forgotten and ignored by the rest of the world. At least now your week spent making that plot means something.

But back to your cut – how to choose the results that are to be included and that support the key result?

Let us see how far we can push the film script analogy discussed earlier. A typical film script begins with the Setup phase where the characters and the setting are introduced, then moves on into the Confrontation phase where the characters are put in interesting trouble, and finally there is Resolution (epic fight in space followed by an exploding Death Star or similar). This may be followed by a brief Epilogue (with or without frolicking ewoks). If we divide our results into these four categories, Setup and Confrontation contain results that are needed for getting to the main result, for building up excitement and for leading the storyline to its climax. Resolution is the main conclusion that we discussed in the previous post. Epilogue shows what follows from the Resolution.

The Setup category contains plots and results that are required for the reader to make sense of the context, setting, your experiment, and/or your data (for example, basic statistics). Schematic diagrams that visually explain the concepts that your paper works with also fall into this category; always include a schematic diagram or two!

The Confrontation phase brings the story closer to the final revelation that you aim to make; it highlights the important open question that you address. You can do this, for example, by showing empirical results that are surprising and cannot be explained by existing theories, and then providing an explanation as the Resolution of your storyline. You can also build up excitement by presenting a number of competing hypotheses or models, to be then shot down by your results (except for the one model that matches your data and provides your Resolution). Or, you can begin by displaying some surprising system-level results or statistical observations and then home in on their detailed explanation in the Resolution phase.

The Resolution category should only contain your main result and key point; one to two figures.

The last category of results, the Epilogue, is more important than the last couple of minutes of a blockbuster film. These results are presented after the main result, and serve the purpose of highlighting its significance. One key technique is to think of some application or consequence of the main result, and to illustrate this with, say, a figure that plays the role of an example rather than that of an important stand-alone result.

If you look at some research papers published in the glossy magazines (Nature, Science, and so forth), you’ll see that a great many authors apply this technique: out of the four or so plots in those letter-format papers, the first is about Setup/Confrontation, the second is the key result (Resolution), and the rest are there to show why the key result matters, or what it means (Epilogue). For the kinds of journals that we mere mortals publish in, these figure counts may be larger; the important thing is to decide on clear roles for your results and figures and use them accordingly when telling your story.

Paper Writing for PhD Students, Part 3: The Importance of Focus


[This post continues my “self-help” series on writing papers for PhD students; the previous episode can be found here. This series is an attempt to share some of the conceptual tools that I use with my students. Their point is to structure your thinking and focus your decision-making on a limited number of problems at a given time; having all options open at all times is an enemy of creativity.]

Scientific papers are stories, not just containers of information. The more focused and exciting the story, the more likely it is that it reaches someone. This is because that someone has to decide to invest their time in reading the paper, and as we all know, the world is full of papers, too many for any of us to read.

Thinking of papers as stories is something that doesn’t come naturally to most students or scientists. If we have been taught at all, we have been taught to write (boring) reports, certainly not to develop storylines, or to work with the kinds of higher-level conceptual elements that, say, journalists use.

Good writing starts with careful planning. I usually plan my storyline in three steps.

The first is defining the key point of the paper, the main conclusion that you want to tell the world. The second step is choosing the essential building blocks for the rest of the storyline, leaving out all results that are not necessary. The third step is taking these building blocks and arranging them into a condensed version of the storyline: the abstract of the paper. That’s right – I recommend writing the abstract before the rest of the paper. This is unconventional but it works.

Defining the focus of the paper – its key point and its main conclusion – is the most important step, as it lays the foundation for the rest.

In the best case, the key point is a single important result, but usually things are slightly more complicated than that. In any case, you should be able to explain your point and main conclusion with one to three sentences. If you think that this is too little, consider these: the Earth rotates around the Sun, and not vice versa. Space-time is curved by mass. The salt of deoxyribose nucleic acid has a structure with two helical chains, suggesting a possible copying mechanism for genetic material. And so on. Clearly a sharp focus doesn’t mean that the result is simplistic – on the contrary, there is usually a lot of depth behind results that can be described with a few words.

Choosing a key point that can be condensed into a few sentences doesn’t imply that your paper has to be narrow in scope. If your work is of an exploratory nature, your key result might be that you have mapped out a problem area and your paper provides the map, or perhaps your main conclusion is a broad synthesis of several sub-results that make up the bulk of your paper. The most important thing is that you can make it clear to the reader what your paper is about.

If you can compress your message into a package that is easy to communicate, it is more likely to reach its intended target, the reader. This is not limited to primary transmission (say, the reader encountering your abstract on the arXiv and deciding to read on); secondary transmission is important too: getting the reader to share your paper with colleagues, online or face-to-face. Whatever the type of transmission, it works best if the thing being transmitted is compact and focused.

Communication is always difficult and all communication channels are noisy – a tight focus helps your message to make it through in one piece.

Next: Theatrical Cut, Or How To Konmari Your Paper

Thou Shalt Not Smooth!

This is a very short post for those dabbling in the dark arts of network neuroscience. Everyone else, read this or this, they’re probably more fun anyway.


[Figure from Eur. J. Neurosci, doi: 10.1111/ejn.13717]

Q: When building ROI-level functional brain networks from fMRI data, should I apply spatial smoothing to the voxel time series?

A: No, you should not; what were you thinking? See above: it messes up your degrees and links non-uniformly, and in general has weird effects. In any case, you already average your voxel time series to get your ROIs, which is brutal enough. For more, see our recent (open-access) paper in the European Journal of Neuroscience, with @TuomasAlakorkko and @eglerean and @hpsaarimaki and Onerva Korhonen.
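To see why smoothing creates links out of nothing, here is a toy two-voxel sketch. Real spatial smoothing uses a 3D (usually Gaussian) kernel over many neighbours; here each voxel simply borrows a fraction `w` of its neighbour’s signal, which is an assumption made to keep the example tiny. The two voxels start out as independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t = 10_000                      # number of time points (arbitrary)
a = rng.standard_normal(n_t)      # voxel A: independent noise
b = rng.standard_normal(n_t)      # voxel B: independent noise

def corr(x, y):
    """Pearson correlation between two time series."""
    return np.corrcoef(x, y)[0, 1]

w = 0.3                           # hypothetical mixing weight between neighbours
a_s = (1 - w) * a + w * b         # "smoothed" voxel A
b_s = w * a + (1 - w) * b         # "smoothed" voxel B

print(f"before smoothing: r = {corr(a, b):.3f}")
print(f"after smoothing:  r = {corr(a_s, b_s):.3f}")
# For independent unit-variance inputs the expected value after smoothing is
# 2w(1 - w) / ((1 - w)**2 + w**2), i.e. about 0.72 for w = 0.3.
```

Two unrelated voxels end up strongly correlated purely because they are neighbours, and how strongly depends on the geometry of the neighbourhood, which is one way of seeing why degrees and link weights get distorted unevenly.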

Clone Wars – What Happens When You Get A Splinter In Your Toe?


For the last two years or so, I’ve been crunching some numbers on the genetics of T cells together with colleagues from the Dept. of Bacteriology and Immunology at the Haartman Institute, Helsinki. It has turned out that with the help of high-throughput sequencing and the resulting massive amounts of data, immunology is an enormous unexplored playground for complex-systems scientists; I’ve had plenty of fun and we’re currently writing up the results of this first stretch. There are all sorts of marvels out there, and yes, there be power laws too. I’ll write a series of posts on the topic. To whet your appetite, here’s a small story:

Ever wondered what happens when you get a splinter in your toe?

Here’s a summary. The splinter breaches your first line of defence – your skin – and intruders follow. Once they are in, your so-called innate immune system responds. This response has to be swift; bacteria multiply quickly. Many things happen: the chemicals of the so-called complement system start drilling holes into bacterial cell walls. Macrophages, big eater cells, devour any invaders they meet. They become increasingly vicious and release cytokines, chemicals that call other types of cells to arms. Cytokines also increase the permeability of your blood vessels: that’s why your toe will swell. Now neutrophils that circulate in your blood will exit and follow the scent of battle. Once at the front line, they release a cargo of toxic chemicals, killing invaders. When done, they die and become pus.

The battle rages on. New kinds of soldiers become involved. One class, dendritic cells, picks up some battle debris and quietly exits the front lines. Now things will escalate. Dendritic cells travel to the lymph nodes, where an enormous repertoire of T and B cells awaits. Each has a different type of receptor on its surface, waiting to be triggered. Dendritic cells keep on displaying their cargo (bits of dead bacteria) until a matching receptor is found. When this happens, the cell hosting the receptor begins to proliferate, producing a massive army of clones.

Next comes the decisive strike. The clone army enters battle, armed with homing devices targeted at the specific type of invader. B cells begin to release their receptors in soluble form, producing enormous amounts of antibodies that find the invaders, coat their surfaces, and mark them for destruction by macrophages and other killers. T cells enter the front line, directing the battle, releasing more cytokines, and making sure that all bactericidal cells are fully engaged in battle. All weapons of the immune system are now deployed: the invader is being hit from all directions.

In all likelihood, the invader will now yield. It cannot hold against the combined response of the innate and the adaptive system. The battle winds down. T cells command the foot soldiers to disengage, macrophages clear the battlefield of wreckage, blood vessels no longer leak fluid. Neutrophils stop pouring out of the blood stream; they move on and look for signs of new trouble elsewhere. Pain, redness, and swelling will cease.

If the invader ever returns, it is dealt with swiftly, without you even noticing. This is because your body remembers the invader: some of the B and T cells that saw battle have become memory cells that can quickly mount an overwhelming defence.

But with many bacteria and viruses, evolution runs fast. Next time you meet them, they might have changed already; your body has won one battle but will be at war forever.

Coming up next: how your immune system does gradient-descent Monte Carlo with zillions of threads in parallel, starting from a massive repertoire of initial conditions.

(In the meantime, if you want a longer, detailed version of the above story, see “How the Immune System Works” by Lauren M. Sompayrac; it’s a textbook that even those of us who don’t have much biomedical background can follow).



Functional brain networks: the problem of node definition

[Figures from our paper in Network Neuroscience]

The human brain is a complex network of neurons. The problem is that there are about 10^11 of them with ~10^5 outgoing connections each; mapping out a network of this scale is not possible. Therefore, one needs to zoom out and look at the coarse-grained picture. This coarse-grained picture can be anatomical (a map of the large-scale wiring diagram between parts of the brain) or functional, indicating which parts of the brain tend to become active together under a given task.

But how should this coarse-graining be done in practice? What should the nodes of a brain network represent? In functional magnetic resonance imaging (fMRI), the highest level of detail is determined by the imaging technology. In an fMRI experiment, subjects are put inside a scanner that measures the dynamics of blood oxygenation in a 3D representation of the brain, divided into around 10,000 volume elements (voxels). Blood oxygenation is thought to correlate with the level of neural activity in the area. As each voxel contains about 5.5 million neurons, the network of voxels is significantly smaller than the network of neurons. However, it is still too large for many analysis tasks, and further coarse-graining is needed.

A typical approach in the fMRI community is to group voxels into larger brain regions that are, for historical reasons, known as Regions of Interest (ROIs). This can be done in many ways, and there are many pre-defined maps (“brain atlases”) that define ROIs; these maps are based on anatomy, histology, or data-driven methods. It is common to use ROIs as the nodes of a functional network. The first step in constructing the network is to assign to each ROI a time series that is the average of the time series of its voxels measured in the imaging experiment. Then, to get the links, similarities between the ROI time series are calculated, usually with the Pearson correlation coefficient. The correlation between two ROIs becomes their link weight. Often, only the strongest correlations are retained, and weak links are pruned from the network.
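The three steps above (average, correlate, prune) can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used in our paper: the function name, the `density` parameter, and the choice to keep the largest (signed) correlations are assumptions for the example, and real analyses differ in details such as whether negative correlations are kept.

```python
import numpy as np

def roi_network(voxel_ts, roi_labels, density=0.1):
    """Build a weighted ROI-level network from voxel time series (sketch).

    voxel_ts   : array of shape (n_voxels, n_timepoints)
    roi_labels : integer array of shape (n_voxels,), ROI index 0..n_rois-1 per voxel
    density    : fraction of ROI pairs to keep as links (hypothetical parameter)
    """
    n_rois = int(roi_labels.max()) + 1
    # Step 1: each ROI time series is the average of its voxels' time series.
    roi_ts = np.array([voxel_ts[roi_labels == r].mean(axis=0)
                       for r in range(n_rois)])
    # Step 2: link weights are Pearson correlations between ROI time series.
    weights = np.corrcoef(roi_ts)
    np.fill_diagonal(weights, 0.0)
    # Step 3: keep only the strongest links (largest correlations), prune the rest.
    upper = weights[np.triu_indices(n_rois, k=1)]
    n_keep = max(1, int(round(density * upper.size)))
    threshold = np.sort(upper)[::-1][n_keep - 1]
    return np.where(weights >= threshold, weights, 0.0)

# Usage with random stand-in data: 3 ROIs of 10 voxels each, 200 time points.
rng = np.random.default_rng(1)
voxel_ts = rng.standard_normal((30, 200))
labels = np.repeat(np.arange(3), 10)
A = roi_network(voxel_ts, labels, density=0.34)
print(A.round(3))
```

Note that step 1 throws away everything about how the voxels inside an ROI relate to each other; the network only ever sees the averages.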

If the ROI approach is to work, the ROIs should be functionally homogeneous: their underlying voxels should behave approximately similarly. Otherwise, it is not clear what the network represents. Because this assumption hasn’t really been tested properly and because it is fundamentally important, we recently set out to explore whether it really holds.

We used resting-state data – data recorded with subjects who are just resting in the scanner, instructed to do nothing – to construct functional ROI-level networks based on some available atlases. We defined a measure of ROI consistency that has a value of one if all the voxels that make up the ROI have identical time series (making the ROI functionally homogeneous, which is good), and a value of zero if the voxels do not correlate at all (making that ROI a bad idea, in general).
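A measure with these two end points can be written down directly: take the mean Pearson correlation over all pairs of voxels in the ROI. The sketch below uses that definition as an assumption that matches the description above (see the paper for the exact definition we used); note that strongly anticorrelated voxels can even push it below zero.

```python
import numpy as np

def roi_consistency(voxel_ts):
    """Mean pairwise Pearson correlation between the voxel time series of one ROI.

    voxel_ts: array of shape (n_voxels, n_timepoints), with n_voxels >= 2.
    Gives 1 when all voxels have identical time series and about 0 when
    they are mutually uncorrelated.
    """
    c = np.corrcoef(voxel_ts)
    iu = np.triu_indices(c.shape[0], k=1)   # each voxel pair counted once
    return c[iu].mean()

rng = np.random.default_rng(2)
shared = rng.standard_normal(500)
homogeneous = np.tile(shared, (20, 1))       # 20 voxels with identical time series
mixed = rng.standard_normal((20, 500))       # 20 mutually independent voxels
print(round(roi_consistency(homogeneous), 3))  # 1.0
print(round(roi_consistency(mixed), 2))        # close to 0
```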

We found that consistency varied broadly between ROIs. While a few ROIs were quite consistent (values around 0.6), many were not (values around 0.2). There were many low-consistency ROIs in three commonly used brain atlases.

From the viewpoint of network analysis, the existence of many low-consistency ROIs is a bit alarming. We also observed strong links between low-consistency ROIs; how should these be interpreted? These links may be an artefact, as they disappear if we look at the voxel-level signals. This means that the source of the problem is probably the averaging of voxel signals into ROI time series. While this averaging can reduce noise, it can also remove the signal: at one extreme, if one subpopulation of voxels goes up while another goes down, the average signal is flat. More generally, if an ROI consists of many functionally different subareas, their average signal is not necessarily representative of anything.
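The extreme case is easy to reproduce with synthetic data. In this sketch (a clean sine wave standing in for a real voxel signal), half of an ROI’s voxels follow the signal and the other half do the exact opposite:

```python
import numpy as np

t = np.linspace(0, 10, 500)
signal = np.sin(t)                    # a clean stand-in for a voxel signal
up = np.tile(signal, (50, 1))         # 50 voxels that go up when the signal goes up
down = np.tile(-signal, (50, 1))      # 50 voxels that do the opposite
roi_voxels = np.vstack([up, down])    # one ROI made of both subpopulations
roi_ts = roi_voxels.mean(axis=0)      # the averaged ROI time series

print(np.allclose(up.mean(axis=0), signal))  # True: each subpopulation carries the signal
print(np.allclose(roi_ts, 0.0))              # True: the ROI average is flat
```

Every voxel in this ROI carries a perfectly clear signal, yet the ROI time series that would enter the network carries none of it.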

In conclusion, we recommend caution with functional brain networks constructed from ROIs; at the very least, it would be good to go back to the voxel-level data to verify that the results obtained are indeed meaningful.

For details, see our recent paper in Network Neuroscience.

This post was co-written by Onerva Korhonen, Enrico Glerean & Jari Saramäki.

[PS: The definition of nodes is not the only complicated issue in the study of functional brain networks. Even before one has to worry about node selection, a possible distortion has already taken place: preprocessing of the measurement data. We’ll continue this story soon.]

How to write a great abstract

[This post continues my “self-help” series on writing for PhD students; the first post is here]

The first thing that you should write when starting a new paper is its abstract. I also recommend spending a lot of time on it. This will pay off later.

Writing the abstract first may seem unconventional, but it makes sense. This is because the abstract is the storyline of the paper in miniature form. It determines the rest. Once you have composed your abstract, you have decided on your story, and the paper is much easier to write.

So what’s a great abstract like? One common mistake is to view the abstract as an information container, whose only aim is to let the reader know what the author has done. An abstract written in this way reads like “we did X and the result was Y. Then we did Z and …” It becomes a boring list of results. Two important things are missing: context and excitement!

Think of a Hollywood movie. It begins with the setup phase, where the setting and the characters are introduced. You can only follow the story if you understand the setting and know the characters (context), and you will only care about the story if you care about the characters (excitement). The same applies to any research paper and its abstract: the reader must understand the context and care enough about the problem to read on and to find out how the problem was solved.

After the setup, a typical film script continues to the confrontation phase. There is trouble; there is an issue that the characters have to solve. The resolution of the confrontation marks the high point of excitement in the story. After the story has played out, a brief epilogue may follow, providing closure.

Great papers and their abstracts follow a similar arc: from setup to confrontation and from resolution to closure. The storyline can be seen as hourglass-shaped: presenting the broad setting, introducing a narrower problem and its solution, and returning to the broader picture again. It is not a coincidence that this is exactly how every Nature abstract reads.


The script that every Nature abstract has to follow, sentence by sentence, begins with a few sentences on general context and the broader topic. Then, the abstract narrows down to more specific context (again a few sentences), before funnelling to its narrowest point: the exact research question addressed by the paper. This has to be followed by the solution of the problem: the key result. Then the abstract broadens again, first addressing the implications of the result to the paper’s field of science, and then discussing the impact beyond that particular field. Setup, confrontation, resolution, closure.

Even if you are not writing a Nature paper (and you probably aren’t), the above is still a great recipe for a successful abstract, and my suggestion is to always follow its spirit.

Of course, depending on your field and the chosen journal, the breadth of the top and the bottom of the hourglass may need to be adjusted. Instead of a context where your result contributes to solving mankind’s most pressing problems, your playing field may be just your particular field of science, or its subfield. For a specialist journal, you don’t need to begin your abstract with a sentence on the importance of your field; the readers already know it. Nevertheless, it pays off to consider the broadest context you can honestly think of. Don’t exaggerate, but try to take a broader perspective. Why is your research question important? Why does it matter? The answer to this question is your context; it should directly translate to the first and last sentences of your abstract.

Next: The Importance of Focus