Why can writing a paper be such a pain?

This is the first in a series of “self-help” posts for PhD students on how to write a scientific paper.

Writing a scientific paper

Show me a researcher who has never struggled with writing, and I’ll show you someone who hasn’t written anything, or who doesn’t care about the quality of the output. Science is hard, and so is writing. Together they are harder. Now add in lack of experience as a researcher and as a writer, together with the usual time pressure, and it’s no wonder that the blank document in front of you looks like the north face of Mount Everest. We’ve all been there, staring at that wall.

While no mountaineer would risk climbing Everest without a route plan, an inexperienced writer tends to neglect the importance of planning. Having no plan, she tries to do everything at once. She opens the blank document in her editor, stares at it, tries to decide what to make of her results, what the first sentence of the first paragraph should be, what the point of the first paragraph should be, and what the point of the whole paper should be.

It’s no wonder that this feels impossible. No-one can solve that many problems in parallel. Problems are best solved one at a time.

Writing becomes easier if one separates the process of thinking from the process of writing. To write clearly is to think clearly, and thinking precedes writing. Writing becomes a lot less of a struggle when you think through the right things in the right order, before putting down a single word.  A successful software project begins with the big picture: what functions and classes are needed, and for what purpose. It doesn’t begin with developing code for the internal bits of these functions and classes. A writing project should also begin with addressing the overall point and structure of the paper, before moving to details such as words or sentences.

Another way of looking at the problem is linearity versus modularity. The fear of the blank page arises out of linearity: the feeling that the only way to fill the page is to start with the first word and proceed towards the last, word by word. This is not so. Whereas reading is usually linear, writing doesn’t have to be. The process of writing should be modular – first, sculpt your raw materials into rough blocks that form your text, and then start working on the blocks, filling in more and more details, so that entire sentences only begin appearing towards the end of this process.

The approach I try to teach my students is splitting the writing process into a series of hierarchical tasks. This way, getting from a pile of results to a polished research paper is a bit less painful.

This approach begins by identifying the key point of the paper and then moving on to structuring the material that supports this point into a storyline. This storyline is then condensed into the abstract of the paper. My advice is to always write the abstract first, not last! This serves as an acid test: if you cannot do it, you haven’t developed your storyline enough.

After that, there are many steps to be taken before writing any more complete sentences: planning the order of presentation, including figures, and for each section of the paper, mapping the arc of the storyline into paragraphs so that the point addressed by each of the paragraphs is decided in advance. Then, the paragraph contents are expanded into rough sketches, and these sketches are finally transformed into whole sentences. At this point, there is no fear of the blank page, because there are no blank pages: for each section, for each paragraph, there is a map, a route plan, and the only decision that is needed is how to best transform that plan into a series of words. Often, this feels almost effortless.

[Next in the series on how to write a scientific paper: how to write a great abstract]

There is now an ebook based on this series, available from a number of stores (Kindle Store, Apple Books, Kobo, Tolino, etc!)

What the timings of our mobile phone calls tell about us

[This post was written in Finnish; the original English-language version can be found here. The rest of the posts in this blog are in English.]

This post is intended as background material for science journalists, in connection with the Academy of Finland’s science breakfast on 27.4. But of course you don’t have to be a journalist to read on!

My research group has studied mobile phone data for over a decade. What our research is called has changed in the meantime, from network analysis to data science and computational social science. Whatever it is called, our research examines human behaviour by computational means, and the datasets can contain up to millions of individuals!

We use automatically collected, anonymized, time-stamped data that originates from the billing systems of telecom operators. In addition, we study data collected from volunteer participants, for example with smartphone applications. With mobile phone records (who called whom and when), we can reconstruct the connections of social networks and also look at the time series of calls. These time series have turned out to be very interesting!

Let us first look at very short time scales, from seconds to minutes. If we look at the calls of a single person and draw a line on a timeline whenever the person is on the phone, we get a picture like this:

Burstiness of phone calls

This time series is bursty – it is random but not uniformly random! It contains bursts of calls that take place within very short time intervals (from tens of seconds to a couple of minutes), and longer pauses between these bursts. Human communication, and other activity too, is often bursty – and no-one really knows why. By the way, the time series of nerve cell firing look quite similar! Maybe we are all just nerve cells in a social network that covers the whole world… well, let’s leave that to science fiction writers.

Let us move towards longer time spans, hours and days. There we find circadian rhythms, which are understood considerably better. Our daily activity follows the alternation of day and night in 24-hour cycles. Let us pick a couple of people from the data and see how many calls they make at each time of day:

Circadian rhythms of phone calls

This shows that although people generally sleep at night and are awake during the day, there are still clear differences in circadian rhythms, and these show in the number of calls too. There are early birds, who make calls while others are still asleep, and evening people, who call late in the evening (presumably other evening people). We are all different!

There is more to circadian rhythms than variation in call counts: for example, in the evenings calls are often directed to a few (and close) friends, whereas during the day they are more random.

Let us then move towards even longer time spans – months and years. Now the exact timings of individual calls no longer matter. So let us count, for our study participant, how many calls they make to each of their friends (and relatives), and see how this pattern changes in time! We get a pattern like this:

Egocentric network

This distribution tells what share of a person’s calls is directed to the friend who receives the most calls, what share to the second, and so on. In other words, it answers the question of how popular the most popular friend is, and how equally we treat our friends (usually quite unequally: the top three can receive over half of the calls!). This reflects the way we build our social world: we have only a few very close friends, and many friends who do not belong to this restricted inner circle. Most of our ties are weak, and those few strong ties are very meaningful.

Such call distributions are slightly different for everyone, and they have turned out to be very persistent even when there is a lot of turnover in the network. If you tend to focus on 1-2 close friends, you will do so even if these friends are replaced by others, for example because you move to another town. Likewise, if you divide your time evenly among your friends, you will probably continue to do so.

Call distributions and network turnover are linked to personality traits; if this interests you, my colleague Simone Centellegher has written a blog post on the topic, based on our recently published article.

Is all this knowledge useful for anything beyond being interesting? Probably. Wellbeing applications based on data collected from the user are one possibility, as long as their operation is scientifically validated. My research group has an ongoing pilot project with the Department of Psychiatry of the University of Helsinki, which aims to find factors that predict the wellbeing of mood disorder patients from the data stream collected by applications.

Finally, here are links to the original scientific publications:

  • Small But Slow World [Phys. Rev. E | arXiv] (2011)
  • Daily Rhythms in Mobile Telephone Communication [PLoS One] (2015)
  • Persistence of Social Signatures in Human Communication [PNAS | arXiv] (2014)
  • Personality Traits and Ego-Network Dynamics [PLoS One] (2017)
  • Effects of time window size and placement on the structure of an aggregated communication network [EPJ Data Science] (2012)
  • From Seconds to Months: the Multi-scale Dynamics of Mobile Telephone Calls [EPJB | arXiv] (2015)

Ant supercolonies: networks of nests

An ant (F. aquilonia)

Ant colonies are complex systems par excellence. It’s almost as if the colony is the organism, not the ant. Ants follow simple behavioural patterns, depositing pheromones as they go and following trails of scent laid down by others. Because of their collective actions, the colony seems to have a life of its own, sprouting its foraging trails towards food sources much like a slime mold grows its branches along the shortest path to food. The colony appears to have its own reproductive cycle too: queens and males mate during the nuptial flight, and the impregnated queens then land to give birth to new colonies, like fertilized eggs. Ordinary workers play no role in reproduction; they are outside the germline.

But some species of ants behave in ways that are even more complex: they form supercolonies, networks of interconnected nests with hundreds of reproductive queens. In these supercolonies, queens and workers move freely between nests without eliciting aggression; they cooperate across nest boundaries. Ant supercolonies are the largest cooperative units known in nature: for some ants, they can extend for hundreds of kilometres.  They are also among the strangest: their existence is difficult to explain from the point of view of gene-centric evolutionary theory. This has to do with altruism: relatedness among nestmates can be low, and workers will end up helping unrelated individuals that carry a different set of genes. It may even be that ant supercolonies represent an evolutionary dead end.

Recently, I had a chance to have some fun with the genetics of ant supercolonies. My colleagues Eva Schultner and Heikki Helanterä, who work on ants, had collected samples from tens of nests of F. aquilonia in southern Finland. As Eva and Heikki wanted to understand the genetic structure of F. aquilonia supercolonies, the sampled ants were genotyped for estimating genetic similarities between the nests (for technical details, scroll down). From a network-science point of view, the nests and their similarities form a weighted spatial network: nests are nodes and pairwise genetic similarities are mapped to link weights. The resulting similarity network looks like this:

Genetic similarity network of the sampled nests

There are two supercolonies, one to the NE and one to the SW – the link weights inside the colonies are higher than between them, much like you would have for two communities in a social network. A closer look inside these two supercolonies (with methods more advanced than bare-bones network thresholding) revealed that there is a faint hint of substructure, of subclusters inside supercolonies. And because queens, workers, and pupae were genotyped separately and sampled at two time points, we could see that the genetic relationships between nests are not the same in terms of queens as they are in terms of workers, and not the same in spring as they are in summer when workers have started migrating.
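
For readers who want a feel for what bare-bones network thresholding means, here is a minimal sketch. It is not the analysis from the paper: the nest labels, the similarity values, and the threshold are all made up for illustration, and the two helper functions are hypothetical. The idea is simply to drop weak links and see which groups of nests remain connected.

```python
# Hypothetical pairwise genetic similarities between nests (made-up numbers).
similarity = {
    ("A", "B"): 0.42,
    ("A", "C"): 0.38,
    ("B", "C"): 0.45,
    ("D", "E"): 0.40,
    ("A", "D"): 0.05,
    ("B", "E"): 0.08,
    ("C", "D"): 0.03,
}


def threshold_network(weights, threshold):
    """Build an adjacency dict, keeping only links with weight >= threshold."""
    adjacency = {}
    for (u, v), w in weights.items():
        # Register both nests even if the link between them is dropped.
        adjacency.setdefault(u, {})
        adjacency.setdefault(v, {})
        if w >= threshold:
            adjacency[u][v] = w
            adjacency[v][u] = w
    return adjacency


def connected_components(adjacency):
    """Return the connected components as a list of sorted node lists."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in adjacency[node] if n not in component)
        seen |= component
        components.append(sorted(component))
    return components


network = threshold_network(similarity, threshold=0.2)
clusters = connected_components(network)
print(clusters)
```

With these toy numbers, the weak links between the two groups fall below the threshold and two clusters remain. The result depends heavily on the chosen threshold, which is one reason why more advanced methods are needed for a closer look.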

This means that there may be an extra layer of complexity in the genetics of ant supercolonies – fine structure in time and space, and in terms of class.

This work was published in Molecular Ecology last year. If you are interested in toying around with ant genetics, the data are available on Datadryad and my Python scripts can be found here: github.com/jsaramak/ants.

[Technical details: the ants were genotyped at 8 polymorphic microsatellite loci; microsatellites are nonsensical bits of DNA where a random sequence is repeated 5-50 times. They do not do anything, so there is no selection pressure on them, and therefore microsatellite alleles are great for just seeing how close or far two populations are genetically. There are various measures for quantifying this: the simplest would be to see how often the same alleles appear in populations. In social-insect studies, the typical measure is the so-called relatedness (Queller & Goodnight 1989) and we used it in this work.]
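
To make the simplest measure mentioned above concrete, here is a rough sketch of how one could compare how often the same alleles appear in two populations at a single locus. This is not the Queller & Goodnight relatedness used in the paper; it is a plain overlap of allele frequencies, and the function names and the allele data (repeat counts) are invented for the example.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its relative frequency in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def allele_overlap(pop_a, pop_b):
    """Overlap of two allele-frequency distributions at one locus.

    Sums, over all alleles, the smaller of the two frequencies:
    1.0 means identical frequencies, 0.0 means no shared alleles.
    """
    freq_a = allele_frequencies(pop_a)
    freq_b = allele_frequencies(pop_b)
    all_alleles = set(freq_a) | set(freq_b)
    return sum(min(freq_a.get(a, 0.0), freq_b.get(a, 0.0)) for a in all_alleles)


# Hypothetical alleles at one microsatellite locus, given as repeat counts.
nest_1 = [12, 12, 14, 15]
nest_2 = [12, 14, 14, 15]
nest_3 = [20, 21, 21, 22]

print(allele_overlap(nest_1, nest_2))  # nests sharing the same alleles
print(allele_overlap(nest_1, nest_3))  # nests with no alleles in common
```

In practice one would average such a measure over all loci, and a proper relatedness estimator also accounts for how common each allele is in the background population.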

A Neuroscience Conference On Twitter

Brain Twitter Conference ad

My colleagues at the Department of Neuroscience and Biomedical Engineering at Aalto University are organizing the Brain Twitter Conference. It takes place on Twitter on the 20th of April, with an impressive list of speakers. Talks and keynotes will be delivered under the #brainTC hashtag.

While the idea of a Twitter conference may sound like a gimmick, it should be taken seriously – not as a substitute but as something new. There are no coffee breaks or conference dinners for socializing, but anyone can attend for free. And, even better, because the tweets will remain available, a kind of time travel becomes possible – one can revisit any talk, any discussion, and any debate at will. The conference becomes frozen in time.

Networks are everywhere, and they are beautiful

This is another popular-science post, for anyone out there who wants to see the light of network science! It could be considered a network sermon of sorts. This is also how I begin my course on complex networks and most of my pop science talks.

Networks from genetic regulation to the Internet

So, why are networks so fundamentally important? Because they exist on all levels of the living universe.

Let’s begin by having a look at what’s inside our cells. First, there is some (messy) software written into double-stranded DNA, where the functional subunits are called genes. They like to talk to other genes, upregulating or downregulating their activity. This network of genetic regulation determines what happens inside our cells. In particular, it determines which proteins are to be built. Mirroring the genes that code for them, the proteins also like to interact, again forming a network (which is coupled to the network of genes…). To make all this happen, some fuel and some building blocks are needed, and this is taken care of by a network of chemical reactions: the metabolic network that is responsible for the logistics of energy and matter.

Our cells are full of networks. We are full of networks.

As often happens in nature, similar kinds of structures emerge on multiple levels. If we now zoom out from inside the cells and look around, we again see networks. This time, the networks are those of cells talking to one another and influencing each other’s actions, the immune system being a most beautiful example (I’ll return to it in a later post). But there are no cells that are more fond of networking than neurons, the nerve cells. I am typing this paragraph because of spike trains transmitted by each of my ten billion neurons to about ten thousand other recipient neurons (this much can be said, but no-one knows how these spike trains actually encode the words that I am typing). These neurons are the fundamental building blocks of my brain, a network of enormous complexity (and by that I don’t mean my brain but any human brain).

Let us continue zooming out to get a broader view. Just like the neurons inside them, brains also like to network! Practically speaking, we do not even exist in isolation. Our brains have evolved into social supercomputers: most of the concepts inside our heads exist because larger networks of brains have agreed that they are meaningful, and developed a language to describe them. And boy have we come a long way from those times when these larger networks were of the size of a tribe (of about 150 members, it has been claimed). Now our connections transcend space and time through social media and the Internet, and we are all part of a gargantuan social network that spans the entire planet. What is happening right here, right now, is a long-distance connection between two brains. One brain talks to another, mine to yours, across time and distance. Hi, brain!

But these networks of brains do not only exist for talking. We humans like to build things: systems of trade, structures of power, grids that transmit energy, webs that transmit information, organizations that exist for making things that didn’t exist before. We dig up raw materials and transform them into parts that are brought elsewhere and merged with other parts to make larger parts, over and over again, until this complex weave of logistics and manufacturing spits out cars and cloud servers. We connect cities with ships and trains and airlines; we connect minds with phones and computers.

We really, really like to build networks. But we cannot do it alone, so we connect with others. We form networks that build networks. The same thing happens on all levels. From genes to cells, from brains to people. Networks building networks.

This is why networks are cool, essential, and beautiful.

(I’ll stop here to end on a high note. I’ll talk about applications and other earthly things in later posts).

Mobile phone calls in time

This is a popular-science post; I am giving a talk to science journalists at the Academy of Finland on 27.4., and this post provides background material if anyone wants to write a story. But you don’t have to be a science journalist to enjoy this, so please read on!

My research group has worked on mobile phone data for over a decade. What our research is called has evolved over this time span, from social network analysis to data science and computational social science. Within these fields, mobile phone data analysis has emerged as its own subfield. Whatever you choose to call what we do, we quantify the behavioural patterns of up to millions of individuals.

We use automatically recorded, anonymized, time-stamped call detail records, provided by telecom operators for purposes of science, or collected by other means, like smartphone apps. These records (who called whom and when) allow us to reconstruct social networks and also to look at patterns in the times of calls. It turns out that a ton of useful information is contained in such patterns!

Let us first look at very short time scales – seconds or minutes. If we take an (anonymous) person, or a pair of persons, and for each call, draw a line on an axis that represents the flow of time, it will look like this:

Burstiness of phone calls

The pattern above is bursty – it is random but not uniformly so! Rather, there are bursts of mobile phone calls within very short times, and longer gaps between these bursts. It turns out that human activity patterns are very often bursty – and I think it’s safe to say that no-one really knows why. Interestingly, this is also what the firing patterns of nerve cells look like! So maybe we are just neurons in the great, self-aware social network that spans the entire planet… Well, let’s leave that idea for writers of science fiction!
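
If you want to put a number on burstiness, one commonly used summary in the literature on temporal patterns is a burstiness coefficient computed from the gaps between consecutive events: B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the gaps. The sketch below is only an illustration of that coefficient and not necessarily the measure behind the figure above; the call times (in seconds) and the function names are made up.

```python
from statistics import mean, pstdev


def inter_event_times(timestamps):
    """Gaps between consecutive events, after sorting the timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def burstiness(timestamps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of the gaps.

    B is -1 for perfectly regular events, close to 0 for a Poisson-like
    process, and approaches 1 for very bursty sequences.
    Assumes at least two events that are not all simultaneous.
    """
    gaps = inter_event_times(timestamps)
    mu = mean(gaps)
    sigma = pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


# Hypothetical call start times in seconds.
regular_calls = [0, 600, 1200, 1800, 2400, 3000]
bursty_calls = [0, 20, 45, 60, 7200, 7215, 7240, 14400, 14410]

print(burstiness(regular_calls))  # evenly spaced calls
print(burstiness(bursty_calls))   # bursts separated by long gaps
```

The evenly spaced sequence gives the minimum value of −1, whereas the sequence with short bursts and long gaps gives a positive value.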

Let us move to somewhat longer timescales (hours, days) from the rapid bursts. What we encounter next is something that is far better understood – and very naturally so: circadian rhythms, our daily patterns of activity that follow a 24-hour cycle. If we pick a few people, and count how many calls they make at each hour of day (averaging over longer periods), we see something like this:

Chronotypes and call rhythms

So even though people mostly sleep at night and are awake during the day, their rhythms are different, and this is clearly reflected in their calling behaviour too! There are individuals who are early birds, already making calls while others still sleep, and there are individuals who talk a lot at night with others like them. We are all different!
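
Counting calls by hour of day is easy to try on your own call log. The following is a minimal sketch, assuming the call times are available as Python datetime objects; the two example persons and their call times are invented, and hourly_call_profile is a hypothetical helper rather than code from our papers.

```python
from collections import Counter
from datetime import datetime


def hourly_call_profile(call_times):
    """Return a 24-element list: the fraction of calls made in each hour of day."""
    counts = Counter(t.hour for t in call_times)
    total = sum(counts.values())
    return [counts.get(hour, 0) / total for hour in range(24)]


def peak_hour(profile):
    """Hour of day (0-23) with the largest share of calls."""
    return max(range(24), key=lambda hour: profile[hour])


# Hypothetical call logs for two persons.
early_bird = [
    datetime(2017, 4, 3, 6, 15),
    datetime(2017, 4, 3, 7, 40),
    datetime(2017, 4, 4, 6, 50),
    datetime(2017, 4, 4, 12, 5),
]
night_owl = [
    datetime(2017, 4, 3, 22, 10),
    datetime(2017, 4, 3, 23, 30),
    datetime(2017, 4, 4, 23, 45),
    datetime(2017, 4, 4, 14, 0),
]

print(peak_hour(hourly_call_profile(early_bird)))
print(peak_hour(hourly_call_profile(night_owl)))
```

Normalizing the counts to fractions makes the profiles of people with very different total call volumes comparable. With real data, one would average over weeks or months to get a stable pattern.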

However, there’s much more to this story. Our rhythms differ not only in call frequencies: who we call also depends on the time of day, with evenings often being reserved for our closest ones. Further, it seems that our rhythms correlate with many other behavioural patterns – but you’ll have to wait a bit to hear that story because we haven’t written it up yet, so consider this a teaser trailer only. (Update: see this preprint)

Now let us move on to even longer time scales – months and years! Here, the times of single calls don’t really matter that much, so for every person, let’s sum up the number of calls to everyone they know and see how these patterns change in time! For each person (“ego”) that we look at, we’ll get something like this:

Social signature

This social signature measures what fraction of an ego’s communication is targeted at the person they call the most, 2nd most, and so on. So basically a signature measures how evenly or unevenly one’s communication is distributed – usually pretty unevenly: your top three contacts get up to 50% of your calls! This clearly reflects the way we tend to shape our social networks: we keep only a handful of individuals very close to us, and have larger numbers of friends and acquaintances who do not belong to this restricted inner circle. Most of our links are weak, but the few strong ones are very important to us.
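
As a rough illustration of the idea, a signature can be computed from a plain list of who was called. This is a minimal sketch and not the exact procedure from the paper: the contact labels and call counts are made up, and social_signature is a hypothetical helper.

```python
from collections import Counter


def social_signature(called_contacts):
    """Fraction of calls going to the most-called contact, the 2nd most, and so on."""
    counts = Counter(called_contacts)
    total = sum(counts.values())
    # most_common() orders contacts from the most to the least called.
    return [count / total for _, count in counts.most_common()]


# Hypothetical call log of one ego: one entry per outgoing call.
calls = (
    ["contact_1"] * 5
    + ["contact_2"] * 3
    + ["contact_3"] * 1
    + ["contact_4"] * 1
)

signature = social_signature(calls)
top_three_share = sum(signature[:3])
print(signature)
print(top_three_share)
```

Because the signature only keeps the ranks and drops the identities, it can stay the same even when the people occupying the top ranks change.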

Social signatures are slightly different for everyone and that’s in fact why we decided to call them signatures. They are also very stable in time and their shapes tend to persist even when there is a lot of network turnover – if you are someone who likes to really focus on 1-2 best friends, you are likely to do that even if your old best friends are replaced by new ones because you move to another city, and if you maintain a more flat signature, you will probably do that in the future too.

The stability of one’s signature and the rate of change in one’s network have to do with personality. My coauthor Simone Centellegher has written an excellent blog post on this topic, so I won’t repeat our results here.

For further reading, here are some original publications (and their open-access versions if published behind paywall):

  • Small But Slow World [Phys. Rev. E | arXiv] (2011)
  • Daily Rhythms in Mobile Telephone Communication [PLoS One] (2015)
  • Persistence of Social Signatures in Human Communication [PNAS | arXiv] (2014)
  • Personality Traits and Ego-Network Dynamics [PLoS One] (2017)
  • Effects of time window size and placement on the structure of an aggregated communication network [EPJ Data Science] (2012)
  • From Seconds to Months: the Multi-scale Dynamics of Mobile Telephone Calls [EPJB | arXiv] (2015)

Great network analysis tutorial (IPython notebook)

A very short post: this Python network analysis tutorial by Vincent Traag, written as an IPython notebook and available on GitHub, is absolutely brilliant and I strongly recommend it to anyone interested in social networks and network analysis with Python. I’ll certainly use this in my teaching.

Interested in networks and network science? Click here to read more!

Hello World

Laptop

…and thank you for being here today! You might have arrived here because you are part of the community of network and complex systems researchers, or because you’ve come across my name for some other reason, or because you have just googled for interesting things and Google has magically transported you here! Well, probably not yet, since there isn’t that much content so far… Anyway, let’s get started.

I intend to post stories on the science of networks and complex systems and maybe talk a bit about some of my own research. Let’s also see if I can come up with something beyond the usual “look-here-I-advertise-my-research-paper” stuff (hey, I’ll nevertheless do that as well :-), like interviewing interesting people or writing speculative essays, or posting data and code!

Anyway, some keywords for future posts (also for motivation see “Google” in the above paragraphs): complex systems, complex networks, temporal networks, network neuroscience, social networks, human immune system, computational immunology (nice term, I hope no-one else has invented it before).