
Scientific writing 3.0: A reader and writer's guide - Jean-Luc Lebrun, Justin Lebrun 2021

The scientific style virus

Take a look at the following sentence:

To get a ballpark figure, the research team, aided by their collaborators at the German Health Policy Institute which they partnered with the year before, adopted the maximized survey-derived daily intake method for the evaluation of the per capita intake of the monosaccharides.

If reading that sentence made you feel a little ill at ease, congratulations: you are a healthy reader. If, however, you thought to yourself, “My, what a wonderfully written sentence!”, I’m afraid I have bad news for you. The sentence above is plagued with errors common to the scientific writing style: word haze, sentence meanders, deactivated verbs, and word spikes! If you felt comfortable in their presence, it is because you have been infected by the scientific style virus. Your writing may be plagued by its symptoms, and your prognosis is poor. But don’t despair! The disease has not yet progressed to its terminal stages, and it’s nothing a little diagnosis and writing therapy can’t fix.

Sentence Meanders

To get a ballpark figure, the research team, aided by their collaborators at the German Health Policy Institute which they partnered with the year before, adopted the […]

Let’s diagnose the first symptom of the scientific style virus: a meander in a sentence. Here are three versions of the same sentence, one of which is less readable than the others. Which one, and why? For now, don’t focus too much on analyzing the sentences; just go with your gut feeling.

Because he feared being caught with a fake license that had been purchased in an online marketplace, the drunk driver fled the scene of the accident.

The drunk driver, who feared being caught with a fake license that had been purchased in an online marketplace, fled the scene of the accident.

The drunk driver fled the scene of the accident because he feared being caught with a fake license that had been purchased in an online marketplace.

The least readable sentence is the second one. Why? Let’s visualize all three using the earlier example of standing and kneeling men to represent main clauses and subclauses.

Through decades of reading, readers have been trained to expect that a subject is soon followed by a verb. When the subject and verb are separated by too great a distance, the reader needs to lock up precious memory resources to keep the subject in mind until a verb is found. While the reader hunts for the verb, anything else read is deprioritized.


Free up this unnecessary allocation of memory by keeping subject and verb together.

Word Spikes

Ah, Valentine’s Day. Love is in the air, chocolates are in ribbon-wrapped boxes, and roses abound in cellophane bouquet sleeves. But these roses aren’t like those found in nature: an essential element, the thorns, has been removed from them. Rose thorns may not be conducive to Valentine’s Day sales, but they certainly are conducive to a rose’s survival in the wild. Just as their bright colors and sweet scent attract humans, roses also attract animals and insects, such as caterpillars, that would do them harm⁷. Each thorn serves as a defense mechanism for the rose. A thorn can be fatal to a caterpillar that impales its soft underbelly on it, stopping its upward progress towards the petals. Just like rose stems, your writing can also contain thorns: word spikes that impale the minds of readers on their arduous uphill journey to the end of a sentence. Fortunately for writers, the average reader is not as fragile as a caterpillar and can withstand a spike or two. But as the brain gets stopped again and again by word spikes, it loses momentum and clarity, and finds itself wondering, “What did I just read?” Let’s analyze these word spikes one at a time.

Colloquialisms

To get a ballpark figure, the research team […]

Americans know what a ballpark figure is. But if you’re a researcher from another country, the odds of knowing the expression drop dramatically. You know what a figure is, but what is a ballpark figure? Try as you might, you can’t make sense of the expression. Is it an image of a ballpark? You head to Google Image Search to look up “ballpark”, and the screen fills with photos of stadiums, raising more questions! Actually, in this context, a ballpark figure means an approximate figure.

In today’s world of internationally shared research, colloquialisms are alienating. You no longer write only for readers from your own country. Scientific writing needs to be accessible to all⁸.

Long compound nouns

To get a ballpark figure, the research team [...] adopted the maximized survey-derived daily intake method for the evaluation of the per capita intake of the monosaccharides.

We’ve seen that compound nouns can be useful for removing excessive prepositions, as in turning the giver of gifts into the gift giver. This removal seems to help clarify the sentence, but could one go too far? A history of complications in relationships which are romantic could be shortened to a romantic relationship complication history, but such a dense phrase is difficult to digest. It would be better to keep at least one preposition: a history of romantic relationship complications. In the example above, “the maximized survey-derived daily intake method” could be decompressed into a longer phrase: the method which used the maximized values of survey-derived answers about daily intakes [of monosaccharides]. Compressing the 13 words of that phrase into the 5 words of the compound noun is impressive in terms of conciseness, but markedly unimpressive from the perspective of clarity. If ever conciseness and clarity clash, prioritize clarity. Better to be understood in the end than to confuse your reader from the start!

To see why long compound nouns are hard to understand, consider this excerpt from another of our books on writing, Think Reader⁹, which explores the topic in depth:

“Compound nouns often seen together like dinner plate or swimming pool are unambiguous and easy to understand. Some have even become single nouns like firewall or toothpaste. When compound nouns have more than two nouns, ambiguities arise. “Bomb threat” and “threat detection” combine to form bomb threat detection. But where do you put the invisible brackets? Between [bomb] and [threat detection], or between [bomb threat] and [detection]? To clarify the phrase, add the preposition of: the detection of bomb threats, not the threat detection of bombs. For a reader, left-to-right bracketing that follows natural reading order is the easiest to unfold.

[Bomb]

[Bomb Threat]

[Bomb Threat] [Detection]

[Bomb Threat Detection] [Squad]

However, the unfolding can be more intricate, as in “Turin football club,” where the brackets are between [Turin] and [football club]: the football club of Turin, not the club of Turin football. The next compound noun, like a complex origami figure, has to be unfolded and refolded differently, which takes more brain processing power.

[Australian]

[Australian football] — the football played in Australia.

[Australian football fan] — The reader is not sure whether to unfold as [fan] of [Australian football] or a [football fan] from [Australia]. I chose the [Australian] [football fan].

[Australian football fan club] — The appearance of “club” creates a doubt.

Do I unfold and refold into the [fan club] of [Australian football]? After all, there might be such a thing as “Australian football” with rules differing from those of regular football. Or do I keep my original unfolding and go with [Australian] [football fan club]? It is clearly ambiguous. A quick look online confirmed that Australian football is indeed different. But what if your reader does not spend the time to search online and continues reading? With insufficient background to accurately bracket the compound noun, the chance of misunderstanding is high (50%). As a writer, you cannot afford to gamble on reader understanding.”

Latin or Greek

[...] for the evaluation of the per capita intake of the monosaccharides.

Like colloquialisms, Latin or Greek words can cause readers unfamiliar with the vocabulary to stumble. Why use per capita instead of the English equivalent, per person? Per capita is neither more concise nor more precise. It means the same thing. Unlike colloquialisms, which demand that the reader share a cultural point of reference with the author, unnecessary Latin terminology demands that the reader share knowledge of a dead language.

I am not denigrating all use of Latin. Its use in botany, for example, is encouraged. Because a single plant may have many names in different regions (kangkong, the basis of a popular vegetable dish, is also known as water morning glory, water spinach, river spinach, water convolvulus, ong-choy, or swamp cabbage), it can be more precise to identify it by its Latin name, Ipomoea aquatica. But replacing per person with per capita serves no clear purpose.

Jargon

[...] intake of the monosaccharides.

Jargon and the scientific writing style go hand in hand. Not only is their relationship natural, it is also constantly evolving. As science deals with discovery, researchers need to name things, concepts, or elements that were previously indescribable. How does one describe something never before described? How does one name the unnamed? Some discoveries take on the names of their discoverers, such as the Dunning-Kruger effect or Moore’s law. Scientists may also turn to Latin and Ancient Greek for inspiration: human nails and rhinoceros horns are made of the same material, keratin, from the Greek word keras (horn).

Unfortunately, neither naming scheme is helpful to the reader. For the first to be effective, the reader would have to recognize the name of the researcher and be familiar with their body of work. In a very small or niche field, this could be sustainable… at least at first. But as the field and the number of published authors grow, no one could be expected to keep up. The second scheme, Latin- or Greek-based naming, is just as ineffective. Were we all fluent in Latin and Ancient Greek, we could infer the meaning of new scientific terms. But few of us are fluent in dead languages.

With no logical or intuitive way of understanding jargon, readers need to rely on their knowledge and contextual reasoning to understand a passage with jargon in it. Imagine you encounter the following sentence in a text:

The internet is a great resource for researchers and ailurophiles.

You may be unfamiliar with the word ailurophile, so you quickly try to deduce its meaning. You may recognize that the word ends with “phile”, a fairly common Greek-derived suffix that indicates attraction to something, as in audiophile or cinephile. But unlike audio- and cine-, the prefix ailuro- draws a blank. Perhaps the rest of the sentence will shed some light? Researchers and ailurophiles: the pairing suggests that the two are somehow logically connected. Is an ailurophile a subtype of researcher interested in the topic represented by the (Greek) prefix ailuro-? Finally giving up (the previous thought process only took a couple of seconds), you turn to the dictionary and find out that an ailurophile is simply a person who loves cats, something that would have been impossible to guess otherwise.

Every time you write with jargon, you add locks to the sentence. Only readers with the right keys (prior knowledge) can access the sentence’s full contents. The more jargon you use, the more locks you add, and the higher the likelihood that someone gets locked out.

The jargon of a field should be considered a different language. Casual English is not the same as business English, legal English, medical English or engineering English. Each field brings with it an entire vocabulary of jargon. And much like learning a language, gaining fluency in scientific English takes a long time. Should your readers learn a new language, or should you write in theirs? The question has no easy answer.

We are not advocating a ban on jargon, simply recommending that you tailor it to the level of the reader. How do you determine that level? Start with the publication medium you are targeting. Is it an internal report read only by people in your research team who are intimately familiar with the jargon? Is it a niche journal read only by field experts also familiar with your jargon? Or are you targeting a broader journal where jargon will lock out potential readers and prevent them from using your work or citing you? Finally, if you are aiming for top-tier journals with the widest readership, such as Science, realize that since their readers come from diverse fields, jargon needs to be minimized or, at the very least, explained as it is introduced.

Acronyms as Jargon

Acronyms are an interesting subcategory of jargon.

The man’s BP rose dramatically when his ex-wife entered the room.

Did the previous sentence make sense to you? It would, if you knew that BP stands for blood pressure. While any reader would have understood the unabbreviated version of the sentence, it is likely that only those in the medical field (for whom BP is a common abbreviation) fully understood this one. Conversely, “I work for International Business Machines” probably doesn’t say much to you, but you would instantly understand if I instead told you I work for IBM.

Acronyms are useful shortcuts to condense many words into a compact form. For example, using CRISPR instead of clustered regularly interspaced short palindromic repeats makes sense. But using BP instead of blood pressure to save one word and introduce a potential lock makes little sense. As the number of acronyms in a text grows, the difficulty in reading grows not additively but multiplicatively. Acronyms create conciseness, but at a high memory and knowledge cost. Use them only if absolutely necessary.