The need to compare like-with-like in treatment comparisons
Allocation bias results when trials fail to ensure that, apart from the treatments being compared, ‘like will be compared with like’.
Key Concepts addressed:
Allocation bias results when treatment comparisons fail to ensure that, apart from the treatments being compared, ‘like will be compared with like’.
Comparing different treatments given to groups of people
Treatment comparisons usually entail comparing the experiences of groups of people who have received different treatments. If these comparisons are to be fair, the composition of the groups must be similar – so that like will be compared with like. If those who receive one treatment are more likely anyway to do well (or badly) than those receiving an alternative treatment, this allocation bias makes it impossible to be confident that outcomes reflect differential effects of the treatments, rather than the effects of nature and the passage of time.
The 18th century surgeon William Cheselden was aware of the ‘dissimilar groups’ problem when surgeons were comparing their respective mortality rates after operations to remove bladder stones. Cheselden pointed out that it was important to take account of the ages of the people treated by different surgeons. He drew attention to the fact that mortality rates varied with the patients’ ages (Cheselden 1740) – older patients were more likely than younger patients to die. This meant that, if one wished to compare the frequency of deaths in groups of patients who had undergone different types of operation, one had to take account of differences in the ages of the patients in the comparison groups.
Comparing the experiences and outcomes of patients who happened to have received different treatments in the past is still used today as a way of trying to assess the effects of treatments. The challenge is to know whether the comparison groups were sufficiently alike before receiving treatment. This is illustrated by attempts to assess the effects of hormone replacement therapy (HRT) by comparing the illness experiences of women who had used HRT with those of other women who had not used it. As subsequent analysis of fair tests of HRT showed, trying to assess the effects of treatments in retrospect in this way can sometimes be dangerously misleading (McPherson 2004).
It is rarely possible to be completely confident that comparison groups selected from people who have been given one treatment in the past are comparable in all the respects that matter with people who have more recently received an alternative treatment. This is the case even if some information about the patients who have received different treatments is available (such as their ages, or their past history of illness). Other information that may be of great importance (such as the likelihood of spontaneous recovery) may simply not be available.
A better approach is to plan the treatment comparisons before starting treatment. For example, before beginning his comparison of six treatments for scurvy on board HMS Salisbury in 1747, James Lind took care to select patients who were at a similar stage of this often fatal disease. He also ensured that they had the same basic diet and were accommodated in similar conditions. These were factors, other than treatment, that might have influenced their likelihood of recovering (Lind 1753). Comparable efforts must be made to try to ensure that treatment comparison groups are composed of similar people.
Unbiased assembly of treatment comparison groups using alternation or randomisation
Although Lind took care to ensure that the sailors in his six comparison groups were alike, he didn’t describe how he decided which sailors would receive which of the six treatments. There is only one way to ensure that treatment comparison groups are set up in such a way that they are similar in all the ways that matter, known and unknown. This is by using some form of chance process to assemble treatment comparison groups, so avoiding biased selection for different treatments before starting treatment.
One hundred years after Lind, an army doctor, Graham Balfour, illustrated how this could be done in a test to see whether belladonna prevented scarlet fever in children. In the military orphanage for which he was responsible, he used alternation – “to prevent the imputation of selection” – to decide which boys would receive and which would not receive belladonna (Balfour 1854). Alternation is one of several unbiased methods for assembling similar treatment comparison groups before giving the treatments being compared. From the first half of the 20th century, there are many examples of treatment comparison groups being assembled using alternation or rotation (for example MRC 1944), or by drawing lots (Colebrook 1929) – for example, using dice (Doull et al. 1931), coloured beads (Theobald 1937), or random sampling numbers (Bell 1941; MRC 1948; MRC 1950; MRC 1951). This ‘random allocation’ is the sole, but crucially important, feature of the category of fair tests referred to as ‘randomized’. A random (as distinct from haphazard) allocation means that the chances of something happening are known, but the results cannot be anticipated on any particular occasion. So, for example, if a coin is used to randomize, the chance of getting heads is 50%, but it is impossible to know what the result of a particular toss will be.
Casting or drawing lots is a time-honoured way of making fair decisions (Silverman and Chalmers 2002). These methods help to ensure that comparison groups are not composed of different types of people. Known and measured factors of importance, like age, can be checked. However, unmeasured factors that may influence recovery from illness, such as diet, occupation, and anxiety, can be expected to balance out on average.
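The way chance allocation tends to balance both measured and unmeasured factors can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the people are simulated, the 'frailty' score is a made-up stand-in for any factor that investigators cannot measure, and the function name `simulate_allocation` is hypothetical.

```python
import random
import statistics

def simulate_allocation(n_people=1000, seed=0):
    """Randomly allocate simulated people to two groups and summarise each group."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each simulated person has a measured factor (age) and an unmeasured one
    # (a made-up 'frailty' score that the investigators never see).
    people = [
        {"age": rng.randint(20, 80), "frailty": rng.gauss(0.0, 1.0)}
        for _ in range(n_people)
    ]
    groups = {"A": [], "B": []}
    for person in people:
        # The equivalent of tossing a fair coin for each person.
        groups[rng.choice(["A", "B"])].append(person)
    summary = {}
    for name, members in groups.items():
        summary[name] = {
            "n": len(members),
            "mean_age": statistics.mean(p["age"] for p in members),
            "mean_frailty": statistics.mean(p["frailty"] for p in members),
        }
    return summary

if __name__ == "__main__":
    for name, stats in simulate_allocation().items():
        print(name, stats)
```

With groups of this size, the average age and the average of the unseen score should come out close in the two groups, even though nobody arranged for that to happen. With very small groups, chance imbalances are more likely.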
As experience of using alternation and random allocation for unbiased assembly of groups of patients for comparing different treatments became more widespread, it became clear that strict adherence to unbiased allocation schedules was required to avoid biased creation of treatment comparison groups (MRC 1934). The risk of biased allocation can be abolished if treatment allocation schedules are concealed from those making decisions about participation in treatment comparisons, so that they cannot cheat and thus bias the comparisons (MRC 1944; MRC 1948; MRC 1950; MRC 1951). The principle of random allocation to comparison groups can be applied both to individuals and to existing groups (for example, hospital wards or general practices); the latter is referred to as cluster randomization.
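Allocating existing groups rather than individuals can be sketched in a few lines. This is a minimal illustration, not a description of how any of the cited trials were run: the ward names, the arm labels and the function name `randomize_clusters` are all hypothetical.

```python
import random

def randomize_clusters(clusters, arms=("treatment A", "treatment B"), seed=None):
    """Allocate whole clusters (not individuals) to arms by chance."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    rng.shuffle(shuffled)  # chance, not human choice, decides the order
    # Deal the shuffled clusters out to the arms in turn, so the number of
    # clusters per arm differs by at most one.
    return {cluster: arms[i % len(arms)] for i, cluster in enumerate(shuffled)}

wards = ["Ward 1", "Ward 2", "Ward 3", "Ward 4", "Ward 5", "Ward 6"]
allocation = randomize_clusters(wards, seed=42)
for ward in wards:
    print(ward, "->", allocation[ward])
```

Everyone in a given ward then receives the treatment allocated to that ward. In practice the schedule would be generated and held by someone not involved in recruiting participants, so that it stays concealed from those deciding who takes part.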
Avoiding biased losses from treatment comparison groups
After taking the trouble to ensure that treatment comparison groups are assembled in ways that ensure that like will be compared with like, it is important to avoid bias being introduced as a result of selective withdrawal of patients from the comparison groups. As far as possible, group similarity should be maintained by ensuring that all the people allocated to the treatment comparison groups are followed up and included in the main analysis of the test results – a so-called ‘intention-to-treat’ analysis (Bell 1941).
Failure to do this can result in unfair tests of treatments. Take, for example, two very different ways of treating people experiencing dizzy spells because of partially blocked blood vessels supplying their brains. Treatment for this condition can be important because people experiencing dizzy spells for this reason are at increased risk of suffering a stroke, which may leave them disabled, or even kill them. One of the treatments for the dizzy spells involves taking aspirin to stop the blockage getting worse; the other involves a surgical operation to try to remove the blockage in the blood vessel.
A fair comparison of these two approaches to treating dizzy spells would involve creating two groups of people using an unbiased allocation method (like randomization), and then treating patients in one group with surgery and patients in the other group with aspirin. The comparison would thus begin by comparing two groups of patients who were alike, and go on to compare their respective frequencies of subsequent strokes. But if the frequency of strokes in the surgically treated group was only recorded among patients who had survived the immediate effects of the operation, the important fact that the operation itself can cause stroke and death would be missed. The comparison of the two treatments would then be unfair, giving a biased and misleadingly optimistic picture of the effects of the operation. Like would no longer be compared with like.
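The arithmetic behind this bias can be shown with a worked example. The numbers below are invented purely for illustration and do not come from any real trial; the function name `compare_analyses` is hypothetical. Here 'early events' means people who died or had a stroke as an immediate result of the operation.

```python
def compare_analyses(allocated, early_events, later_strokes):
    """Return the risk of stroke or death per arm under two ways of counting."""
    results = {}
    for arm, n_allocated in allocated.items():
        survivors = n_allocated - early_events[arm]
        results[arm] = {
            # Biased: ignores people harmed by the operation itself.
            "survivors_only": later_strokes[arm] / survivors,
            # Fair: counts every event among everyone allocated to the arm.
            "all_allocated": (early_events[arm] + later_strokes[arm]) / n_allocated,
        }
    return results

# Invented numbers, for illustration only (not from any real trial).
allocated = {"surgery": 100, "aspirin": 100}
early_events = {"surgery": 5, "aspirin": 0}
later_strokes = {"surgery": 5, "aspirin": 8}

for arm, risks in compare_analyses(allocated, early_events, later_strokes).items():
    print(f"{arm}: survivors only {risks['survivors_only']:.1%}, "
          f"all allocated {risks['all_allocated']:.1%}")
# surgery: survivors only 5.3%, all allocated 10.0%
# aspirin: survivors only 8.0%, all allocated 8.0%
```

With these made-up figures, counting only those who survived the operation makes surgery look better than aspirin (5.3% against 8.0%), whereas counting everyone allocated to each group shows the opposite (10.0% against 8.0%).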
The principal comparison (in trials) must be based, as far as possible, on all the people assigned to receive each of the treatments compared, without exceptions, and in the groups to which they were originally assigned. If this principle is not observed, people may receive biased information about the overall effects of treatments.
The text in these essays may be copied and used for non-commercial purposes on condition that explicit acknowledgement is made to The James Lind Library (www.jameslindlibrary.org).
Balfour TG (1854). Quoted in West C. Lectures on the Diseases of Infancy and Childhood. London, Longman, Brown, Green and Longmans, p 600.
Bell JA (1941). Pertussis prophylaxis with two doses of alum-precipitated vaccine. Public Health Reports 56:1535-1546.
Cheselden W (1740). The anatomy of the human body. 5th edition. London: William Bowyer.
Colebrook D (1929). Irradiation and health. Medical Research Council Special Report Series No.131.
Doull JA, Hardy M, Clark JH, Herman NB (1931). The effect of irradiation with ultra-violet light on the frequency of attacks of upper respiratory disease (common colds). American Journal of Hygiene 13:460-77.
Lind J (1753). A treatise of the scurvy. In three parts. Containing an inquiry into the nature, causes and cure, of that disease. Together with a critical and chronological view of what has been published on the subject. Edinburgh: Printed by Sands, Murray and Cochran for A Kincaid and A Donaldson.
McPherson K (2004). Where are we now with hormone replacement therapy? BMJ 328:357-358.
Medical Research Council Therapeutic Trials Committee (1934). The serum treatment of lobar pneumonia. BMJ 1:241-245.
Medical Research Council (1944). Clinical trial of patulin in the common cold. Lancet 2:373-5.
Medical Research Council (1948). Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. BMJ 2:769-782.
Medical Research Council (1950). Clinical trials of antihistaminic drugs in the prevention and treatment of the common cold. BMJ 2:425-431.
Medical Research Council (1951). The prevention of whooping-cough by vaccination. BMJ 1:1463-1471.
Parry CH (1786). Experiments relative to the medical effects of Turkey Rhubarb, and of the English Rhubarbs, No. I and No. II made on patients of the Pauper Charity. Letters and Papers of the Bath Society III:407-422.
Silverman WA, Chalmers I (2002). Casting and drawing lots: a time-honoured way of dealing with uncertainty and for ensuring fairness. JLL Bulletin: Commentaries on the history of treatment evaluation (http://jameslindlibrary.org/articles/casting-and-drawing-lots-a-time-honoured-way-of-dealing-with-uncertainty-and-for-ensuring-fairness/)
Theobald GW (1937). Effect of calcium and vitamin A and D on incidence of pregnancy toxaemia. Lancet 2:1397-1399.