  These were depressing findings, but they prompted researchers to look into other ways to help disadvantaged youth. A Chicago ‘Parent Academy’, which pays parents for attending workshops with early childhood experts, boosts performance for white and Hispanic students (though not for black students).28 Mentoring programs for disadvantaged high schoolers help reduce absenteeism (though do less for academic outcomes).29 Upbeat text messages sent to adult education students lower dropout rates by one-third.30

  To evaluate a policy is simply to ask, ‘Does it work?’ The catch is that we need to know what would have happened if the program hadn’t been put in place. It’s as if we’re entering the world of sci-fi (cue minor chords): we need to know something that never happened.

  In the film Sliding Doors, we follow the life of Gwyneth Paltrow’s character, Helen, according to whether or not she manages to catch a train. In one plotline, she catches the train, finds her boyfriend in bed with another woman, dumps him and starts her own public relations company. In another, Helen just misses the train, gets robbed in the street and juggles two badly paid jobs – oblivious to her boyfriend’s infidelity. What makes Sliding Doors a fun movie is that we get to see both pathways – like rereading a Choose Your Own Adventure book. We get to see what economists call the ‘counterfactual’ – the road not taken.

  We can’t truly see the counterfactual in real life, but sometimes it’s pretty darn obvious. If you want to know how good winning the school raffle feels, just compare the facial expression of the winner with the faces of everyone else. If you want to know what a hailstorm does to a car, compare vehicles in a suburb hit by a storm with those in a part of town the storm missed.

  But sometimes counterfactuals aren’t as obvious. Suppose you decide to treat a bad headache by taking a painkiller and going to bed. If you wake up in the morning feeling better, you’d be unwise to give the tablet all the credit. Perhaps the headache would have gone away by itself. Or maybe the act of taking a pill was enough – the placebo effect. The problem gets more difficult when you realise that we sometimes seek help when we’re at a low point. Most sick people recover by themselves – so if you want to find out the effect of going to the doctor, it would be ridiculous to assume that the counterfactual is having a runny nose for the rest of your life. Similarly, most people who lose a job ultimately find another, so if you want to find out the impact of job training, it would be a mistake to assume that the participants would otherwise have remained jobless forever.31

  Researchers have spent years thinking about how best to come up with credible comparison groups, but the benchmark to which they keep returning is the randomised trial. There’s simply no better way to determine the counterfactual than to randomly allocate participants into two groups: one that gets the treatment, and another that does not.

  In practice, participants can be allocated to random groups by drawing slips of paper from a hat, tossing a coin or using a random number generator. Suppose that we asked everyone in the world to toss a coin. We would end up with nearly 4 billion people in the heads group, and nearly 4 billion in the tails group. Both groups would be comparable in terms of things we can easily measure. For example, each group would have similar numbers of men, millionaires and migrants. The groups would also be alike in ways that are impossible to measure. Each group would contain similar numbers of people with undiagnosed brain cancer, and similar numbers of people who will win tomorrow’s lottery. Now imagine we asked the heads group to get an extra hour’s sleep that evening, and then surveyed people the next night, asking them to rate how happy they were with their lives on a scale from 1 to 10. If we found that the heads group were happier than the tails group, it would be reasonable to conclude that a little more snooze helps lose the blues.
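
  To see this logic in miniature, here is a sketch in Python. The numbers are invented for illustration (a hypothetical baseline happiness score and a made-up treatment effect of 0.3 points), not data from any real survey; the point is simply that chance balances the two groups at baseline, so a post-treatment difference in means can be credited to the treatment.

```python
import random
import statistics

random.seed(42)  # so the 'coin tosses' are reproducible

# A hypothetical population: each person has an unobserved baseline
# happiness score (mean 6.5 on the 1-10 scale). Invented numbers.
population = [random.gauss(6.5, 1.5) for _ in range(100_000)]

# Random assignment: a fair coin toss for every person.
heads, tails = [], []
for baseline in population:
    (heads if random.random() < 0.5 else tails).append(baseline)

# Before any treatment, chance has made the groups statistically alike --
# including on things we never measured.
print('baseline means:',
      round(statistics.mean(heads), 2), round(statistics.mean(tails), 2))

# Suppose the extra hour of sleep lifts happiness by 0.3 points
# (a made-up effect size).
treated = [score + 0.3 for score in heads]

# The simple difference in means recovers the effect, because everything
# else was balanced by the coin.
effect = statistics.mean(treated) - statistics.mean(tails)
print('estimated effect:', round(effect, 2))
```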

  The beauty of a randomised trial is that it gets around problems that might plague an observational analysis. Suppose I told you that surveys typically find that people who slumber longer are happier. You might reasonably respond that this is because happiness causes sleep – good-tempered people tend to hit the pillow early. Or you might argue that both happiness and sleep are products of something else – like being in a stable relationship. Either way, an observational study falls prey to the old critique: correlation doesn’t imply causation.

  Misleading correlations are all around us.32 Ice-cream sales are correlated with shark attacks, but that doesn’t mean you should boycott Mr Whippy. Shoe size is correlated with exam performance, but buying adult shoes for kindergarteners isn’t going to help. Countries with higher chocolate consumption win more Nobel prizes, but chomping Cadbury won’t make you a genius.33

  By contrast, a randomised trial uses the power of chance to assign the groups. That’s why farmers use randomised trials to assess the quality of seeds and fertilisers, and why medical researchers use them to test new drugs. In most cases, randomised trials provide evidence that’s both stronger and simpler. The results are not only more likely to stand up to scrutiny, but also easier to explain to your uncle. As one social researcher recalls from learning about random assignment: ‘I was struck by the power of this novel technique to cut through the clouds of confusing correlations that make the inference of causality so hazardous . . . I have never lost my sense of wonder at this amazing fact.’34 There are some limits to randomised trials – which I’ll explore in Chapter 11 – but in most cases we’re doing too few randomised trials, not too many.

  When Angus Deaton used the term ‘randomistas’, he didn’t mean it as a compliment. In his own field of development economics, he felt that randomised trials were being used to answer questions to which they were poorly suited. As the saying goes, to the person with a hammer, everything looks like a nail. In development economics, Deaton felt his colleagues were whacking too many problems with randomised trials.35

  It’s certainly true that not every intervention can – or should – be randomised. A famous article in the British Medical Journal searched the literature for randomised trials of parachute effectiveness.36 Finding none, the researchers concluded (tongue-in-cheek): ‘the apparent protective effect of parachutes may be merely an example of the “healthy cohort” effect . . . The widespread use of the parachute may just be another example of doctors’ obsession with disease prevention.’ By applying to parachutes the critiques usually levelled at non-randomised studies, the article pointed out the absurdity of expecting everything to be randomly trialled.

  The parachute study has been widely quoted by critics of randomised evaluations. But it turns out that experiments on parachute efficacy and safety are widespread. The US military use crash dummies to test the impact of high-altitude and low-altitude jumps, and soldiers have conducted randomised parachute experiments to improve equipment and techniques.37 One randomised study, conducted at Fort Benning in Georgia, sought to reduce sprained ankles, the most common parachuting injury. The experiment found that wearing an ankle brace reduced the risk of ankle sprains among paratroopers by a factor of six.38

  The same goes for other instances where it would be wrong to take away the treatment entirely. It would be absurd to withhold pain medication in surgery, but anaesthetists frequently conduct randomised trials to see which painkiller works best. In fiscal policy, no reasonable government would ignore an impending recession, but it might quite reasonably roll out household payments on a random schedule, which would then teach us how much of the money got spent.39 The control group in a randomised trial doesn’t have to get zilch. In many cases, it may get an alternative treatment, or the same treatment a bit later. Sometimes randomised trials are best used to tweak an existing treatment. At other times, they can be used to tackle the biggest questions, like how to avert disaster.

  The claim that randomised trials are unethical isn’t the only criticism levelled at them. Detractors also argue that randomised trials are too narrow, too expensive and too slow. Again, these are fundamental challenges, but they aren’t fatal. Admittedly, specific studies can be narrow, but that’s a reminder to be careful in interpreting the results. For instance, if a drug works for women, we shouldn’t assume it will work for men.40 In the case of scale, it’s true that some studies can cost millions and take decades. But there has been a proliferation of fast, cheap randomised trials. Businesses are increasingly using trials to tweak processes, while government agencies are employing administrative data to conduct low-cost experiments.

  *

  Steels Creek resident John O’Neill said that as the fire came towards him, it ‘sounded like ten or twenty steam trains’. The sky turned red, black and purple. He screamed at his children to get inside the house, and they lay on the floor, wet cloths on their faces to keep the smoke at bay. Embers hit the windows. It was, O’Neill said, ‘like being inside a washing machine on spin cycle and full of fire’.41

  O’Neill and his family lived through Black Saturday, the deadliest fires ever to hit Australia. In February 2009 Victoria experienced a blistering summer heatwave after a decade of drought. Temperatures were at record highs, and the winds were strong and dry. As one expert later noted, these conditions created an extraordinary inferno. The flames were up to 100 metres tall, and fire temperatures reached as high as 1200°C. Aluminium road signs melted. Eucalyptus oil from the gum trees ignited in the canopy, creating ‘fireballs’ – parcels of flammable gases that were blown up to 30 kilometres ahead of the fire front.

  The fire produced its own atmospheric conditions, with the convection column creating an internal thunderstorm. Like an angry beast, it shot out lightning, creating new fires. As one firefighter put it, ‘This thing was huge, absolutely huge . . . just full of ember, ash, burning materials. This thing was absolutely alive.’42 The total amount of energy released by the fire was equal to 1500 Hiroshima atomic bombs.

  In the Murrindindi Scenic Reserve, a team of firefighters came across nineteen frightened campers, but were unable to evacuate them before the fire cut the exit road. They gathered up the campers and drove the fire truck into the nearby river. For the next ninety minutes, they sprayed water on the truck roof as the fire raged around them.43

  By the end of Black Saturday, 173 people had died and thousands of homes had been destroyed. Reviewing the disaster and its aftermath, a royal commission recommended that more research be done to understand the unpredictable behaviour of fires in extreme conditions.

  In the Canberra suburb of Yarralumla, CSIRO researcher Andrew Sullivan is standing in front of a 25-metre-long wind tunnel. At one end is a fan the size of a jet engine, capable of sucking in, every second, enough air to fill a small swimming pool. At the other end is a glass-walled section. ‘Here’s where we light the fires,’ he tells me. When I learn it’s called ‘the Pyrotron’, I’m reminded of a character from the X-Men.

  Opened the year before the Black Saturday bushfires, the Pyrotron is where Sullivan and his team conduct experiments on fire behaviour. What makes some species of trees burn faster than others? How do spot fires combine to form a single fire front? How effective are powder suppressants compared to simply spraying water? If they didn’t randomise their experiments, Sullivan points out, researchers could easily go astray. Like Scouts practising for their fire-lighting badge, scientists who spend a day setting fires in the Pyrotron will probably get steadily better at the task. So if researchers were to gradually dial up the air flow with each successive experiment, they might end up thinking they were learning about air flow when in fact they were measuring the effect of a better-prepared fire. By randomising the order of their experiments, scientists are more likely to uncover the truth.
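
  A toy simulation makes the point. In the hypothetical sketch below, each burn’s measured spread rate combines a true airflow effect with a ‘practice’ drift (all numbers are invented for illustration). Running the settings in increasing order piles the drift onto the high-airflow burns, exaggerating the apparent effect; shuffling the run order spreads the drift evenly.

```python
import random

random.seed(1)

# Five hypothetical airflow settings (m/s), four replicate burns of each.
AIRFLOWS = [1.0, 1.5, 2.0, 2.5, 3.0] * 4

def measured_spread(airflow, run_index):
    """Invented model of a burn: a true airflow effect plus a 'practice'
    drift -- fires lit later in the day are better prepared, so they
    burn faster for reasons unrelated to airflow."""
    true_effect = 2.0 * airflow
    practice_drift = 0.1 * run_index
    return true_effect + practice_drift + random.gauss(0, 0.2)

def mean_spread_by_airflow(schedule):
    results = {}
    for run_index, airflow in enumerate(schedule):
        results.setdefault(airflow, []).append(
            measured_spread(airflow, run_index))
    return {flow: round(sum(v) / len(v), 2)
            for flow, v in sorted(results.items())}

# Dialling the airflow up gradually: the practice drift piles onto the
# high settings, exaggerating the apparent effect of airflow.
print(mean_spread_by_airflow(sorted(AIRFLOWS)))

# Randomised run order: the drift is spread evenly across settings, so
# differences between them reflect airflow alone (plus noise).
print(mean_spread_by_airflow(random.sample(AIRFLOWS, k=len(AIRFLOWS))))
```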

  Randomised fire experiments are also an important part of the response to climate change. Nearly one-quarter of global greenhouse-gas emissions come from fires, so reducing the carbon emitted from bushfires could be a cost-effective way of addressing climate change. Experiments in the Pyrotron by Sullivan and his fellow researchers found that low-intensity fires emit less carbon dioxide and carbon monoxide, suggesting that backburning operations could be a valuable way of cutting down greenhouse-gas emissions.44

  Randomised fire experiments aren’t just conducted inside the safety of the Pyrotron. One of Australia’s greatest bushfire researchers, Alan McArthur, ignited and observed over 1200 experimental fires during the 1950s and ’60s. Many of McArthur’s fires were lit on the aptly named Black Mountain in Canberra – within sight of the Pyrotron. These experiments provided insights into how quickly fires move through grasslands, eucalypt forests and pine plantations. For firefighters, McArthur’s work showed the danger of fighting fires on hilly terrain, because fires move more quickly uphill than on the flat.

  For the general public, McArthur’s legacy is in producing the first fire danger rating systems, which converted weather data into an easy-to-understand five-point risk scale.45 In bushfire-prone areas across Australia, these signs stand by the roadside today. After the Black Saturday fires, they were updated to add a sixth risk category: ‘Catastrophic’. By carrying out randomised experiments, McArthur came up with a straightforward way to convert complex weather data into a simple fire risk index.
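
  McArthur’s forest meter is usually written down as the equation later fitted to it by Noble, Bary and Gill (1980). The Python sketch below uses that published approximation and the commonly quoted post-2010 category cut-offs; both come from the wider literature rather than from this book, and the example weather inputs are only roughly Black Saturday-like, so treat the details as illustrative.

```python
import math

def forest_fire_danger_index(temp_c, rel_humidity, wind_kmh, drought_factor):
    """Noble, Bary and Gill's (1980) equation approximating McArthur's Mk5
    forest fire danger meter. Inputs: air temperature (deg C), relative
    humidity (%), wind speed (km/h), drought factor (0-10)."""
    return 2.0 * math.exp(
        -0.450
        + 0.987 * math.log(drought_factor)
        - 0.0345 * rel_humidity
        + 0.0338 * temp_c
        + 0.0234 * wind_kmh
    )

def rating(ffdi):
    """Six-point scale used after Black Saturday (band cut-offs as
    commonly published; treat them as illustrative)."""
    bands = [(12, 'Low-Moderate'), (25, 'High'), (50, 'Very High'),
             (75, 'Severe'), (100, 'Extreme')]
    for cutoff, label in bands:
        if ffdi < cutoff:
            return label
    return 'Catastrophic'

# Conditions roughly like Black Saturday: mid-40s heat, bone-dry air,
# strong sustained winds, maximum drought factor.
ffdi = forest_fire_danger_index(46, 6, 55, 10)
print(round(ffdi), rating(ffdi))  # an FFDI well past 100: 'Catastrophic'
```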

  *

  In 1769, sixteen years after the publication of Lind’s randomised trial on scurvy, a surgeon by the name of William Stark decided to experiment on himself to see how different foods affected scurvy.46 He began with a month on just bread and water, then supplemented this with foods added one at a time, including olive oil, milk, goose and beef. Two months in, he contracted scurvy. Stark continued to document his precise food intake and his own medical condition as he added more foods, including butter, figs and veal. Seven months into the experiment, he died, aged twenty-nine. At the time, Stark had been considering adding fresh fruit and green vegetables to his diet, but was still working through bacon and cheese.

  Stark has been described as one of history’s ‘martyrs of nutrition’.47 But if he had read Lind’s treatise, he might have saved himself significant pain, not to mention an early death. Lind stands as a reminder not only of the value of carrying out high-quality evaluations, but also of the importance of making sure the results are acted upon.

  2

  FROM BLOODLETTING TO PLACEBO SURGERY

  Standing in a sparkling white operating theatre, I’m watching my first surgery. On the bed is a 71-year-old patient having her hip replaced. Around the room are nurses, an anaesthetist, a representative of the company that makes the artificial hip, and an observing doctor. In the centre, gently slicing open the woman’s hip with a scalpel is Melbourne surgeon Peter Choong. Easy-listening music plays from the stereo. The atmosphere in the room couldn’t be calmer. It’s a familiar operation, and the team know each other well.

  First incision made, Peter puts down his scalpel and picks up a bipolar diathermy machine. Now he’s burning the flesh instead of slicing it, a technique that reduces bleeding and speeds recovery. The room smells like a barbecue. Back to the scalpel, and a few minutes later Peter has cut through to the hip joint. To clean it out, he uses a device like a power drill. On the end is a metal sphere the size of a ping-pong ball, its rough surface designed to shave the hip socket until it’s perfectly smooth. When he pulls it out, the ball is covered in bone and blood. Not for the first time, I’m glad I ate a light breakfast.

  Modern surgery is a curious combination of brawn, technology and teamwork. One moment, Peter is swinging a hammer or lifting a knee joint. Next, he is fitting a prosthesis, watching a computer screen as crosshairs indicate precisely the angle of fit. The tension rises when the bone cement is mixed. As soon as the two compounds are combined, a nurse begins calling time: ‘Thirty seconds . . . one minute . . . one minute 30.’ At four minutes, the cement is inserted into the patient. At five minutes, the artificial joint is attached. Everyone in the room knows the cement will be hard at ten minutes. After that, the only way to change the angle of the prosthesis is to chip the hardened cement out from inside the bone.

  In an operating theatre, the surgeon is in command. And yet for all his expertise, Peter is surprisingly willing to admit what he doesn’t know. Is it better to perform hip surgery from the front (anterolateral) or the back (posterolateral)? Should we encourage obese patients to get a lap-banding operation before knee surgery? How early should we get patients out of bed after a joint replacement? For antiseptic, is it best to use iodine, or does the cheaper chlorhexidine perform equally well?

  Over the coming years, Peter hopes to answer each of these questions. His main tool: the randomised trial. A few years ago he led a team that conducted a randomised trial to test whether total knee replacement was better done the conventional way, or with computer guidance to help line up the implant.1 Across 115 patients, the study showed that computer assistance led to a more accurate placement of the artificial knee, and higher quality of life for patients. In other studies, he has randomised particular surgical techniques and strategies for pain management post-surgery.2

  Most controversially, Peter Choong is a strong supporter of evaluating surgical operations against a control group who receive ‘placebo surgery’. For control group patients, this typically means that the surgeon makes an incision and then sews them up again.

  Placebo surgery – also known as ‘sham surgery’ – is used when researchers are uncertain whether or not an operation helps patients. In one famous study, surgeons tested whether keyhole knee surgery helped patients with osteoarthritis.3 At the time, the operation was performed more than a million times a year around the world. But some surgeons had doubts about its effectiveness. So in the late 1990s a group of surgeons in Houston conducted an experiment in which some patients received keyhole surgery, while others simply received an incision on their knee. Only when the surgeon entered the operating suite did an assistant hand them an envelope saying whether the surgery would be real or sham. Because the patients were under a local anaesthetic, the surgeons made sure patients were kept in the operating theatre for the same length of time as in the real operation, and manipulated the knee as they would in a real surgery. Two years later, patients who received sham surgery experienced the same levels of pain and knee function as patients who had real surgery.