Introduction

What the Blood Already Knows

The Hidden Library

In the United States, blood is drawn for routine tests some two hundred million times a year. A complete blood count. A basic metabolic panel. A lipid profile. A hemoglobin A1c for someone whose sugars have been a little high. The results are printed out, filed into the electronic chart, and stored. The doctor scans the page for anything that explains today’s complaint. If nothing jumps out, the results go into the record and the patient goes home.

This happens day after day, in every hospital and clinic in the country, for a lifetime of visits. By the time a person reaches 70, their electronic medical record contains thousands of numbers. Most of those numbers have never been looked at by a human being—not because anyone failed to do their job, but because the tools to make sense of them, across time and across dozens of values at once, simply did not exist. The file is a kind of library. And the library, it turns out, has been keeping notes on us.

Hidden in those numbers, for decades before any symptom appears, are the early signatures of the diseases that will eventually kill us. Cancer, most of all. But also heart failure. Kidney disease. Diabetes. Stroke. Aortic aneurysm. Liver failure. Alzheimer’s. Each one leaves a faint fingerprint in the routine bloodwork, years before a doctor would ordinarily order the test that would confirm it. The fingerprints are too subtle for a human to read. They involve small shifts across a dozen values, tracked over years, combined with information from the medication list, the vital signs, the problem list. No clinician can hold all of that in mind. A computer can.

This book is about what happens when we finally read the library.

Late Detection Is the Default

American medicine is built around late detection. We wait for symptoms. We order tests when something hurts, when something breaks, when something bleeds. For cancer, we have added a handful of dedicated screening programs: mammograms for breast, colonoscopies for colon, low-dose CT scans for a narrow slice of people at high risk of lung cancer, PSA for prostate. We call them early detection. They are not. By the time a mammogram finds a lump, the tumor has usually been present for five to eight years. By the time a colonoscopy finds a polyp that has turned malignant, the cellular disease has been smoldering for a decade. For most other cancers, we have no screening program at all. We find them when the patient walks in unable to breathe, unable to eat, or unable to pass a stool.

The price of this policy is enormous, and almost entirely invisible. Roughly 600,000 Americans die of cancer every year. Another 700,000 die of heart disease. A further 165,000 die of stroke. The overwhelming majority of these deaths involve diseases that had been quietly present in the body for years, undetected, while the patient went about their life and visited their doctor and had their blood drawn.

None of this reflects a failure of medicine, or of the devoted physicians, researchers, and scientists who have spent their careers practicing and advancing it. Doctors reading a blood panel have always done exactly what the best available science and training asked of them: check whether values fall within normal ranges, and act on what stands out. The patient does what the doctor suggests. The lab does what it is asked. The record system stores what it is fed. Each piece works. What the system could not do—until very recently—was read the data across time, because no tool existed to do it. The signal that would warn us is distributed across a dozen values and a dozen visits, each one individually unremarkable, each one well within normal limits. No human being can hold all of that in mind simultaneously. There used to be no machine that could either. That is no longer true.

What Has Changed

Three things have happened in the last fifteen years, quietly enough that most Americans have not noticed, and they have changed what is possible in medicine.

First, almost every US hospital and clinic has moved onto an electronic medical record. The records are imperfect, fragmented, and famously annoying to use. But for the first time in the history of medicine, the longitudinal health data of the American population exists as searchable, linkable, machine-readable files. The raw material is there.

Second, a class of machine learning algorithms called gradient boosting, and a set of techniques for making their predictions explainable, have turned out to be exceptionally well suited to structured medical data. These are not the dramatic algorithms of the newspapers: not the ones that write essays or draw pictures. They are duller than that, and more useful. They look at tables of numbers and find patterns in them. Given enough examples, they can learn which combinations of ordinary blood values, measured over time, distinguish the patient who will develop colorectal cancer in four years from the one who will not.
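For readers who want to see the idea rather than take it on faith, here is a toy sketch of gradient boosting in plain Python. It is not any company's algorithm. The six rows of data are invented, the column meanings (hemoglobin, its change over a year, platelet count) are assumptions chosen for illustration, and the model uses one-split "stumps" fitted to squared error, where real systems use deeper trees, a different loss, and hundreds of thousands of patients.

```python
# Toy gradient boosting on a small table of numbers.
# All data is invented for illustration; it is not patient data.

def fit_stump(rows, residuals):
    """Find the single (column, threshold) split that best fits the residuals."""
    best = None
    for col in range(len(rows[0])):
        for threshold in sorted({r[col] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[col] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[col] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((x - left_mean) ** 2 for x in left)
                     + sum((x - right_mean) ** 2 for x in right))
            if best is None or error < best[0]:
                best = (error, col, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, row):
    col, threshold, left_mean, right_mean = stump
    return left_mean if row[col] <= threshold else right_mean

def fit_boosting(rows, labels, rounds=20, learning_rate=0.5):
    """Start from the average label, then repeatedly fit a stump to what is still wrong."""
    base = sum(labels) / len(labels)
    scores = [base] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [y - s for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * predict_stump(stump, r)
                  for s, r in zip(scores, rows)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Hypothetical columns: hemoglobin, one-year change in hemoglobin, platelet count.
rows = [
    [14.5, 0.1, 250], [13.8, -0.1, 260], [14.0, 0.0, 240],
    [13.2, -0.9, 310], [12.9, -1.1, 330], [13.5, -0.8, 300],
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = later diagnosed, in this invented example

model = fit_boosting(rows, labels)
high = predict(model, [13.0, -1.0, 320])  # resembles the labeled-1 rows
low = predict(model, [14.2, 0.0, 250])    # resembles the labeled-0 rows
print(round(high, 2), round(low, 2))
```

Each round adds one small correction aimed at the errors the previous rounds left behind. That is the whole trick: many weak rules, added together, each one fitted to what the others missed.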

Third, a small number of research groups and companies have actually done the work. They have trained and validated and published these algorithms. A company in Israel has built one for colorectal cancer, using only the complete blood count. Another for lung cancer. Another for abdominal aortic aneurysm. Kaiser Permanente has built risk models for heart failure. Geisinger Health System has deployed several of them into live clinical practice. Maccabi Healthcare in Israel has scaled them across a population of two million patients. The peer-reviewed evidence is rigorous. The performance is good. The costs are negligible, because the blood has already been drawn.

The pattern repeats across disease after disease. The same kind of routine data. The same kind of algorithm. The same kind of lead time. The same kind of result: a list of patients who should be scanned, scoped, or seen, produced for a marginal cost that rounds to zero.

What the rest of this book does is take that pattern apart, disease by disease, and show you how it works. We begin with cancer, because cancer is where the evidence is deepest, where the stakes are highest for the largest number of people, and where the gap between what is possible and what is practiced is the most painful. But the argument generalizes. By the end of the book, you should understand why the same approach that catches an early colon cancer also catches an early aneurysm, an early case of heart failure, an early diabetes, an early kidney disease. The biology is different. The method is the same.

A Working Example

This is a book of argument, and you have a right to be skeptical of arguments until they are made concrete. So let me close this introduction with a single case, already in production, that shows you the shape of everything that follows.

The aorta is the main pipe carrying blood from your heart to the rest of your body. It is about the diameter of a garden hose. When a section of it weakens, it balloons outward into what doctors call an aneurysm. The bulge makes no sound. It shows up on no physical exam. It grows for years. Then one day the wall tears, and the patient bleeds to death internally before the ambulance arrives.

Aneurysm ruptures kill roughly 25,000 Americans every year, and another 125,000 people worldwide. Eight out of ten victims do not survive. Half never reach a hospital alive. More than two-thirds had no idea they were at risk. They had visited doctors. They had had blood drawn. The bulge in the aorta, which had been growing for a decade, went unlooked for because nobody thought to look—and because no routine tool existed to find it.¹

The official response, issued by the US Preventive Services Task Force in 2005, was a one-time ultrasound for men aged 65 to 75 who had ever smoked. Fewer than one in three eligible men gets one. The rule excludes all women, all nonsmokers, and everyone under 65. The Task Force reviewed the question again in 2014 and in 2019, and each time left the rule essentially unchanged. By its own accounting, forty-three percent of Americans who die of a ruptured abdominal aneurysm fall outside its criteria.²

But the aorta, long before it fails, has already been telling us it is in trouble. The warning sits in blood samples that have already been drawn. A study that followed 15,800 Americans for two decades found that six ordinary blood values, every one of them on any hospital menu, could pick out who would later develop an aneurysm. People with four or more of the six markers elevated had roughly ten times the risk of those with none. The paper appeared in 2015.
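The counting rule in that study is simple enough to write down. The sketch below shows its shape only. The text does not list the six markers or their cutoffs, so the names and numbers here are placeholders, and treating a missing value as "not elevated" is my assumption, not the study's.

```python
# Placeholder marker names and cutoffs. The study's six markers and their
# thresholds are not given in the text, so every name and number is hypothetical.
CUTOFFS = {
    "marker_1": 1.0,
    "marker_2": 1.0,
    "marker_3": 1.0,
    "marker_4": 1.0,
    "marker_5": 1.0,
    "marker_6": 1.0,
}

def count_elevated(panel, cutoffs=CUTOFFS):
    """Count how many markers sit above their cutoff.

    A marker missing from the panel is treated as not elevated (an assumption).
    """
    return sum(1 for name, cutoff in cutoffs.items()
               if name in panel and panel[name] > cutoff)

def risk_group(panel):
    """Sort a patient into the groups the study compared."""
    elevated = count_elevated(panel)
    if elevated >= 4:
        return "four or more elevated"
    if elevated == 0:
        return "none elevated"
    return "one to three elevated"

# An invented patient with four of the six placeholder markers above cutoff.
patient = {
    "marker_1": 1.4, "marker_2": 1.2, "marker_3": 0.8,
    "marker_4": 1.6, "marker_5": 1.1, "marker_6": 0.9,
}
print(count_elevated(patient), risk_group(patient))
```

A rule like this needs no new test and no new visit. It only needs someone, or something, to count.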

Someone eventually built the obvious tool. Medial EarlySign, an Israeli company focused on finding disease signals in routine clinical data, trained a machine learning algorithm on exactly this kind of blood panel and called it AAA-Flag. In validation, it tripled the rate at which aneurysms were caught compared with current screening guidelines. A patient who would never have qualified under existing rules—too young, the wrong sex, a nonsmoker—could appear near the top of a risk list generated entirely from tests already sitting in the record. The blood had known for years. The algorithm learned to listen.³

Hold that in mind. A disease that kills more Americans every year than drunk drivers. A warning signal already present in routine blood tests, years before the wall tears. An algorithm that reads what was always there. This is not a future technology. It exists. The blood was speaking. We now have something that can hear it.

The aorta is only the beginning. The same pattern runs through colorectal cancer, lung cancer, heart failure, kidney disease, and diabetes. In every case, the signal was present in routine blood for years before the diagnosis. We now have the tools to read it. The chapters ahead lay out the evidence, disease by disease, and make the case for what comes next. The library has been waiting a long time. It is time to see what it says.

Bruce Ratner and the REDI Team
April 2026

Chapter 1

The Death of George Washington

What Two Thousand Years of Wrong Medicine Can Teach Us About Cancer

On December 12, 1799, George Washington rode in from his Mount Vernon plantation after two days in the saddle, his coat soaked through by a steady, freezing rain. He was 67 years old, still vigorous enough to manage his five-farm estate, still the most respected man in America. By the following morning he had a sore throat. In the early hours of December 14 it worsened into something more alarming, a swelling of the throat so severe he could barely swallow or speak. He called for his doctors.

Three of the finest physicians in the young republic arrived within hours. What followed was not a failure of medical intelligence or of personal dedication. These were serious, learned men who were doing exactly what their training demanded and what centuries of received wisdom required. They bled him. Over the course of roughly twelve hours, they removed somewhere between five and seven pints of blood from George Washington's body, nearly half his total blood volume, using lancets and ceramic bowls and the settled confidence of men who knew, beyond any doubt, that they were doing the right thing.

Washington died on the evening of December 14, 1799. He almost certainly died of epiglottitis, a bacterial infection of the throat that modern medicine treats with antibiotics and, in severe cases, a brief intubation. The disease itself, in 1799, might well have killed him anyway. But the bloodletting did not help. It may have killed him faster.

The physicians who bled George Washington were not ignorant men practicing folk medicine. They were trained in the best tradition available to them, a tradition that stretched back to ancient Greece and the teachings of Galen, who had codified the theory of the four humors in the second century AD. Blood, phlegm, yellow bile, and black bile: the body was healthy when these four substances were in balance and sick when they were not. The cure for almost any imbalance was removal of the offending excess. For fever, for infection, for the kinds of swelling that blocked Washington's throat, the prescribed remedy was bleeding. It had been the prescribed remedy for roughly fifteen hundred years.¹

What is remarkable about the story of bloodletting is not that it was wrong. Medical history is full of wrong ideas. What is remarkable is how long it persisted after serious people began to doubt it, how many physicians continued to practice it out of habit and institutional inertia even as evidence accumulated against it, and how viciously the medical establishment defended it when reformers proposed alternatives. Pierre Louis, a French physician who studied death records in the 1820s and 1830s and found that patients who were bled died at higher rates than patients who were not, was dismissed and attacked. His methodology was considered an affront to clinical judgment. Numbers, his critics argued, could not capture the subtlety of individual cases.

Bloodletting was not fully abandoned in mainstream Western medicine until the late nineteenth century, more than fifty years after Louis published his findings. Fifty years. During which patients continued to be bled. During which the best physicians in the world continued to practice a treatment that the evidence had already shown was killing people.²

I think about George Washington's physicians when I think about cancer.

Not because the parallel is perfect. It is not. The doctors who bled Washington were operating with no good alternatives and on the basis of the best theory their era had produced. The doctors and scientists and health policy makers who continue to pour the vast majority of cancer research funding into the treatment of late-stage metastatic disease are not working with bad theory. They are working with brilliant science, deployed against the wrong problem.

But the pattern is the same. A paradigm takes hold. Institutions form around it. Careers are built on it. Billions of dollars flow toward it. And the evidence that something better exists accumulates quietly in the peer-reviewed literature, validated in populations of hundreds of thousands of patients, deployed in clinical settings where it is already saving lives, while the mainstream moves slowly, carefully, bureaucratically, toward the future that the data has already shown is possible.

That future is early early detection. And this book is about why we are not yet living in it, and how we get there.

The War on the Wrong Front

In December 1971, President Richard Nixon signed the National Cancer Act and declared war on cancer. The country had recently put men on the moon, and the spirit of that achievement was very much in the air. If we could do that, the thinking went, surely we could cure cancer. Congress authorized $1.6 billion over three years, an enormous sum for medical research at the time. The National Cancer Institute was restructured and given a mandate to bring the full force of American scientific capability to bear on the disease. The newspapers called it a moonshot. Oncologists called it a revolution.³

More than fifty years later, we have spent hundreds of billions of dollars on that revolution. We have produced extraordinary science. We have mapped the genome of cancer cells, identified the mutations that drive their uncontrolled growth, developed targeted therapies that attack specific molecular pathways, and pioneered immunotherapies that attempt to enlist the body's own immune system in the fight. The scientific achievement is genuine and should not be minimized.

But the survival rates for advanced, metastatic cancer tell a different story. From 1974 to 1985, 1 percent of patients diagnosed with late-stage lung cancer survived five years or more. For patients diagnosed between 2011 and 2017, that number had risen to 8 percent. A meaningful improvement for individual patients, and we should be grateful for it. But it is not a revolution. It is not even close to a revolution. Lung cancer still kills approximately 127,000 Americans every year, and the overwhelming majority of them are diagnosed after the disease has already spread beyond the lung, at which point the five-year survival rate is 8 percent regardless of what drugs we throw at it.⁴

The story repeats for cancer after cancer. Late-stage colorectal cancer had a five-year survival rate of 14 percent in 1974. In 2017, it was still 14 percent. Late-stage pancreatic cancer: 3 percent survival, essentially unchanged over four decades. Ovarian cancer, which kills roughly 13,000 American women every year almost entirely because there is no screening test and the disease is found only after it has spread: 31 percent five-year survival at Stage IV. The honest accounting is brutal. Fifty years of the war on cancer, and we have barely moved the needle on the cancers that kill the most people.⁵

This is not a criticism of the scientists and physicians who have devoted their careers to this work. I have sat on the boards of Memorial Sloan Kettering Cancer Center and Weill Cornell Medicine for many years, and I have seen the caliber of the people engaged in this fight. They are extraordinary. The failure is not of talent or dedication. The failure is of priority. We have been fighting on the wrong front.

Siddhartha Mukherjee, whose Pulitzer Prize-winning book The Emperor of All Maladies remains the most penetrating account of cancer's history and biology, put it precisely in his description of what cancer actually is: not a single disease but an almost incomprehensible array of diseases, each with its own genetic signature, its own mechanisms of growth and spread, its own strategies for evading the body's defenses. Cancer cells, Mukherjee wrote, are more perfect versions of ourselves, growing faster, adapting better, using all the biological capabilities of healthy cells but without the constraints that keep healthy cells in check.

Once cancer has metastasized, it is fighting us with billions of cells across multiple organ systems, each accumulating new mutations, each developing new strategies for survival. The probability of defeating that level of biological complexity with drugs, however ingenious, is vanishingly small for most cancers. This is not pessimism. It is arithmetic. And the arithmetic has barely changed in fifty years, despite everything we have spent and learned.⁶

What the Numbers Actually Say

There is a set of numbers in oncology that does not get nearly enough attention, because it does not generate drug approvals or research grants or the kind of dramatic narrative that drives philanthropy. These are the survival rates for early-stage cancer, and they are, depending on the cancer, anywhere from three to seventeen times better than the survival rates for late-stage disease.

Lung cancer caught at the localized stage, before it has spread beyond the lung, has a five-year survival rate of 60 percent. Found after it has spread to distant organs, that rate is 8 percent. The disease is the same disease. The biology is the same biology. The difference is entirely when it is found.

Colorectal cancer detected at Stage I: 91 percent five-year survival. At Stage IV: 14 percent. Pancreatic cancer, the disease that has one of the most fearsome reputations in oncology, actually has a 50 percent five-year survival rate when caught while it is still confined to the pancreas. After it has spread, that number collapses to 3 percent. Ovarian cancer found early: 93 percent. Found at Stage IV, where most women find it: 31 percent.
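The size of the gap is easy to check. Here is a short sketch in Python that divides early-stage survival by late-stage survival, using only the rates quoted in this chapter. The table layout and the function name are mine, for illustration.

```python
# Five-year survival rates quoted in this chapter, in percent:
# (found early, found late).
survival = {
    "lung": (60, 8),
    "colorectal": (91, 14),
    "pancreatic": (50, 3),
    "ovarian": (93, 31),
}

def early_vs_late_ratio(early_pct, late_pct):
    """How many times higher early-stage survival is than late-stage survival."""
    return early_pct / late_pct

ratios = {
    cancer: round(early_vs_late_ratio(early, late), 1)
    for cancer, (early, late) in survival.items()
}

for cancer, ratio in ratios.items():
    print(f"{cancer}: {ratio}x")
```

The multiple is largest where late-stage survival is lowest, which is why pancreatic cancer stands out.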

The difference between those numbers has nothing to do with better drugs or more aggressive treatment. It has everything to do with when the cancer is found. This is the central fact of cancer medicine, and it has been the central fact of cancer medicine for as long as we have been keeping statistics. Early detection does not merely improve outcomes. For most cancers, it is the only reliable path to a cure.⁷

And yet, as of this writing, roughly 60 percent of all cancer research funding in the United States goes toward treatment of advanced disease. Early detection receives a fraction of that. This is the equivalent of the Army of the Potomac spending most of its budget on battlefield medicine and a sliver on strategy. The casualties keep coming, the treatments get marginally more sophisticated, and the death toll barely budges.

The Immunotherapy Illusion

Nothing better illustrates the seductive power of the wrong paradigm than the story of immunotherapy, and specifically the story of Keytruda, Merck's pembrolizumab, which has become the best-selling cancer drug in the world.

The science behind immunotherapy is genuinely interesting and, in limited contexts, genuinely effective. The idea is to use the body's own immune system against cancer, removing the molecular brakes that cancer cells use to hide from immune surveillance and allowing T-cells to recognize and attack tumors. For a subset of patients with specific cancers and specific genetic profiles, the results have been striking. There are people alive today who would not be alive without immunotherapy, and that matters.

But Keytruda's record on the cancer that kills the most Americans, lung cancer, tells a more sobering story. In 2016, the FDA approved pembrolizumab for previously treated non-small cell lung cancer patients whose tumors express PD-L1 at high levels. The median overall survival benefit in the pivotal trial was approximately 10.4 months compared with chemotherapy. Merck had revenues from Keytruda exceeding $25 billion in 2023. The median survival extension was less than a year, bought with significant toxicity: fatigue, immune-related adverse events, pneumonitis, colitis, and the other consequences of unleashing an immune system that can then turn on the body's own cells.⁸

I want to be precise about what I am saying and what I am not saying. I am not saying immunotherapy has no value. I am saying that a drug which extends median survival by less than a year, at enormous cost and with significant side effects, while generating $25 billion in annual revenue, represents a profound misallocation of medicine's attention and resources. If a fraction of that attention had been directed toward finding lung cancer before it had metastasized, when surgery alone can cure it in 60 percent of patients, the arithmetic of lives saved would look completely different.

CAR-T cell therapy, the other great hope of the immunotherapy revolution, has shown genuine promise in certain blood cancers. It has shown almost nothing in solid tumors, which is where the vast majority of cancer deaths occur. The biological reasons for this are becoming clearer: solid tumors create immunosuppressive microenvironments that defeat T-cell infiltration, accumulate mutations that allow tumor cells to escape immune recognition, and develop physical barriers that prevent effective drug penetration. These are not problems that clever engineering is going to solve in any short period of time. They may not be solvable at all for many cancers.

In a 2019 analysis published in JAMA Oncology, researchers examined the clinical benefit of 207 FDA cancer drug approvals between 2006 and 2017. They found that only 43 percent of those approvals were based on evidence of improved overall survival. The median overall survival benefit for the drugs that did show survival improvement was 2.4 months. The median cost of a course of cancer treatment had risen to $150,000 per year.⁹

Two and a half months. At $150,000 a year. With side effects that frequently make those months among the most difficult of a patient's life. This is the war on cancer today. It is a war we are losing on the terms we have chosen to fight it, and the evidence has been telling us so for decades.

The Pattern That Killed George Washington

I want to return to those physicians around George Washington's bed, because the lesson I draw from them is not the obvious one.

The obvious lesson is that they were wrong, and that being confidently wrong in medicine costs lives. That lesson is true, and important. But the deeper lesson is about the mechanism by which intelligent, dedicated people remain confidently wrong for so long, even after the evidence against them has accumulated.

Bloodletting persisted not because physicians were stupid but because the entire infrastructure of medicine was organized around it. Medical schools taught it. Textbooks codified it. Professional reputations were built on its practice. Lancets and bleeding bowls were standard equipment in every physician's bag. The humoral theory gave practitioners a coherent explanatory framework that made the treatment feel logical, even when patients died. And the alternative—doing nothing, or doing something different—required not just a change of treatment but a wholesale repudiation of the theory that gave medicine its intellectual structure.

Pierre Louis, the French physician who first used statistical analysis to demonstrate that bleeding did not improve outcomes, published his findings in 1828. He was ridiculed. His critics argued that statistics could not capture clinical reality, that the numbers missed the nuance of individual cases, that experienced clinical judgment was more reliable than population-level data. Sound familiar? These are the same arguments made today against machine learning algorithms that can detect cancer from routine blood work with greater accuracy than any individual physician examining the same data. The arguments were wrong then. They are wrong now.¹⁰

The pattern is this: a new approach demonstrates clear evidence of superiority over the existing approach. The existing approach has deep institutional roots, large financial interests, and the comfort of familiarity. The new approach is resisted, dismissed, subjected to demands for more evidence, more validation, more prospective studies, more regulatory review. People die in the interim. Eventually, often decades later, the new approach becomes the standard. And the physicians of that future era look back at the resistance with the same incredulity that we feel when we read about bleeding George Washington to death.

Someday, I am convinced, oncologists will look back at our era and feel that same incredulity. They will read about how we had access to routine blood tests that machine learning algorithms could analyze to detect cancer six to twenty-four months before any tumor was visible on imaging, and how we deployed those algorithms in only a handful of clinical settings while hundreds of thousands of patients died of cancers that could have been found and cured. They will feel about our era the way we feel about those physicians at Mount Vernon in December 1799.

That day does not have to be fifty years away. The science exists right now. The algorithms have been built and validated. Some are already deployed and saving lives. What we lack is not the technology. What we lack is the will to treat early detection as the priority it has always deserved to be.

A Personal Reckoning

I come to this subject not as a physician or a scientist but as someone who has watched cancer take people I loved, over and over again, across a lifetime. My grandmother died of stomach cancer when I was five years old. My mother died of colon cancer at 56. My sister-in-law died of breast cancer at 42, leaving behind a six-year-old daughter. My brother Michael, my closest friend and the moral compass of our family, was diagnosed with a brain tumor in 2015 and died eight months later.

I sat on the boards of Memorial Sloan Kettering Cancer Center and Weill Cornell Medicine while my family was decimated by cancer. I had access to the best physicians in the world. I called in every favor, pursued every avenue, sought out every promising treatment. And what I learned, through each of these experiences, is that by the time cancer announces itself with symptoms, by the time it shows up on a scan or causes a blockage or begins pressing on a nerve, it has usually already won.

My mother's colon cancer was found only because it had caused a complete intestinal blockage requiring emergency surgery. If that cancer had been found during a routine colonoscopy three years earlier, she very likely would have lived to see her grandchildren. My sister-in-law's breast cancer was already Stage III when it was diagnosed. Had she been tested for the BRCA1 mutation that ran in her family, she might have caught the tumor while it was still localized and curable. My brother's brain tumor was the metastatic spread of a primary cancer that had never been found, that had grown silently somewhere in his body until it was everywhere.

I wrote my first book, Early Detection: Catching Cancer When It's Curable, to make the case for existing screening technologies: colonoscopy, low-dose CT for lung cancer, mammography, Pap smears. Those technologies save hundreds of thousands of lives every year when people actually use them, and the persistent failure to extend them to underserved communities is a moral catastrophe that I wrote about at length.

But something has happened since that book was published. The science has moved forward in a way that changes the terms of the entire argument. The approach I described in my earlier book required patients to actively seek out specific tests for specific cancers, and it required the healthcare system to identify the right patients and send them for the right tests at the right intervals. It was a system that worked when it worked, and failed when it failed, and for communities with limited healthcare access it failed far more often than it worked.

What I am describing in this book is different in kind, not just in degree. It requires nothing new from the patient. The blood test is already being drawn. The data is already sitting in the electronic medical record. The only thing required is the application of machine learning algorithms to data we already have, to find the signal that is already there, and to act on it before the cancer has grown beyond the reach of a cure.

This is the revolution. Not immunotherapy. Not CAR-T cells. Not targeted therapies that extend median survival by two and a half months at a cost of $150,000 a year. The revolution is using what we already know, with tools we already have, on data we already collect, to find cancer before it has had the chance to become the thing that has been killing people I love my entire life.

What This Book Will Show You

In the chapters that follow, I am going to take you through the science, the evidence, the clinical deployments, and the economic case for a fundamental shift in how we approach cancer.

We will start with the biology: how cancer, even when it is too small to see on any scan, announces itself in the routine blood draw through changes in dozens of blood values that no human eye can detect but that machine learning algorithms can read with accuracy scores that match or exceed the best cancer screening tests we have ever developed.

We will look at the proof: the ColonFlag algorithm, trained on 600,000 patients and already deployed at health systems in Israel, the United States, and the United Kingdom, where it achieves an eightfold improvement in cancer detection over standard screening. The LungFlag algorithm, validated across nearly 200,000 patients at Kaiser Permanente, which identifies 40 percent of future lung cancer patients nine to twelve months before clinical diagnosis. The algorithms for liver cancer, gastric cancer, ovarian cancer, myeloma, kidney cancer, and several more of the most lethal cancers we face, published in peer-reviewed journals, validated in populations of hundreds of thousands, and waiting for deployment.

We will look at the opportunity beyond cancer: how the same routine blood draw that carries the signatures of thirteen cancers also carries the early warning signals of heart failure, Type 2 diabetes, chronic kidney disease, sepsis, and a dozen other conditions that kill hundreds of thousands of Americans every year before conventional medicine detects them. The combined opportunity—saving 400,000 to 675,000 American lives annually—represents 13 to 22 percent of all annual deaths in this country.

We will look at the economics, which are so lopsided in favor of early detection that the only rational explanation for our failure to act is institutional inertia of exactly the kind that kept bloodletting alive for decades after the evidence had turned against it.

And we will look at the path forward: the consortium of great health systems in New York and beyond that could serve as the proving ground for the algorithms that already exist, the regulatory pathway that is already open, the data that is already waiting in the electronic medical records of tens of millions of patients.

The blood tests are being drawn right now. Today. Two hundred million times this year, in doctors' offices and hospital labs across this country. In a significant number of those blood draws, the signal is already there. Cancer is already announcing itself. And we are not listening.

This book is about learning to listen.

Chapter 2

The Long Road to the Pap Smear

How Medicine Has Always Resisted Its Own Best Ideas

In the winter of 1928, a Greek-born researcher named George Papanicolaou stood before an audience in Battle Creek, Michigan, and presented what should have been one of the most important medical announcements of the twentieth century. He had discovered a simple, inexpensive test that could detect cervical cancer in women who felt perfectly healthy, before any symptom had appeared, at a stage when the disease was almost always curable. The test required nothing more than scraping a small number of cells from a woman's cervix and examining them under a microscope. It cost almost nothing. It required no surgery, no radiation, no hospitalization. It could be performed in a primary care office in minutes.

The audience was unimpressed.

The conference itself did not help. It was organized by a eugenics foundation, and the association tainted the proceedings from the start. Papanicolaou's colleagues in pathology were skeptical that examining loose, scraped cells rather than intact tissue samples could reliably identify malignancy. A leading German pathologist of the era had recently declared that the malignant tumor cell had nothing absolutely characteristic that distinguished it from its healthy neighbors. The published report of Papanicolaou's findings was riddled with typos and accompanied by blurry photographs. He returned to his laboratory at what was then Cornell University Medical College, largely set the idea aside, and spent the next eleven years studying the female reproductive cycle instead.¹

The story of the Pap smear is the story that this book is really about. Not the science of cervical cancer, which is well understood and largely conquered. The story is about what happens between the moment a scientist discovers something that could save hundreds of thousands of lives and the moment that discovery actually reaches the patients who need it. That gap, which in Papanicolaou's case stretched across two decades and cost an uncountable number of women their lives, is the central problem of early detection medicine. It is the problem we are living through right now, with tools far more powerful than anything Papanicolaou had in 1928.

Twenty Years to a Second Chance

It was not until 1939, when a new department chairman at Cornell reviewed his staff's research files and came across the abandoned 1928 report, that Papanicolaou was encouraged to return to his discovery. The chairman, Joseph Hinsey, understood immediately what he was reading. He arranged a collaboration with a gynecological pathologist, gave Papanicolaou access to patients admitted to New York Hospital for gynecological care, and secured funding to support the work. Within two years, Papanicolaou and his collaborator H.F. Traut had validated the technique in large patient cohorts and published their findings in the American Journal of Obstetrics and Gynecology. By the late 1940s, the American Cancer Society had embraced the test and convened the First National Cytology Conference to promote it. Twenty years after his initial presentation, Papanicolaou's discovery had finally won the formal acceptance of the medical establishment.²

But acceptance by the medical establishment was not the same as reaching patients. Two entirely separate obstacles now stood between the Pap smear and the women whose lives it could save, and both of them are worth examining carefully because they reappear, in different forms, every time early detection medicine tries to move from the laboratory into the clinic.

The first obstacle was a shortage of trained personnel. Reading a Pap smear slide required a cytology technician who could examine hundreds of thousands of cells spread across a glass slide and identify the subtle morphological abnormalities that distinguished precancerous cells from healthy ones. There were nowhere near enough such technicians. Papanicolaou himself recognized the problem before the 1948 conference and had spent years training pathologists in his laboratory, hosting courses and workshops, trying to build the professional infrastructure that a national screening program would require. A 1959 Department of Defense appropriations hearing reported a tremendous shortage of trained personnel in the entire field of cytology. The NCI allocated $100,000 that year to train roughly thirty technicians and sixty pathologists. Against a national need that would eventually reach millions of tests per year, that was a gesture, not a program.

The second obstacle was economic, and it was uglier. As Pap smear testing spread through the 1950s and 1960s, laboratories discovered that offering the test at artificially low prices was an effective way to attract physicians who would then send more profitable laboratory work their way. The practice, known as pass-through billing, created a perverse incentive structure: laboratories competed on price rather than quality, drove their costs down by overworking their technicians, and produced readings of declining reliability. By the early 1970s, some cytology technicians were being paid fifty cents per slide and working multiple jobs to make a living wage. There were reports of technicians reading two hundred to three hundred slides per day, a pace at which careful examination is physically impossible. Under good conditions, even trained technicians miss cervical abnormalities between ten and twenty percent of the time. Under those conditions, the miss rate was certainly higher, and no one was measuring it.³

The consequences fell on real patients. A federal investigation of a laboratory contracted to read Pap smears for the Air Force found that seven women whose smears had been analyzed by that lab later died of cervical cancer. Similar cases emerged at laboratories across the country. A series of investigative articles by Wall Street Journal reporter Walter Bogdanich in 1987, which won the Pulitzer Prize, brought the full dimensions of the problem into public view: overworked technicians, sloppy readings, laboratories racing to the bottom on price while women paid the cost. Congressional hearings followed. The Clinical Laboratory Improvement Amendments of 1988, which capped the number of slides a technician could read at one hundred per day and established quality control standards for testing laboratories, came sixty years after Papanicolaou's original presentation and forty years after the medical establishment had formally embraced his test.⁴

Sixty years. Between the discovery and the regulatory framework that made the test reliable at scale, six decades elapsed. During those years, cervical cancer killed tens of thousands of American women who might have been saved. The disease went from one of the leading causes of cancer death in women to less than two percent of female cancer deaths in the United States today, a transformation that is one of the genuine triumphs of preventive medicine. But the transformation took far longer than it needed to, and the delay was not a failure of science. The science was complete and validated in the early 1940s. The delay was a failure of institutional will, economic incentives, and the grinding resistance that greets any new approach in medicine, however compelling the evidence behind it.

Lung Cancer: The Same Story, Faster and Worse

If the Pap smear story is a cautionary tale with a delayed happy ending, the story of lung cancer screening is a cautionary tale still in progress, and its lessons are even more directly relevant to the argument this book makes.

In 1999, researchers at what is now Weill Cornell Medicine published a study in The Lancet showing that low-dose computed tomography, LDCT, could detect lung cancers at early stages with a sensitivity far exceeding that of conventional chest X-rays. The early results from their Early Lung Cancer Action Program were striking: of the malignancies detected, the great majority were at Stage I, the stage at which surgical removal is curative in the majority of patients. The New York Times covered the findings on its front page. The National Cancer Institute convened scientists to discuss the implications. There was real excitement, the kind that precedes either rapid progress or prolonged disappointment.⁵

The NCI then launched the National Lung Screening Trial, the largest and most expensive randomized controlled trial for a cancer screening technology in American history, enrolling more than fifty thousand participants at thirty-three medical centers at a cost of $250 million. The trial ran from 2002 to 2010. Its results, published in The New England Journal of Medicine in 2011, confirmed what the Cornell researchers had argued for a decade: LDCT screening reduced lung cancer mortality by at least twenty percent in high-risk patients. The researchers themselves acknowledged that the true benefit was likely substantially higher, because the trial's design stopped screening after three rounds and followed patients for an additional six and a half years without further scans, almost certainly understating what continuous annual screening would achieve.⁶

Twenty percent. Even accepting the most conservative interpretation, that translated to tens of thousands of preventable deaths per year if the test were widely deployed. Lung cancer kills approximately 127,000 Americans annually. A twenty percent reduction means roughly 25,000 lives saved every year. A more realistic estimate of the true benefit would push that number considerably higher.

As of the most recent available data, approximately five to six percent of Americans who qualify for lung cancer screening under current guidelines are actually receiving it. Five to six percent. More than a decade after the definitive trial proved the test works, after professional societies endorsed it, after the U.S. Preventive Services Task Force recommended it, after Medicare agreed to cover it, fewer than one in fifteen eligible patients is being screened. The disease remains the leading cause of cancer death in the United States, killing more Americans each year than colorectal, breast, and prostate cancer combined.⁷

The reasons for this failure are by now familiar to anyone who has studied early detection medicine. Primary care physicians were not adequately trained to identify eligible patients and order the scan. Radiology departments lacked standardized protocols for reading results. Patients in rural areas and underserved communities had limited access to imaging centers. Insurers created prior authorization hurdles. No national media campaign was ever funded to tell eligible patients that the test existed and could save their lives. And beneath all of it, the same structural bias that has always worked against early detection: a healthcare system that generates revenue from treating disease and has no financial incentive whatsoever to prevent it.

This is the pattern. A test is developed. Evidence accumulates. Trials are demanded. Trials are run at enormous cost. Results are published. Debate continues. Guidelines are issued. Coverage is slowly approved. Implementation lags by years or decades. Patients die in the gap between what medicine knows and what medicine actually does.

The Three Laws of Early Detection Resistance

After a lifetime of watching this pattern repeat, I have come to think of it as having three consistent laws, as reliable as anything in physics, that govern how medicine responds to early detection innovation.

The First Law is that the evidence is always declared insufficient. No matter how compelling the data, critics will demand more. The Pap smear needed twenty years and a national conference before the medical establishment would accept it. Lung cancer screening needed a quarter-billion-dollar trial. After the trial, critics argued that the twenty percent mortality reduction figure understated the harms of false positives and overdiagnosis. Peter Bach, a respected researcher at Memorial Sloan Kettering, wrote publicly after the NLST results were published that the test was not even close to a panacea, that four out of five lung cancer deaths had sneaked through despite screening. He was technically correct and strategically devastating. By focusing on what the test missed rather than the tens of thousands of lives it could save, he gave institutional hesitancy exactly the intellectual cover it needed to move slowly.

The Second Law is that implementation is treated as someone else's problem. The scientists who validate a test consider their work complete when the paper is published. The professional societies that endorse it consider their work complete when the guideline is issued. The insurers who cover it consider their work complete when the benefit is added to the plan. No single institution owns the problem of actually getting the test to the patients who need it, and so no single institution is accountable when the test sits unused while people die. The result is a system in which every stakeholder can point to something it has done while collectively achieving a fraction of what the science makes possible.

The Third Law is that the financial incentives always run in the wrong direction. Prevention generates no revenue. A colonoscopy that finds and removes a precancerous polyp, preventing a colon cancer that would have cost hundreds of thousands of dollars to treat, saves the healthcare system significant money. The physician who performed the colonoscopy is paid for the procedure. No one is paid for the cancer that was never diagnosed, the chemotherapy that was never administered, the hospitalization that never happened. The oncologist who treats late-stage cancer generates more revenue per patient than the primary care physician who prevented it. This is not a moral failing of individuals. It is a structural failure embedded in how American medicine is organized and paid for, and it works against early detection at every step.

These three laws have governed the history of early detection medicine from the Pap smear through lung cancer screening. They are governing it right now, with respect to the blood test algorithms that are the central subject of this book. Understanding them is not an academic exercise. It is a prerequisite for changing them.

What Changes and What Does Not

It would be a misreading of this history to conclude that nothing ever changes, or that resistance to early detection is permanent and absolute. Cervical cancer mortality has fallen by more than ninety percent in the United States since the 1940s. That is a genuine triumph, bought with decades of accumulated effort. Colorectal cancer mortality has been cut nearly in half since colonoscopy became widespread, saving hundreds of thousands of lives. Breast cancer mortality has declined significantly, driven in large part by mammography catching tumors at earlier stages. Early detection works. The tragedies documented in this chapter are not arguments against it. They are arguments for doing it faster, better, and more equitably.

What changes is the balance between scientific discovery and clinical deployment, when pushed hard enough by evidence and sustained advocacy. The HPV vaccine, introduced in the United States in 2006, has reduced HPV infection rates among young women by more than eighty percent, raising the realistic possibility of eliminating most cervical cancers within a generation. That is what a combination of scientific breakthrough, institutional commitment, and genuine public health investment can accomplish. It took many years and required concerted effort. But it is real, and it demonstrates that the pattern of delay is not inevitable.⁸

What does not change, without deliberate structural intervention, is the default behavior of medical institutions. Left to their own momentum, they move slowly. They demand more evidence. They implement partially. They reach the populations with the easiest access and leave behind the populations with the greatest need. The communities that bear the highest burden of cancer mortality—the poor, communities of color, rural populations with limited healthcare access—are consistently the last to benefit from early detection advances and the first to die from cancers those advances could have found.

A county-level CDC analysis of colorectal cancer screening rates published in 2018 found rates ranging from a low of forty percent in parts of Alaska to a high of eighty percent in parts of Florida, with a national average of sixty-seven percent. Lung cancer screening, as already noted, reaches five to six percent of the eligible population. For pancreatic cancer, ovarian cancer, and the other diseases that kill most efficiently precisely because they are almost never found early, the screening rate is effectively zero, because no population-level test exists. These are not merely statistics. They are the conditions under which hundreds of thousands of Americans will die this year from cancers that, had they been found six months or a year earlier, would not have killed them.⁹

The Test That Is Already Being Run

I want to tell you about a test that already exists, that is already being performed on virtually every American adult who sees a doctor regularly, and that contains information capable of detecting thirteen different cancers six to twenty-four months before any conventional method would find them.

You have almost certainly had this test. You have probably had it multiple times. Your doctor reviews the results at your annual physical, compares each value against standard reference ranges, and tells you everything looks fine. And for most values, on most days, everything probably does look fine.

But cancer does not push a single blood value dramatically out of range. It changes the entire landscape of the blood in ways too subtle for any human to perceive: a slight upward drift in white blood cell count, a fractional decline in hemoglobin, a small increase in the variation of red blood cell sizes, a barely perceptible rise in platelet count. Individually, every value passes inspection. No alarm sounds. No flag is raised. The results are filed, and you go home.

Together, those values form a pattern. Machine learning algorithms, trained on the blood test records of hundreds of thousands of patients who later developed cancer, can read that pattern six months, a year, two years before the tumor is large enough to see on any scan.

This is not a hypothesis. It is a demonstrated clinical fact. The ColonFlag algorithm, developed by Israeli scientists using gradient-boosted machine learning trained on the records of more than 600,000 patients, detects colorectal cancer from routine complete blood count values with an accuracy score of 0.82, measured as the area under the ROC curve, on a scale where 0.5 represents a coin flip and 1.0 represents perfect prediction. It has been externally validated in 30,000 patients in the United Kingdom. It is deployed clinically in Israel, the United Kingdom, and the United States, where at Geisinger Health System in Pennsylvania it achieved an eightfold improvement in cancer detection compared to standard screening among flagged patients who completed colonoscopy. The blood tests it reads are the same tests your doctor ordered at your last annual physical.¹⁰

The LungFlag algorithm, validated across nearly 200,000 patients at Kaiser Permanente Southern California, identifies forty percent of future lung cancer patients nine to twelve months before clinical diagnosis at a false positive rate of just five percent. It outperforms both the U.S. Preventive Services Task Force categorical screening criteria and the best quantitative risk model previously available. The blood tests it reads are, again, the same tests your doctor ordered at your last annual physical.¹¹
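For readers who want to see what these two kinds of numbers mean mechanically, here is a minimal Python sketch. A score on the scale where 0.5 is a coin flip is conventionally the area under the ROC curve: the chance that a randomly chosen future cancer patient receives a higher risk score than a randomly chosen healthy one. The second function shows how a figure of the form "X percent of cases at a five percent false positive rate" is computed. The function names, the risk scores, and the patient counts are all invented for illustration; none of this is ColonFlag or LungFlag code or data.

```python
# Toy illustration only: every risk score below is invented.

def auc(case_scores, control_scores):
    """Probability that a randomly chosen future cancer patient scores higher
    than a randomly chosen healthy patient (ties count as half)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_fpr(case_scores, control_scores, max_fpr):
    """Share of cases flagged when the threshold is set so that at most
    max_fpr of healthy patients are flagged."""
    ranked = sorted(control_scores, reverse=True)
    allowed = int(max_fpr * len(ranked) + 1e-9)   # healthy patients we may flag
    threshold = ranked[allowed]                   # flag only scores above this
    flagged = sum(1 for c in case_scores if c > threshold)
    return flagged / len(case_scores)


# Hypothetical risk scores for 20 healthy patients and 5 future cancer patients.
controls = [i / 20 for i in range(20)]            # 0.00, 0.05, ..., 0.95
cases = [0.32, 0.62, 0.86, 0.97, 0.99]

print(round(auc(cases, controls), 2))             # 0.78
print(sensitivity_at_fpr(cases, controls, 0.05))  # 0.4
```

In this toy population, flagging only the single highest-scoring healthy patient (five percent of twenty) still catches two of the five future cases. Real validation studies apply the same logic to hundreds of thousands of records.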

Published machine learning models with accuracy scores ranging from 0.80 to 0.97 exist for eleven additional cancers, among them pancreatic, ovarian, kidney, and myeloma—the cancers that kill most efficiently because they are almost never found while still curable. These models are peer-reviewed, published in leading medical journals, and waiting for the prospective validation and clinical deployment that would bring them to patients.

The data exists. The algorithms exist. The blood is already being drawn, two hundred million times a year in this country. We are not running the analysis. We are filing the results and sending patients home.

If Papanicolaou were alive today, he would recognize this moment. He would recognize the peer-reviewed publications that nobody is acting on. He would recognize the validated algorithms sitting unused in the literature while patients die of cancers they could not have known were growing. He would recognize the calls for more evidence, more validation, more prospective trials, more regulatory review, as though the evidence already in hand were somehow insufficient to justify urgency.

He spent eleven years doing something else while his discovery gathered dust. We do not have eleven years. The algorithms are built. The data is waiting. The only thing missing is the institutional will to deploy them.

Why This Time Is Different

I do not want to close this chapter on a note of pure frustration, because there is a genuine reason for optimism that did not exist during the long delays of the Pap smear era or the lung cancer screening debates.

The Pap smear required an entirely new professional infrastructure: trained cytology technicians, equipped laboratories, standardized slide preparation protocols, quality control frameworks. Building that infrastructure from scratch took decades, and rushing it produced the exploitation and poor quality that plagued the test for years. Lung cancer screening requires specialized imaging equipment, trained radiology readers, nodule management protocols, and a system for identifying and reaching the eligible population—a genuinely complex undertaking even today.

The blood test algorithms require none of that new infrastructure. The complete blood count and the comprehensive metabolic panel are already being analyzed by laboratory equipment in virtually every hospital and clinic in the country. The data is already flowing into electronic medical records. What the algorithms require is not new equipment, new clinical personnel, or new patient behavior. They require a software integration: a trained model that reads the output of the laboratory analysis that is already happening and flags for follow-up the patients whose patterns indicate elevated risk.

The computational cost of running a trained algorithm on a routine blood panel is measured in fractions of a cent per patient. The clinical cost of the blood draw is zero, because it is already happening. The workflow disruption is minimal: a flag in the electronic medical record, a note to the ordering physician, a recommendation for follow-up testing. No new procedures. No new equipment. No new burden on the patient.

This is what makes the current moment categorically different from any previous moment in early detection history. The barrier to deployment is not infrastructure. It is not cost. It is not patient behavior or access. The barrier is the decision by health systems to integrate these algorithms into existing workflows, and that is a decision that a handful of committed institutions could make and execute within a few years.

I sit on the boards of Weill Cornell Medicine and Memorial Sloan Kettering Cancer Center. I have watched for more than two decades how slowly great institutions move when left to their own momentum, and how quickly they can move when the right leaders decide something matters enough to push. The algorithms to detect thirteen cancers and twelve additional non-cancer diseases from routine blood work are published, validated, and in several cases already deployed in other health systems. What is needed is a consortium of four or five great institutions committing together to prospective validation and clinical integration, working in parallel rather than sequentially, compressing what would otherwise be another decade of delay into two or three years. The science does not need to wait for more evidence. The evidence is already in.¹²

George Papanicolaou waited eleven years for someone to read his abandoned report and tell him it was worth pursuing. The woman who died of cervical cancer in 1935, whose cancer a Pap smear would have found, did not get eleven years. She got the consequence of the delay.

The patients whose blood is being drawn today, whose routine blood count contains the signature of a pancreatic cancer that will not be diagnosed for another eighteen months, do not get eleven years either. They get whatever time remains between now and the moment that cancer announces itself with symptoms, when the odds will have shifted decisively against them.

The Pap smear took sixty years from discovery to reliable, regulated deployment. Lung cancer screening has taken more than twenty years from its first published results and still reaches only five to six percent of eligible patients. We cannot afford sixty more years. We cannot afford twenty. The question this book asks, and the chapters ahead answer, is whether we have learned enough from these histories to move decisively faster this time.

I believe we have. But belief without action is just another form of delay.

Chapter 3

The Signal in the Blood

What Your Annual Blood Test Has Always Known and Never Been Asked

Let us begin with what a routine blood draw actually measures, because most people, including most physicians, have never thought carefully about the answer.

When your doctor orders a standard blood panel at your annual physical, the laboratory analyzes two primary sets of values. The complete blood count, or CBC, measures the cellular components of blood: red blood cells, white blood cells, platelets, hemoglobin, and a collection of derived measurements that describe the size, shape, and population distribution of those cells. The comprehensive metabolic panel, or CMP, measures the chemical composition of the blood: glucose, calcium, sodium, potassium, creatinine, liver enzymes, albumin, and a range of other markers that reflect how the body's major organ systems are functioning.

Together, these two panels generate more than sixty distinct values. Your doctor reviews them against published reference ranges, the statistically defined boundaries of what is considered normal for a healthy adult population. If your hemoglobin is above the lower threshold and below the upper one, it is flagged as normal. If your glucose is within range, it is flagged as normal. If all sixty-plus values fall within their respective ranges, the conclusion is that you are healthy, and you go home.

This approach to interpreting blood work has been standard practice for decades. It is also, from the perspective of early cancer detection, deeply inadequate. Not because the reference ranges are wrong, but because evaluating each value in isolation, against a fixed threshold, is the wrong way to read the data. It is like trying to understand a symphony by listening to each instrument separately and asking only whether it is playing in tune.

Cancer does not push a single blood value dramatically out of range. That is not how it works biologically, and it is not what the published evidence shows. What cancer does, over months and years before it is large enough to see on any imaging study, is alter the entire landscape of the blood in dozens of small, correlated ways. It shifts the balance between white blood cell populations. It depresses hemoglobin by fractions. It changes the distribution of red blood cell sizes. It elevates platelet counts through inflammatory signaling. It alters liver enzyme ratios. It affects glucose metabolism. Every one of these individual changes is too small to trigger a clinical alarm. Taken together, across sixty-plus values, read as a pattern rather than a checklist, they form what researchers have come to call a biological fingerprint—a signature that is detectable in the bloodstream far earlier than any tumor becomes visible to a radiologist.¹

This is the central insight behind the algorithms described in this book. It is not a technological breakthrough in the sense of a new kind of test or a new kind of instrument. The blood draw is the same blood draw. The laboratory analysis is the same analysis. What is new is the way we read the result.

Why Tumors Speak Through Blood

To understand why routine blood work contains cancer signals, you need to understand that a tumor is not a passive mass sitting quietly in a corner of the body. From its earliest stages, a growing tumor is an active participant in the body's biology, recruiting resources, evading defenses, and leaving traces of its activity throughout the bloodstream.

The first thing a growing tumor needs is a blood supply. Without one, it cannot grow beyond a few millimeters in diameter. To solve this problem, tumor cells release signaling molecules, including vascular endothelial growth factor and related proteins, that stimulate the formation of new blood vessels, a process called angiogenesis. This vascular recruitment is systemically detectable: the platelet count rises, because platelets are mobilized to support new vessel formation and because tumors stimulate thrombopoietin production through inflammatory cytokines. The platelet elevation is subtle, entirely within the normal reference range for most patients, but it is measurable and it is consistent. Researchers studying the blood records of patients who later developed colorectal cancer have found that platelet counts begin trending upward two to four years before diagnosis, at a rate so gradual that no single measurement would raise concern.²

The second biological process that leaves blood signatures is inflammation. The immune system recognizes a growing tumor as a threat and responds. Neutrophils, the most abundant white blood cells and the immune system's first responders, are mobilized in elevated numbers. Lymphocytes, the adaptive immune cells that provide targeted responses, are simultaneously suppressed as the tumor actively manipulates the immune environment to protect itself. The ratio of neutrophils to lymphocytes, known as the NLR, rises. This is not a dramatic rise: a patient with an NLR of 2.1 rather than the more typical 1.8 will generate no clinical concern. But across a population of thousands of patients, the upward trend in NLR precedes lung cancer diagnosis by a year or more, rising at roughly two and a half percent annually in cancer patients compared to a fraction of a percent in healthy controls. Machine learning detects this trajectory. A physician reviewing a single annual blood panel does not.³
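The arithmetic behind that trajectory is simple enough to sketch in a few lines of Python. The ratio itself is one division; the yearly drift is the slope of a straight line fitted to the logarithm of the ratio across successive annual panels. The cell counts below are invented for a single hypothetical patient, and the function names are mine, not those of any published model.

```python
import math


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio; both counts must be in the same units."""
    return neutrophils / lymphocytes


def annual_percent_change(ratios):
    """Average yearly percent change across equally spaced annual readings,
    taken from a least-squares line fitted to the logarithm of the ratio."""
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    mean_x = (n - 1) / 2
    mean_y = sum(logs) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(logs))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return (math.exp(num / den) - 1) * 100


# Invented annual panels for one patient: (neutrophils, lymphocytes),
# both in thousands of cells per microliter.
panels = [(3.60, 2.00), (3.69, 2.00), (3.78, 2.00), (3.88, 2.00), (3.97, 2.00)]
ratios = [nlr(n, l) for n, l in panels]

print([round(r, 2) for r in ratios])            # each ratio sits between 1.8 and 2.0
print(round(annual_percent_change(ratios), 1))  # yearly drift, in percent
```

Every individual ratio in this made-up series would pass without comment at an annual physical. Only the fitted slope, a drift of between two and three percent a year, carries the information.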

The third process is iron depletion. Colorectal, gastric, and other gastrointestinal cancers bleed into the gut, often invisibly, over months and years before diagnosis. The cumulative blood loss gradually depletes iron stores, which manifests in the CBC as slowly declining hemoglobin, shrinking red blood cells measured by mean corpuscular volume, and increasing variation in red blood cell size measured by red cell distribution width, or RDW. In one study of more than a million Scandinavian blood donors tracked over many years, hemoglobin in patients who later developed colorectal cancer began declining measurably three to four years before their diagnosis, falling at a rate of roughly 0.28 grams per deciliter per six-month period while remaining within the normal reference range throughout. The signal was real and consistent. No alarm sounded. No physician noticed, because the evaluation framework, which checks each value against its individual threshold, was not designed to see it.⁴
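The difference between the two ways of reading the same numbers can be shown in a short Python sketch. The first check asks the conventional question, whether each reading falls inside the reference range. The second fits a straight line through the readings and asks how fast they are moving. The reference range and the readings are assumptions chosen for illustration (real ranges vary by laboratory, sex, and age), and the series is constructed to fall by the 0.28 grams per deciliter per six months described above.

```python
# Assumed reference range for hemoglobin in g/dL, for illustration only.
HGB_LOW, HGB_HIGH = 12.0, 17.5


def in_range(value, low=HGB_LOW, high=HGB_HIGH):
    """The conventional check: is this single reading inside the range?"""
    return low <= value <= high


def slope_per_period(values):
    """Least-squares slope of equally spaced readings (change per interval)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Invented readings taken every six months over three years.
readings = [15.20, 14.92, 14.64, 14.36, 14.08, 13.80, 13.52]

every_value_normal = all(in_range(v) for v in readings)
trend = slope_per_period(readings)

print(every_value_normal)   # True: no single reading raises a flag
print(round(trend, 2))      # -0.28 g/dL per six-month interval
```

The checklist and the trend are computed from identical data. One reports nothing; the other reports a steady loss of more than half a gram per deciliter each year.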

The fourth process is metabolic disruption. Pancreatic cancer destroys insulin-producing cells as it grows, causing fasting blood glucose to rise gradually over the years preceding diagnosis. Liver cancer disrupts the organ's production of albumin, its processing of bilirubin, and its synthesis of clotting factors, all of which appear on the comprehensive metabolic panel. Kidney cancer alters creatinine and inflammatory markers. Multiple myeloma produces excess immunoglobulins that raise total protein while bone destruction raises calcium and the abnormal proteins themselves damage the kidneys, elevating creatinine. Each cancer has its own metabolic signature, its own characteristic pattern of perturbations across the sixty-plus values of the CBC and CMP, reflecting the specific biological damage the tumor is causing to the organs and systems around it.⁵

What all of these signatures share is that they are too small and too distributed to be legible to a human physician reading a standard blood report. They are not hidden. They are present in the data, in black and white, in the same report that lands in the electronic medical record after every annual physical. What has been missing, until recently, is the analytical framework capable of reading them.

The Detective and the Fingerprint

Consider how a skilled detective approaches a crime scene. No single piece of evidence, taken alone, solves the case. A footprint outside a window means nothing by itself. Neither does a partial fingerprint on a glass, or a witness who noticed an unfamiliar car on the street, or a receipt from a gas station two miles away. But the detective who integrates all of these observations—who sees that the footprint matches a size twelve shoe, that the car matches the description of a vehicle registered to someone with a connection to the victim, that the receipt was timestamped an hour before the crime—is reading a pattern that no individual clue reveals.

This is precisely how cancer detection algorithms work with blood data. No single blood value flags cancer. The combination of values, read simultaneously and compared against the patterns learned from the records of hundreds of thousands of patients, some of whom later developed cancer, produces a risk assessment that no individual measurement can provide.

The analogy extends further. A detective's most powerful tool is not any single piece of evidence but the ability to compare the current crime scene against a vast mental library of past cases. What looks unusual here? What matches patterns seen before? The machine learning algorithms trained on large patient populations are doing exactly this: they carry a learned model of what the blood looks like in patients who develop each of the thirteen target cancers, at various intervals before diagnosis, and they compare each new patient's blood values against that model.

The comparison is not simple. The algorithms are not looking for a single threshold. They are looking for the specific combination and trajectory of values that, in the training data, preceded cancer. A patient whose hemoglobin is declining gradually while platelet count is trending upward and RDW is rising, all within normal ranges individually, presents a pattern that the ColonFlag algorithm recognizes from 606,000 training examples as characteristic of developing colorectal cancer. A patient whose neutrophil-to-lymphocyte ratio has been rising steadily while hemoglobin drifts downward and red cell distribution width increases presents a pattern the LungFlag algorithm recognizes from nearly 200,000 training examples. The individual values tell one story. The pattern across values, read over time, tells another.⁶

How the Algorithms Learn

Readers who do not have a background in machine learning sometimes find the technology intimidating, so let me describe it in plain terms, because the underlying idea is actually straightforward.

The ColonFlag algorithm was built by giving a computer access to the complete blood test records of more than 600,000 patients enrolled in Maccabi Healthcare Services, one of Israel's largest health systems. Some of those patients had gone on to develop colorectal cancer. The computer's task was to find the patterns in the blood data that distinguished patients who later developed cancer from patients who did not, and to learn those patterns well enough to recognize them in new patients whose outcomes were not yet known.

The technique used to accomplish this, called gradient boosting, works by building a large ensemble of simple decision rules and combining them into a single powerful prediction. Think of it this way. Imagine you are trying to predict which students in a class will struggle on an exam. A single rule—students who scored below seventy on the last quiz—is informative but not very accurate on its own. Add a second rule: students who scored below seventy on the last quiz and who missed two or more classes. Better. Add a third: students who also reported spending fewer than three hours studying. Better still. Gradient boosting does this automatically, generating hundreds of these simple rules, each one correcting for the errors left by the previous ones, and combining them into a prediction that is far more accurate than any individual rule could achieve.

Applied to blood data, the gradient boosting algorithm might learn rules like: patients whose hemoglobin declined by more than 0.5 grams per deciliter over two years, and whose RDW increased by more than one percentage point, and whose platelet count trended upward by more than fifteen units, face a substantially elevated risk of colorectal cancer in the next eighteen months. Each component of that rule is individually unremarkable. The combination is highly informative. And because the algorithm has learned from 600,000 real patient records, it has had the opportunity to discover combinations and trajectories that no human researcher could have identified through manual examination of the data.⁷
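
For readers who want to see the mechanics, the following is a minimal sketch of gradient boosting in plain Python, under loud assumptions: it uses one-question rules (often called stumps), a squared-error objective, and eight invented patients. Production systems typically use deeper trees, a classification loss, and far more data, and nothing here reflects the actual configuration of ColonFlag. All names and numbers are hypothetical.

```python
def fit_stump(X, residuals):
    """Pick the one-feature threshold rule that best predicts the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, t, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, value if <=, value if >)


def apply_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Squared-error boosting: each new rule fits what the earlier rules left unexplained."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * apply_stump(stump, row)
                       for p, row in zip(predictions, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(apply_stump(s, row) for s in stumps)


def training_mse(model, X, y):
    return sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


# Hypothetical two-year changes: [hemoglobin g/dL, RDW points, platelet count].
X = [[-0.1, 0.1, 2], [-0.7, 1.2, 20], [-0.6, 0.2, 3], [-0.2, 1.3, 18],
     [-0.8, 1.4, 5], [-0.9, 1.5, 25], [0.0, 0.0, 22], [-0.6, 1.1, 17]]
y = [0, 1, 0, 0, 0, 1, 0, 1]  # 1 = later diagnosed (made-up labels)

one_rule = fit_boosted(X, y, n_rounds=1)
many_rules = fit_boosted(X, y, n_rounds=50)
print(training_mse(one_rule, X, y), training_mse(many_rules, X, y))
```

The printed comparison shows the point of the technique: a single rule leaves a good deal of error on the training patients, and stacking many small corrective rules drives that error down. Whether a model then performs well on new patients is a separate question that only validation on held-out data can answer.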

The output of the algorithm is a risk score for each patient, typically expressed as a probability or a percentile ranking within the population. Patients above a certain risk threshold are flagged for follow-up. The threshold can be adjusted based on the clinical context: a more sensitive threshold catches more cancers but generates more false positives; a more specific threshold reduces false positives but misses some early cancers. This tradeoff is not unique to machine learning algorithms. It is inherent in every cancer screening test ever developed, and clinicians navigate it routinely.
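
The threshold tradeoff described above can be illustrated with a short Python sketch. The ten risk scores and outcomes are invented for illustration, and the function name is hypothetical.

```python
def sensitivity_specificity(scores, has_cancer, threshold):
    """Flag every patient whose score meets the threshold, then tally the results."""
    tp = sum(1 for s, c in zip(scores, has_cancer) if c and s >= threshold)
    fn = sum(1 for s, c in zip(scores, has_cancer) if c and s < threshold)
    tn = sum(1 for s, c in zip(scores, has_cancer) if not c and s < threshold)
    fp = sum(1 for s, c in zip(scores, has_cancer) if not c and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores and outcomes for ten patients.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
has_cancer = [False, False, False, False, True, False, False, True, True, True]

low_bar = sensitivity_specificity(scores, has_cancer, threshold=0.30)
high_bar = sensitivity_specificity(scores, has_cancer, threshold=0.58)
print(low_bar, high_bar)
```

With the lower threshold, all four cancers in this toy group are caught but half of the healthy patients are flagged. With the higher threshold, no healthy patient is flagged but one cancer is missed.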

What makes the machine learning approach distinctive is the scale and complexity of the patterns it can detect. The human visual system is extraordinarily good at recognizing faces and objects, tasks it evolved to perform. It is not good at detecting subtle, correlated trends across sixty numerical variables measured at irregular intervals over several years. That is not a criticism of physicians. It is a statement about the nature of the problem. No human, however skilled, can hold 600,000 patient histories in memory and use them to calibrate a risk assessment for a new patient in real time. A trained algorithm can.⁸

What the Detection Window Actually Means

When we say that these algorithms can detect cancer before conventional diagnosis, it is worth being precise about what that means and why it matters so enormously for patient outcomes. The detection window varies considerably by cancer type, from months in some cases to several years in others, depending on how early and how consistently each cancer leaves its signature in the blood.

For colorectal cancer, the ColonFlag algorithm maintains meaningful predictive accuracy for blood tests drawn up to two years before clinical diagnosis. The hemoglobin decline that precedes colorectal cancer, as documented in the Scandinavian blood donor study, begins three to four years before diagnosis in many patients. For multiple myeloma, the characteristic cross-panel signature—rising total protein, rising calcium, declining hemoglobin, and rising creatinine—has been documented in patient records two to five years before clinical diagnosis. For chronic lymphocytic leukemia, researchers analyzing seven years of longitudinal CBC data found that the lymphocyte count trajectory begins diverging from healthy controls several years before the disease is diagnosed. The detection window is not a single number. It is a range that reflects both the biology of each cancer and the sensitivity of the algorithm trained to detect it.⁹

Why does this matter so profoundly? Because the relationship between detection timing and survival is not linear. It is not the case that finding a cancer six months earlier merely gives the patient six more months of knowing they are ill. For most of the thirteen cancers on this list, finding the disease at an earlier stage crosses a clinical threshold that changes the entire character of what medicine can offer.

Colorectal cancer found at Stage I has a five-year survival rate of approximately ninety-one percent. Found at Stage IV, it is fourteen percent. That is not a marginal difference. Those are two entirely different clinical situations. Stage I colorectal cancer is managed with surgical resection, often performed laparoscopically, with a recovery measured in weeks and a cure rate that should by any reasonable standard be described as excellent. Stage IV colorectal cancer is managed with combinations of surgery, chemotherapy, targeted therapy, and immunotherapy, with a median survival measured in months and a cure rate that, despite extraordinary medical effort, remains devastatingly low.

Pancreatic cancer illustrates the same principle with even starker numbers. When detected while still confined to the pancreas, the five-year survival rate is approximately fifty percent. After the cancer has spread to distant organs, which is the stage at which the vast majority of patients are diagnosed because the disease produces no symptoms until it is far advanced, the five-year survival rate is three percent. The difference between those two numbers is not a difference in treatment. The same chemotherapy regimens exist in both cases. The difference is entirely whether surgery was possible, and surgery is possible only in the earlier stages. Finding pancreatic cancer a year or two earlier than conventional diagnosis, which the published machine learning models suggest is achievable in a meaningful fraction of patients, is not a marginal improvement in care. It is the difference between a realistic chance of survival and almost certain death.¹⁰

Ovarian cancer follows the same pattern. Found at Stage I, when it is still confined to the ovary, the five-year survival rate is ninety-three percent. Found at Stage IV, it is thirty-one percent, and because there is currently no established screening test, most women discover they have the disease only at an advanced stage. There is no treatment advance that will close that gap. Only earlier detection can.

The Limits of the Approach

Intellectual honesty requires acknowledging the limitations of this approach alongside its extraordinary potential, and I want to do that directly.

First, not all patients with early cancer show detectable blood signatures at the same interval before diagnosis. The algorithms identify a risk score, not a certainty. A high-risk flag means the patient has a substantially elevated probability of developing or harboring cancer relative to the general population. It does not mean the patient has cancer. Most patients who are flagged will not have cancer, and appropriate follow-up testing is required to confirm or rule out the finding. This is true of every cancer screening test that has ever been developed. Mammography flags abnormalities that turn out to be benign in the majority of cases. Low-dose CT detects lung nodules that are not malignant in most patients. The question is not whether the algorithm is perfect but whether it is useful, and the clinical evidence from deployed systems answers that question clearly in the affirmative.

Second, the accuracy of the algorithms varies by cancer type. ColonFlag and LungFlag, the most validated and deployed algorithms in this family, achieve area under the curve scores of 0.82 and 0.856 respectively. Published models for ovarian cancer achieve 0.95 to 0.97. For multiple myeloma, published models achieve 0.957 to 0.968. For breast cancer, the largest published study achieves a more modest 0.64, reflecting the fact that breast cancer's primary detection mechanism is imaging rather than blood chemistry, and the blood signature is correspondingly weaker. The algorithms are not equally powerful for all cancers, and that variation should inform how they are deployed and how their outputs are communicated to clinicians and patients.¹¹

Third, all of the performance figures cited in this book come from studies conducted on specific populations at specific health systems with specific laboratory protocols. Algorithm performance can degrade when applied to populations with different demographic characteristics, different laboratory equipment, or different blood drawing schedules. This is why prospective validation at multiple health systems, across diverse populations, is the necessary next step before broad clinical deployment. It is not a reason to delay indefinitely. The ColonFlag algorithm was developed on an Israeli population and validated in the United Kingdom with comparable performance. The LungFlag algorithm was developed and validated entirely within a single large American health system. The evidence of cross-population generalizability is encouraging, but it needs to be confirmed, and that confirmation requires institutional commitment to the validation work.¹²

These limitations are real and should be taken seriously. They are also the normal limitations of any early-stage clinical tool—limitations that have been managed successfully for every cancer screening technology that is now considered standard of care. The Pap smear had high error rates in its early clinical deployment. Low-dose CT generates false positives that require follow-up imaging. PSA testing produces both false positives and the overdiagnosis of clinically insignificant cancers. None of these limitations prevented those tools from saving hundreds of thousands of lives. They were managed through clinical protocol development, threshold calibration, and the accumulation of real-world experience. The same path is available for the blood test algorithms.

Sixty Values, Sixty Million Stories

I want to close this chapter with a thought that I find both humbling and galvanizing.

Every year, roughly two hundred million routine blood panels are drawn in the United States. Each of those panels generates sixty or more values. Each of those values reflects something real about the biological state of the person whose blood is in the tube. In aggregate, those two hundred million blood draws represent the most comprehensive ongoing biological surveillance of a human population that has ever existed. We collect this data every year, at enormous collective cost, as part of the routine infrastructure of primary care medicine.

And then we read each value against a threshold and file the result.

The algorithms described in this book, and in the peer-reviewed literature on which this book draws, represent the first serious attempt to read this data the way it deserves to be read: as a system, as a pattern, as a longitudinal record of biological change that precedes disease by months or years and contains, for a meaningful fraction of the patients behind those two hundred million panels, an early warning that medicine has the tools to act on.

We are not talking about a new test. We are not talking about a new blood draw. We are not talking about new equipment or new patient behavior or new clinical infrastructure. We are talking about a new way of reading data that already exists, using tools that already work, to find cancers that are already announcing themselves in language that, until recently, we did not know how to read.

The ColonFlag algorithm, deployed at Geisinger Health System in Pennsylvania, flagged 706 patients from a pool of 25,610 who were overdue for colorectal cancer screening. Among the 104 who completed colonoscopy, eight percent had colorectal cancer, compared to the roughly one percent detection rate in standard screening. That is an eightfold improvement, achieved with no new blood draw, no new laboratory equipment, and no change in patient behavior. The patients whose cancers were found in that deployment were not the patients who would have been found through conventional screening alone. They were patients who would have gone home with a normal blood test result, returned the following year with a normal blood test result, and eventually presented with symptoms of advanced disease.¹³

That is the cost of reading the data the old way. That is what the new way prevents.

The biology is settled. The algorithms work. The data is waiting. What happens next is a question not of science but of will, and that is where the rest of this book turns.

Chapter 4

Thirteen Cancers

The Diseases, the Stakes, and What Finding Them Earlier Changes

First, a Number You Need to Understand

Before we look at the thirteen cancers, there is one piece of scientific vocabulary that appears in the research behind all of them. Understanding it takes about two minutes, and once you do, you will be able to judge for yourself how good these algorithms actually are.

The number is called the AUC, which stands for area under the curve. It measures how well a test separates people who have a disease from people who do not. Think of it this way: pick at random one person who has the disease and one who does not. The AUC is how often the test gives the person with the disease the higher risk score.

The scale runs from 0.5 to 1.0. A score of 0.5 means the test is useless, no better than guessing. A score of 1.0 means the test is perfect, catching every case while never falsely alarming a healthy person. No test in medicine ever reaches 1.0. The question is always how close you can get, and how that compares to the tests we already use and trust.
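
One standard way to compute an AUC is to compare every patient who has the disease with every patient who does not, and count how often the patient with the disease receives the higher score. The Python sketch below does exactly that; the scores are invented, and the function name is hypothetical.

```python
def auc(scores_with_disease, scores_without_disease):
    """Share of (sick, healthy) pairs in which the sick patient scores higher.

    Ties count as half a win.
    """
    wins = 0.0
    for s_pos in scores_with_disease:
        for s_neg in scores_without_disease:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_with_disease) * len(scores_without_disease))


# Hypothetical risk scores, not patient data.
with_disease = [0.35, 0.60, 0.80, 0.90]
without_disease = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55]
print(round(auc(with_disease, without_disease), 3))
```

In this toy example the sick patient outranks the healthy one in 22 of the 24 possible pairs, for an AUC of about 0.92. A test that assigned scores at random would win about half the pairs, which is where the 0.5 floor comes from.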

Here is a reference that puts the numbers in context:

AUC Score	What It Means	Familiar Comparison
0.50	No better than flipping a coin	Random guessing
0.70–0.75	Useful, but limited	PSA test for prostate cancer
0.80–0.85	Good—clinically meaningful	Mammogram for breast cancer
0.85–0.90	Very good—stronger than most standard tests	LungFlag blood algorithm
0.90–0.97	Excellent—exceptional for any screening test	Ovarian, myeloma blood algorithms
1.00	Perfect—never achieved in medicine	Does not exist

A mammogram, which has been the foundation of breast cancer screening for decades and is accepted worldwide as a standard of care, typically achieves an AUC of around 0.80 to 0.85. When you see the blood test algorithms in this chapter hitting scores of 0.85, 0.90, or higher, you are reading numbers that match or exceed that standard, from a routine blood draw that the patient was already going to have anyway.

One more piece of vocabulary, used specifically for ColonFlag because it is the most completely studied algorithm in this group with real clinical deployment data. When any screening test flags a patient as high risk, four questions matter:

Sensitivity asks: of all the patients who actually have cancer, what percentage did the test catch? A sensitivity of 88 percent means the test finds 88 out of every 100 true cancer cases.

Specificity asks: of all the patients who do not have cancer, what percentage did the test correctly clear? A specificity of 71 percent means the test correctly reassures 71 out of every 100 healthy patients.

Positive predictive value, or PPV, asks: of all the patients the test flagged as high risk, what percentage actually had cancer? For a cancer affecting roughly one percent of the adult population, even an excellent test will have a modest PPV, perhaps eight to ten percent, because most people in any screened group are healthy. This is not a failure of the test. It is arithmetic. Mammography has a PPV of around ten percent in average-risk women. ColonFlag in its UK clinical deployment had a PPV of 9.15 percent, right in line with mammography.

Negative predictive value, or NPV, asks: of all the patients the test cleared, what percentage were truly cancer-free? This is the number that gives patients real peace of mind. ColonFlag's NPV in clinical deployment was 99.45 percent, meaning that of all the patients it cleared, roughly one in 180 turned out to have cancer.
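
The arithmetic that links these four numbers can be sketched in Python. The inputs are illustrative only: the 88 percent sensitivity and 71 percent specificity are the example figures from the definitions above, the 91 percent specificity and the one percent prevalence are assumptions added for comparison, and none of them is presented as the operating point of a deployed algorithm. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


for specificity in (0.71, 0.91):
    ppv, npv = predictive_values(0.88, specificity, prevalence=0.01)
    print(f"specificity {specificity:.0%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

Under these assumptions the PPV is about three percent at 71 percent specificity and about nine percent at 91 percent, while the NPV stays above 99.8 percent in both cases. When a disease is rare, PPV is driven largely by specificity and prevalence, which is why a modest PPV is arithmetic rather than failure.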

With that framework in hand, here are the thirteen cancers.

The Thirteen Cancers

1. Colorectal Cancer

Colorectal cancer kills roughly 53,000 Americans every year. That number does not have to be anything close to what it is, because colorectal cancer, when found early, is among the most curable diseases in medicine. Catch it at Stage I and more than ninety percent of patients survive five years or more. Wait until it has spread to other organs and that number falls to fourteen percent. That fourteen percent figure has not meaningfully improved in forty years, despite billions of dollars in treatment research. The problem is not the drugs. The problem is that too many patients are still being found too late.²

Colorectal tumors bleed slowly into the intestinal tract, often invisibly, over months and years before diagnosis. That gradual blood loss leaves a trail in the routine complete blood count: hemoglobin drifts downward, red blood cells gradually shrink, and a measurement called red cell distribution width, which reflects the variation in red blood cell size, begins to rise. Meanwhile, platelet counts edge upward as the tumor sends inflammatory signals through the bloodstream. None of these changes, on its own, would concern any physician reviewing a standard blood panel. Each value stays within the normal range. But the pattern across all of them, read simultaneously by a machine learning algorithm trained on the records of more than 600,000 patients, is recognizable as the fingerprint of developing colorectal cancer.

The algorithm built from that training data, called ColonFlag, achieves an AUC of 0.82, which puts it in the same performance range as a mammogram, and it can detect the disease from blood tests drawn up to two years before a conventional diagnosis would be made. In clinical deployment in the United Kingdom, it achieved sensitivity of 88 percent, meaning it caught 88 of every 100 true cancer cases. Its NPV was 99.45 percent, meaning that patients it cleared had roughly a one in 180 chance of harboring a missed cancer. Its PPV of 9.15 percent, finding cancer in roughly nine of every hundred patients it flagged, is comparable to mammography in average-risk women, a standard medicine already considers acceptable and worthwhile. When deployed at a health system in Pennsylvania, it identified high-risk patients among those overdue for standard screening. Of the flagged patients who went on to have a colonoscopy, eight percent had colorectal cancer, compared to roughly one percent in standard screening. That is an eightfold improvement, achieved with no new blood draw, no new equipment, and no change in what the patient did. ColonFlag is already live in Israel, the United Kingdom, and the United States.³

2. Lung Cancer

Lung cancer is the deadliest cancer in America. It kills approximately 127,000 people every year, more than colorectal, breast, and prostate cancer combined. Eighty-five percent of patients are diagnosed after the cancer has already spread beyond the lung, at which point five-year survival is eight percent. Find it while it is still confined to the lung, and that rate rises to sixty percent. Seven and a half times as many people survive when the disease is caught early. The treatment is the same. Only the timing is different.⁴

Lung cancer's blood signature is primarily inflammatory. As the tumor grows, it manipulates the immune system, pushing neutrophil counts upward while suppressing lymphocytes, two types of white blood cells measured on every routine blood count. The ratio between them climbs steadily in patients who will later develop lung cancer, at a rate that machine learning algorithms can track and distinguish from normal year-to-year variation in healthy people. These changes are particularly clear in the months approaching diagnosis, though earlier signals are present in many patients.

The LungFlag algorithm, validated on nearly 200,000 patients at Kaiser Permanente in Southern California, achieves an AUC of 0.856, which puts it comfortably above the mammogram benchmark and into the very good range on the scale above. At a risk threshold that holds the false positive rate to just five percent, it identifies forty percent of future lung cancer patients before their clinical diagnosis. Put differently, of every hundred people who will not develop lung cancer, it wrongly flags only five. It also catches a significant proportion of patients who would not qualify for a low-dose CT scan under current screening guidelines, reaching people the existing system misses entirely. LungFlag has completed large-scale validation and is positioned for broader clinical deployment.⁵

3. Liver Cancer

Liver cancer kills approximately 26,000 Americans per year. The liver can keep functioning even as disease progresses, which is precisely what makes liver cancer so dangerous: by the time a patient feels anything, the disease has usually spread. Five-year survival at the localized stage is about thirty-eight percent. After it has spread, survival falls to three percent.⁶

The liver cannot hide what it is doing from a routine blood panel. The organ produces clotting factors, processes bilirubin, and synthesizes the protein albumin. When cancer disrupts these functions, the disruption shows up in liver function tests and blood count values that physicians have been ordering for decades. A calculation called the FIB-4 index, derived from four standard values that any lab already reports, tracks the progression of liver damage and cancer risk over time. The trajectory of FIB-4 across multiple blood draws is among the strongest predictive signals in the liver cancer literature.
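
The FIB-4 index is simple enough to compute by hand. Its standard formula multiplies age by the AST enzyme level and divides by the platelet count times the square root of the ALT enzyme level. The Python sketch below applies that formula to two invented blood draws; the function name and the patient values are hypothetical.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l):
    """FIB-4 index: (age x AST) / (platelet count x square root of ALT)."""
    return (age_years * ast_u_per_l) / (platelets_10e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical draws for one patient: (age, AST U/L, ALT U/L, platelets 10^9/L).
draws = [(60, 40, 36, 200), (62, 50, 25, 160)]
trajectory = [fib4(*d) for d in draws]
print(trajectory)
```

In this invented example the index rises between the two draws as AST climbs and the platelet count falls. It is that direction of travel across draws, rather than any single reading, that the trajectory-based approach relies on.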

A study conducted across an entire health system in Hong Kong, involving more than 75,000 patients, used a machine learning algorithm trained on routine blood values to detect liver cancer with an AUC of 0.894, which falls in the very good to excellent range on the scale above, and outperforms the current standard tumor marker test used for liver cancer surveillance. A separate study drawing on more than 900,000 individuals from two major research databases in the United States and United Kingdom achieved an AUC of 0.88, demonstrating that the approach works across diverse populations. Neither algorithm is yet deployed in routine clinical practice. The validation has been done. The deployment has not.⁷

4. Gastric Cancer

Gastric cancer kills approximately 11,000 Americans per year and is far more lethal globally. Like colorectal cancer, it bleeds into the digestive tract, producing a gradual iron deficiency that the blood count captures as declining hemoglobin and rising red cell size variation. In published research, blood count markers outperform conventional tumor marker blood tests for early-stage gastric cancer, which is significant because conventional markers are what physicians currently rely on.⁸

Machine learning models trained on routine blood values have achieved AUC scores between 0.90 and 0.97 in published studies, scores that fall in the excellent range on our reference scale. One model was particularly strong for patients whose conventional tumor markers tested normal, precisely the patients who would otherwise be reassured and sent home undetected. A 193,000-patient validation study found the algorithm correctly identified between eight and nine out of ten future gastric cancer cases. The primary remaining step before broad deployment is validation in Western patient populations, where the disease is less common but no less deadly when it occurs.⁹

5. Pancreatic Cancer

Pancreatic cancer is the disease that most starkly illustrates what this book is about. It is feared, it is studied, it is abundantly funded, and it kills almost everyone it reaches. Approximately 51,000 Americans die from it every year. The overall five-year survival rate is around twelve percent, one of the lowest of any cancer.

And yet, caught while still confined to the pancreas, the five-year survival rate is fifty percent. Surgical removal at that stage offers a realistic chance of cure. The problem is that pancreatic cancer almost never announces itself until it is far too late. The pancreas sits deep in the abdomen, surrounded by other organs. Tumors can grow for years without pressing on anything that causes pain or visible symptoms. More than eighty percent of patients are diagnosed at Stage III or IV, at which point surgery is no longer possible and the five-year survival rate collapses to three percent. The gap between fifty percent and three percent is not a gap in treatment options. It is a gap in timing.¹⁰

What makes the blood test approach so important for pancreatic cancer is that the disease does leave early traces, even though the tumor itself is invisible on any scan for years. As it grows, the tumor compresses the bile duct, causing certain liver enzymes to rise gradually on the comprehensive metabolic panel months before the obstruction becomes symptomatic. Simultaneously, it damages the insulin-producing cells of the pancreas, causing fasting blood glucose to drift upward in a pattern that diverges from healthy patients more than two years before diagnosis. These are values your doctor already orders and reviews at every annual physical.

Machine learning models trained on these blood patterns have identified more than half of future late-stage pancreatic cancer patients two years or more before their eventual diagnosis, at a stage where surgery was still possible. The algorithms exist. The validation studies have been published. What is missing is the institutional commitment to integrate them into clinical workflows.¹¹

6. Ovarian Cancer

Ovarian cancer is among the clearest examples of what the absence of early detection costs in human lives. There is no established population screening test. The disease causes no reliable early symptoms. Most women are diagnosed at Stage III or IV, when the cancer has spread throughout the abdominal cavity.

Found at Stage I, when still confined to the ovary, the five-year survival rate is ninety-three percent. Found at Stage IV, survival drops to thirty-one percent, and the majority of women discover they have the disease only at Stage III or IV. Approximately 13,000 American women die of ovarian cancer every year, and the overwhelming majority of those deaths occur in women who had no way to know the disease was developing.¹²

Ovarian tumors trigger a well-documented biological response: they stimulate the production of a protein that causes the bone marrow to manufacture excess platelets. This shows up in the routine blood count as a slow, steady rise in platelet count over the eighteen months or more preceding diagnosis, followed by a more rapid rise in the final six months. Published data shows that an abnormally elevated platelet count on a routine blood test is associated with a more than twentyfold increase in ovarian cancer risk. This signal is not a new discovery. It has been published and validated. What has not happened is its systematic use in clinical practice.

Machine learning models trained on routine blood values have achieved AUC scores of 0.95 to 0.97 for ovarian cancer detection, scores that land squarely in the excellent range on the reference scale, performance levels that would be considered extraordinary for any cancer screening test in medicine. A multicenter study involving eleven thousand patients maintained strong performance on external validation, outperforming the standard CA-125 tumor marker test for early-stage diagnosis. Ovarian cancer is among the highest-priority candidates for consortium deployment, precisely because the unmet clinical need is so large and the published evidence so strong.¹³

7. Kidney Cancer

Kidney cancer kills approximately 15,000 Americans per year. It has no established population screening test and is typically found by accident, when a scan performed for an unrelated reason happens to catch a tumor, or after symptoms develop. Found at the localized stage, the five-year survival rate is ninety-three percent. Found after it has spread, survival falls to nineteen percent.¹⁴

The blood signature includes declining hemoglobin, rising inflammatory markers, and gradually increasing creatinine as the tumor impairs kidney function, all values present on a standard blood panel. A study of thousands of patients confirmed that these routine blood test abnormalities increase measurably in the months before kidney cancer diagnosis, providing a detection window that algorithmic analysis can exploit.

A machine learning model using just eight standard blood markers achieved an AUC of 0.932, which places it solidly in the excellent range, with sensitivity and specificity both above eighty-six percent. That level of performance from eight values already present on every routine blood panel is striking. The primary remaining step is validation in larger and more diverse patient populations.¹⁵

8. Multiple Myeloma

Multiple myeloma is a cancer of the plasma cells, the immune cells that produce antibodies. It progresses slowly, often through a pre-cancerous stage that can last years, which makes it a particularly well-suited target for early detection. Approximately 12,000 Americans die from it annually.

The blood signature of myeloma is among the most distinctive and longest-preceding of any cancer in this group. The excess immunoglobulins the cancer produces raise total protein levels. The bone destruction it causes raises calcium. The kidney damage from abnormal proteins raises creatinine. Hemoglobin and albumin decline. These changes appear in the standard blood panel two to five years before clinical diagnosis in many patients, a detection window that is longer than for almost any other cancer in this group. In a large study of myeloma cases from United Kingdom primary care records, hemoglobin began declining measurably three years before diagnosis, and elevated calcium carried an odds ratio of more than eleven for subsequent myeloma diagnosis, meaning the odds of a later myeloma diagnosis were more than eleven times higher for a patient with elevated calcium than for a patient without it.¹⁶
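An odds ratio compares odds, not probabilities, so it helps to see what it implies for an individual patient's risk. The short sketch below converts an odds ratio into a probability. The baseline risk of one in a thousand is a hypothetical figure chosen only for illustration; it does not come from the study.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline probability and an odds ratio into the exposed group's probability."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1.0 + exposed_odds)


# Hypothetical baseline (an assumption, not a study figure):
# 1 in 1,000 patients with normal calcium is later diagnosed with myeloma.
baseline = 0.001
exposed = risk_from_odds_ratio(baseline, odds_ratio=11.0)
risk_ratio = exposed / baseline

print(f"Risk with normal calcium:   {baseline:.2%}")
print(f"Risk with elevated calcium: {exposed:.2%}")
print(f"Risk ratio:                 {risk_ratio:.1f}")
```

For a rare disease the odds ratio and the risk ratio come out nearly equal, which is why the two are often spoken of interchangeably. The absolute risk nonetheless stays small: under this assumed baseline, most patients with elevated calcium would never develop myeloma.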

Machine learning models trained on these routine blood markers have achieved AUC scores of 0.957 to 0.968, placing them in the excellent range, among the highest-performing models across any cancer in this series. Treated at the pre-malignant or early stage, myeloma is a manageable chronic disease. Treated at late stage, it is often fatal.¹⁷

9. Leukemia

The leukemias are cancers of blood-forming cells, killing approximately 24,000 Americans per year. Because they originate in the blood itself, the routine blood count is already the primary tool physicians use to diagnose them. The algorithmic opportunity here is to read the signals already present in the blood count earlier and more systematically than current clinical practice achieves, catching the disease before it has progressed to a stage that is harder to treat.¹⁸

For the most common adult leukemias, specific patterns in the blood count differential—the breakdown of white blood cell subtypes—begin to diverge from healthy patterns well before the counts cross the thresholds that trigger clinical concern. One particularly telling marker is a cell type called basophils, which begins to rise in certain leukemia patients before total white blood cell count reaches the threshold a physician would act on.

Machine learning models have achieved AUC scores between 0.87 and 0.96, a range that spans good to excellent on our reference scale, with one study training on more than one million patients over seven years of serial blood count data and achieving an AUC of 0.92. Among all the cancers in this group, leukemia algorithms are among the closest to clinical deployment readiness, in part because the blood count infrastructure required to run them is already universal.¹⁹

10. Lymphoma

Lymphoma, cancer of the lymph nodes and lymphatic system, kills approximately 22,000 Americans per year across Hodgkin and non-Hodgkin forms. Both produce characteristic changes in the blood count and metabolic panel that precede diagnosis by months to years. Inflammatory markers rise. Lymphocyte counts shift. LDH, a standard metabolic panel value that reflects cell turnover, is abnormal in more than half of non-Hodgkin lymphoma patients at diagnosis.²⁰

A machine learning model trained on blood count data from more than 663,000 patients achieved an AUC of 0.84 at six months before diagnosis, a score in the good to very good range, rising to 0.85 when incorporating five years of prior blood test history. That improvement with longer history is an important finding: the blood signal for lymphoma accumulates over time, and algorithms with access to serial blood test records outperform those reading a single snapshot. Perhaps most tellingly, an analysis of the same patient database found that blood testing activity increased noticeably five or more years before hematological cancer diagnosis, which suggests that physicians were already detecting something unusual in their patients' blood but lacked the framework to act on it systematically.²¹

11. Bladder Cancer

Bladder cancer kills approximately 17,000 Americans per year and has no established population screening test. Found at the localized stage, the five-year survival rate is around seventy percent. After it has spread, survival falls to eight percent. Chronic inflammation from the tumor produces measurable changes in blood chemistry months before diagnosis, including rising inflammatory markers, declining hemoglobin, and shifts in certain metabolic panel values.²²

A machine learning model trained on routine laboratory data achieved an AUC of 0.88 to 0.92, landing in the very good to excellent range, for bladder cancer detection. A separate study tracking thousands of patients confirmed that routine blood test abnormalities increase measurably in the six to eight months before bladder cancer diagnosis. The signal is present in the data we already collect. The analytical framework to act on it at scale does not yet exist in most clinical settings.²³

12. Esophageal Cancer

Esophageal cancer kills approximately 16,000 Americans per year and has one of the lowest survival rates in this group: five percent at distant stage. Like the other gastrointestinal cancers, it bleeds, produces chronic inflammation, and disrupts metabolic function in ways that appear in the blood count and metabolic panel before symptoms develop. The blood signature parallels that of colorectal and gastric cancers: declining hemoglobin, rising red cell size variation, and elevated inflammatory ratios.²⁴

The biological case for algorithmic detection of esophageal cancer from routine blood work is well established, consistent with what has been demonstrated for the other gastrointestinal cancers. What esophageal cancer needs, and does not yet have, is the large-population machine learning study of the kind that produced ColonFlag and LungFlag. That study is the priority. The signal is there to find.²⁵

13. Thyroid Cancer

Thyroid cancer is diagnosed in roughly 44,000 Americans per year and kills approximately 2,000, a much smaller mortality burden than others on this list, because the most common forms are slow-growing and highly treatable. But aggressive thyroid cancer subtypes carry a much worse prognosis, and earlier detection of any form allows for simpler, less invasive treatment. Thyroid-stimulating hormone, or TSH, is already a standard value on many annual blood panels, and its trajectory over time, combined with changes in cholesterol and blood count values, reflects the metabolic disruption of thyroid dysfunction and malignancy.²⁶

Machine learning models using TSH trajectory combined with routine blood count and metabolic panel values have achieved AUC scores of approximately 0.91, placing them in the excellent range on our reference scale, in published studies. The existing infrastructure for TSH monitoring, already routine for broad segments of the population, makes thyroid cancer one of the more straightforward integration opportunities in this group.²⁷

What the Thirteen Tell Us

Read together, the thirteen cancers in this chapter share a pattern that is both scientifically compelling and morally urgent.

For every one of them, there is a large gap between survival when the disease is found early and survival when it is found late. For most of them, that gap is not a matter of better drugs or more aggressive treatment. It is a matter of timing. The same cancer, the same biology, treated with the same medicine, produces radically different outcomes depending only on when it is found.

Figure 1. Five-year survival rates by stage at diagnosis. Early = Stage I/II. Late = Stage III/IV. Data derived from SEER national cancer surveillance database.

For every one of them, the disease leaves traces in the bloodstream before it is detectable by conventional means. The traces vary by cancer, from the slow iron depletion of colorectal and gastric tumors, to the inflammatory signaling of lung cancer, to the distinctive protein and calcium changes of myeloma. But every one of these cancers announces itself in the data that is already being collected at every annual physical, in every doctor's office and hospital lab in the country.

For every one of them, machine learning algorithms capable of reading those traces have been developed and validated, with AUC scores that run from the good range, comparable to a mammogram, to the excellent range that would be considered extraordinary for any screening test in medicine. Two of those algorithms are already deployed in clinical practice. Several more are validated and ready. The remainder need the kind of large-population prospective studies that a committed consortium of health systems could conduct within a few years.

Figure 2. Current development status of cancer detection algorithms for 13 target cancers.

The combined annual death toll from these thirteen cancers in the United States is approximately 393,000 people. Conservative projections, assuming that algorithmic blood test analysis reaches half the adult population and performs at the level already demonstrated by the deployed ColonFlag and LungFlag systems, estimate that 100,000 to 175,000 of those deaths could be prevented each year. That estimate is deliberately cautious. It does not assume perfect algorithm performance. It does not assume universal adoption. It simply asks: what happens if we deploy what we already have, to half the people who are already getting blood tests, and it works as well as the deployed systems already show it can?²⁸

Figure 3. Conservative estimates of lives saveable annually per cancer type. Total: ~100,000 to 175,000 lives per year. Assumes 50% population penetration.

One hundred thousand to 175,000 lives. Every year. Saved not by a new drug, not by a scientific breakthrough still years away, but by reading the data we already collect in a way we already know how to read.

The next chapter describes what two of those algorithms look like in clinical practice: the patients who were found, what their experience reveals, and what it tells us about what needs to happen next.

Chapter 5

ColonFlag and LungFlag

What Happens When the Algorithm Actually Runs

Picture a woman in her late fifties living in rural Pennsylvania. She sees her primary care doctor once a year, gets her blood drawn as part of the routine annual physical, and goes home. Her doctor reviews the results, notes that everything looks normal, and files them in the electronic health record. This has happened every year for several years running. The numbers are unremarkable. Nothing flags. Nothing worries anyone.

Except that something is growing.

Colorectal cancer does not announce itself. It does not cause pain in its early stages. It does not produce symptoms that would send a patient rushing to a doctor. It grows slowly and silently, and the blood it sheds into the intestinal tract does so in amounts so small that no one notices. What a physician sees when reviewing the annual blood panel is a hemoglobin reading that is, technically, within the normal range. Red blood cells that are, technically, normal in size. Platelet counts that are, technically, unremarkable.

What the physician cannot see, because no human eye is capable of seeing it, is that this woman's hemoglobin has been declining by a small but consistent amount every six months for the past three years. Her red blood cells have been gradually shrinking. Her platelet count has been slowly climbing. Each individual reading looks fine. The trend across all of them, read together, is not fine at all.

ColonFlag sees it.

How ColonFlag Works in Practice

ColonFlag is not a new blood test. It does not require a separate appointment, a different laboratory, or any additional needle sticks. It runs silently in the background, analyzing the same blood panel that has always been drawn, looking at the same numbers the doctor has always reviewed, and asking a question that no human reviewing those numbers has ever been trained to ask: does the pattern across all of these values, read over time, match what we know precedes colorectal cancer?
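The difference between reading a single result and reading a trend can be shown in a few lines. The sketch below is not ColonFlag's method, which combines many values with a trained model. It is a minimal illustration using one invented series of hemoglobin readings and an assumed reference range: every reading passes a range check, while a simple least-squares slope shows a steady decline.

```python
def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical readings (g/dL), one every six months for three years.
months = [0, 6, 12, 18, 24, 30, 36]
hemoglobin = [14.1, 13.9, 13.8, 13.6, 13.4, 13.3, 13.1]

# Assumed reference range for illustration; real ranges vary by laboratory.
NORMAL_LOW, NORMAL_HIGH = 12.0, 15.5

all_in_range = all(NORMAL_LOW <= value <= NORMAL_HIGH for value in hemoglobin)
change_per_year = least_squares_slope(months, hemoglobin) * 12

print(f"Every reading within the normal range: {all_in_range}")
print(f"Estimated change per year: {change_per_year:+.2f} g/dL")
```

A range check looks at each value in isolation and passes all seven. The slope looks at them together and reports a persistent decline, which is the kind of information a longitudinal algorithm can use and a glance at a single page cannot.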

The algorithm was built by a team of Israeli data scientists who spent years studying the blood records of more than 600,000 patients enrolled in Maccabi Healthcare Services, one of the largest health systems in Israel. Some of those patients had gone on to develop colorectal cancer. The scientists trained a computer to find the patterns in the blood data that distinguished those patients from the ones who stayed healthy, and to learn those patterns well enough to recognize them in future patients whose outcomes were not yet known. The result was an algorithm that could look at a standard blood count result and, with accuracy comparable to a mammogram, identify patients whose blood was telling a story their doctors had not yet heard.¹

The way it works in a clinical setting is straightforward. When a patient's blood is drawn and analyzed by the laboratory, the results flow into the electronic health record, as they always have. ColonFlag runs on those results automatically, in the background, requiring nothing additional from the patient or the physician. If the patient's blood pattern falls into the high-risk category, an alert goes to the treating physician: this patient's blood values suggest an elevated risk of colorectal cancer and warrant follow-up. The physician then decides what to do, typically a referral for colonoscopy, which can confirm or rule out the finding.

If the patient's results do not trigger an alert, nothing changes. The blood test result is filed as normal, the patient goes home, and the algorithm moves on to the next patient. The whole process adds no time to the appointment, no cost to the blood draw, and no burden to anyone in the clinical workflow.

That simplicity is not an accident. It is the design principle that makes this approach fundamentally different from every other early detection method described in this book. The Pap smear required a trained cytology technician to examine every slide by hand. Low-dose CT scanning requires specialized imaging equipment, a radiologist trained to read the results, and a system for identifying and reaching the eligible patients. ColonFlag requires none of that. The blood is already being drawn. The laboratory is already running the analysis. All ColonFlag does is read the output more carefully than any human has time to do.

What Happened in Israel

The first large-scale clinical deployment of ColonFlag happened at Maccabi Healthcare Services itself, the same health system whose patient records had been used to build the algorithm. The target population was straightforward: patients who were overdue for colorectal cancer screening and had not responded to standard reminders. These were people who had been told they should get screened, had not done so, and had largely fallen off the radar of conventional follow-up systems.²

ColonFlag was applied to the blood records of nearly 80,000 such patients. From that pool, it identified 688 individuals whose blood patterns placed them in the highest-risk category, the top fraction of one percent of the entire screened population. Letters went to their physicians. Physicians contacted their patients. Some patients agreed to have a colonoscopy. Some did not.

Of the 254 patients who did agree to colonoscopy, nineteen were found to have colorectal cancer. That is a detection rate of 7.5 percent among the colonoscopies performed, compared to roughly one percent in standard population screening. The algorithm had found a group of people seven times more likely than the general population to harbor an undetected cancer, drawn from a pool of patients who had already declined conventional screening invitations.

But here is what makes those nineteen cases genuinely moving rather than merely statistically interesting. These were not patients who were about to walk into a hospital with symptoms. They were patients who felt fine, had no reason to suspect anything was wrong, and were going about their lives. Without ColonFlag, most of them would have continued to feel fine for another year, perhaps two or three, while their cancer grew from a curable early-stage tumor into something very different.

Beyond the nineteen cancers found through colonoscopy at Maccabi, the research team identified an additional fifteen patients in the flagged group who were later found to have cancer, diagnosed outside the Maccabi system and recorded in the electronic medical record after the fact. Thirty-four people in total, from a pool of 688 flagged patients among 80,000 screened, who were carrying a cancer that their blood had been quietly announcing for years.³
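The arithmetic behind the Maccabi figures is simple enough to check directly. The sketch below uses only the counts given above, with "nearly 80,000" rounded to 80,000 and the one percent baseline taken from the comparison in the text.

```python
# Counts as reported in the text; "nearly 80,000" is rounded to 80,000 here.
screened = 80_000
flagged = 688
colonoscopies = 254
cancers_at_colonoscopy = 19
cancers_found_elsewhere = 15
baseline_yield = 0.01  # "roughly one percent" in standard population screening

flag_rate = flagged / screened
colonoscopy_uptake = colonoscopies / flagged
colonoscopy_yield = cancers_at_colonoscopy / colonoscopies
relative_yield = colonoscopy_yield / baseline_yield
total_cancers = cancers_at_colonoscopy + cancers_found_elsewhere
cancers_per_flag = total_cancers / flagged

print(f"Share of screened patients flagged: {flag_rate:.2%}")
print(f"Flagged patients who had a colonoscopy: {colonoscopy_uptake:.1%}")
print(f"Cancers per colonoscopy: {colonoscopy_yield:.1%}")
print(f"Yield relative to baseline: {relative_yield:.1f}x")
print(f"Known cancers among all flagged patients: {total_cancers} ({cancers_per_flag:.1%})")
```

Note that the 7.5 percent figure is a yield among the patients who actually had a colonoscopy. Because fewer than four in ten flagged patients did, the share of all 688 flagged patients with a known cancer is necessarily a floor, not a full count.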

What Happened in Pennsylvania

Geisinger Health System is not a glamorous institution. It is a regional health system headquartered in Danville, Pennsylvania, serving more than three million patients across 45 counties in rural and semi-rural northeastern and central Pennsylvania. The population it serves is older, poorer, and sicker than the national average. About a third of its patients live in counties designated as medically underserved. It is exactly the kind of population -- rural, lower-income, with limited access to specialty care -- that early detection programs most consistently fail to reach.⁴

Geisinger deployed ColonFlag on a specific population: patients who were overdue for colorectal cancer screening, meaning they were the group most likely to fall through the cracks of conventional reminder-based screening programs. The algorithm analyzed the blood test records of more than 25,000 such patients and flagged 706 as high-risk.

Of the 706 flagged patients, 104 went on to have a colonoscopy. Among those 104, eight percent were found to have colorectal cancer. In standard population screening, the cancer detection rate runs at roughly one percent. Eight percent means that for every hundred colonoscopies performed on ColonFlag-flagged patients, eight found cancer compared to one in conventional screening. That is the eightfold improvement that appears in the published literature, and it represents something concrete: a hundred ColonFlag-guided colonoscopies found about seven more cancers than a hundred colonoscopies in conventional screening would be expected to find, and they found them in patients who had not responded to conventional screening invitations.⁵

The significance of this is worth sitting with. These were not patients who would have been caught by a different screening program. These were patients who had already been identified as overdue for screening and who had not responded to standard recall. They were, by definition, the patients the system was already failing. ColonFlag reached them not by changing their behavior or improving access to care or launching an outreach campaign, but by reading their blood tests differently.

That is a fundamentally different kind of solution. It does not require the patient to do anything. It does not require the health system to hire additional staff or build new clinical programs. It requires only that the data already being collected be analyzed by a tool already capable of reading it.

What Happened in the United Kingdom

The United Kingdom deployment of ColonFlag addressed a different but equally urgent problem. During the COVID-19 pandemic, colonoscopy services were severely disrupted. Endoscopy units shut down or reduced capacity. Waiting lists grew to lengths that had no precedent in modern British medicine. Patients who had been referred for urgent colorectal cancer investigation faced waits of months. Clinical teams needed a way to prioritize: among all the patients waiting for a colonoscopy, which ones needed to be seen first?⁶

ColonFlag was applied to the blood records of patients on the colonoscopy waiting list to triage them by cancer risk. The highest-risk patients, those whose blood patterns most strongly resembled the pre-diagnostic signatures of colorectal cancer, were moved to the front of the queue. The lower-risk patients waited longer.

The results confirmed what the Israeli and American deployments had already shown. Among the highest-risk patients prioritized by ColonFlag, the cancer detection rate was substantially higher than in unselected screening. The algorithm correctly identified the patients who most needed urgent attention, from a pool of patients who all had some clinical reason for referral, by reading the same blood test values that were already in their records.

This UK deployment also produced the most complete clinical performance figures available for any ColonFlag deployment in the published literature. At the highest-risk threshold, the algorithm correctly identified 88 percent of the patients who actually had cancer, missed 12 percent, and correctly cleared 71 percent of the patients who did not have cancer. Among all the patients it flagged as high-risk, cancer was found in roughly nine percent, comparable to the yield of mammography in breast cancer screening. Among all the patients it cleared as lower-risk, fewer than one in two hundred turned out to have missed cancer. In a pandemic-disrupted endoscopy system, that last figure was particularly valuable: it gave clinicians justified confidence that patients who waited longer were genuinely lower-risk.⁷
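The four figures in that paragraph are linked. Sensitivity and specificity describe the algorithm; the share of flagged patients who have cancer and the share of cleared patients who have cancer also depend on how common cancer is in the group being tested. The sketch below shows the relationship. The three percent prevalence is an assumption chosen for illustration, not a figure from the study, so the output will only approximate the published numbers.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a group with the given cancer prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # share of flagged patients with cancer
    npv = true_neg / (true_neg + false_neg)  # share of cleared patients without cancer
    return ppv, npv


SENSITIVITY = 0.88   # from the text
SPECIFICITY = 0.71   # from the text
ASSUMED_PREVALENCE = 0.03  # hypothetical share of waiting-list patients with cancer

ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, ASSUMED_PREVALENCE)
print(f"Flagged patients with cancer: {ppv:.1%}")
print(f"Cleared patients with cancer: {1.0 - npv:.2%}")
```

The design point is that the same algorithm, with the same sensitivity and specificity, produces a different yield in a different population. Run it on a group where cancer is rarer and a smaller share of flagged patients will have cancer, which is why each deployment has to report its own figures.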

What ColonFlag Cannot Do

Honest accounting requires naming the limits alongside the achievements.

ColonFlag does not find every cancer. In the UK deployment, it missed twelve percent of patients who did have cancer, patients whose blood patterns did not match the signature the algorithm had learned to recognize. Some cancers, particularly those that bleed very little or very slowly, or that occur in patients whose blood values are affected by other conditions, are harder for the algorithm to catch. No screening test in medicine catches everything, and ColonFlag is no exception.

ColonFlag also flags patients who do not have cancer. Most patients it identifies as high-risk turn out, on colonoscopy, not to have colorectal cancer, though a meaningful proportion have precancerous polyps that can be removed before they become cancer, which is itself a valuable finding. The false positive rate, the rate at which it alerts physicians to elevated risk in patients who turn out to be fine, is a real feature of the algorithm that must be managed through thoughtful clinical protocols.

And ColonFlag does not eliminate the need for follow-up testing. It is a risk stratification tool, not a diagnosis. A high-risk flag means the patient's blood warrants a closer look. It does not mean the patient has cancer. The colonoscopy, or in some cases another appropriate test, is still required to confirm or rule out the finding.

These are the normal limitations of any screening test. Mammography misses cancers and generates false positives that lead to unnecessary biopsies. Low-dose CT for lung cancer detects nodules that turn out, on further investigation, not to be malignant. The PSA test for prostate cancer generates false positives that have led to millions of unnecessary follow-up procedures. None of these limitations has prevented those tests from saving hundreds of thousands of lives. They have been managed through clinical experience, through protocol development, through the gradual accumulation of knowledge about how to use the tool wisely. ColonFlag is at the beginning of that process, not the end.

LungFlag: The Same Idea, the Deadliest Cancer

Now consider a different patient. A man in his early sixties, a former smoker who quit a decade ago, living in Southern California. He sees his doctor, gets his blood drawn, and goes home. His blood count is reviewed, everything looks normal, and the results are filed. He is not eligible for low-dose CT lung cancer screening under current guidelines, because his smoking history, though real, falls just short of the twenty pack-year threshold that triggers a recommendation for the scan.

He has lung cancer. It will not be diagnosed for another nine months. By then, it will have spread.

This is the gap that LungFlag was built to fill. Not the patients who clearly qualify for existing screening programs, but the patients who fall through the eligibility criteria -- the former smokers who quit before they hit the threshold, the women whose lung cancers develop without the heavy smoking history that current guidelines require, the patients who would be identified as high-risk by a more sensitive tool but who fall under every radar the existing system deploys.

LungFlag was validated on a study population of nearly 200,000 patients at Kaiser Permanente Southern California, one of the largest integrated health systems in the United States. Kaiser is an ideal validation environment for this kind of algorithm because it has decades of complete longitudinal medical records for millions of patients, including serial blood test results, imaging records, and cancer diagnosis data, all linked in a single electronic health record. The researchers identified 6,505 patients who had been diagnosed with non-small cell lung cancer and compared their blood records against 189,597 patients who had not developed lung cancer, looking for the patterns that separated them.⁸

What they found confirmed what the biology predicted. Lung cancer leaves a distinctive inflammatory fingerprint in the blood. The ratio of neutrophils to lymphocytes, two types of white blood cells measured on every routine blood count, rises steadily in patients who will later develop lung cancer, roughly ten times faster than it does in healthy patients. Red cell distribution width, the measure of variation in red blood cell size, climbs as well. These are not dramatic changes. They are subtle, consistent, and entirely within the range that a reviewing physician would consider normal.

LungFlag, reading these patterns across nearly 200,000 patient records, identified forty percent of future lung cancer patients nine to twelve months before their clinical diagnosis, at a false positive rate of just five percent. Put in plain terms: of every hundred patients who did not go on to develop lung cancer, the algorithm wrongly flagged only five, and forty percent of the lung cancers that would later be diagnosed in the entire population had already been identified by a blood test the patients had already received.⁹
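A false positive rate is measured among people who do not have the disease, so it is not the same thing as the share of flagged patients who turn out to be healthy. The sketch below applies the two published rates to the case and comparison counts given earlier. Treating those counts as a single screened population is an assumption made purely for illustration; in an unselected population lung cancer is rarer, and the share of flags that are true cancers would be lower.

```python
# Counts from the text.
cancer_patients = 6_505
comparison_patients = 189_597

# Rates from the text.
sensitivity = 0.40          # share of future lung cancers flagged
false_positive_rate = 0.05  # share of cancer-free patients flagged

true_positives = sensitivity * cancer_patients
false_positives = false_positive_rate * comparison_patients
total_flagged = true_positives + false_positives
share_of_flags_with_cancer = true_positives / total_flagged

print(f"Flagged patients who later had cancer: {true_positives:,.0f}")
print(f"Flagged patients who did not: {false_positives:,.0f}")
print(f"Share of flags that were cancers: {share_of_flags_with_cancer:.1%}")
```

Even with a low false positive rate, healthy patients outnumber cancer patients so heavily that most flags belong to people without cancer. That is normal for a screening tool, and it is the reason a flag leads to a follow-up scan and not to a diagnosis.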

The comparison to existing screening standards is striking. Low-dose CT screening, the only established lung cancer early detection tool, is recommended only for current or former heavy smokers aged 50 to 80 with a specific smoking history. A study of nearly a thousand lung cancer patients at a major medical center found that only 35 percent of them actually met those eligibility criteria. The other 65 percent, including a disproportionate share of women and Asian patients, would never have been referred for a lung scan under current guidelines. LungFlag identified a substantial proportion of this missed population, because it reads the biological signal in the blood rather than applying a demographic checklist.

Figure 6. Demonstrated algorithm accuracy for non-cancer diseases compared to deployed cancer detection algorithms. AUC of 0.50 = coin flip; 1.0 = perfect prediction.

What LungFlag Has Not Yet Done

Here is the honest limitation of the LungFlag story, and it matters enormously for the argument this book makes.

LungFlag has been validated. The science is solid. The performance figures are published in a peer-reviewed journal and have withstood scrutiny. But LungFlag has not been deployed at scale in a clinical setting the way ColonFlag has been at Maccabi, Geisinger, and in the United Kingdom. The validation study was conducted retrospectively, meaning researchers looked back at existing patient records to see whether the algorithm would have identified lung cancer patients before their diagnosis. That is a necessary and rigorous step. It is not the same as running the algorithm prospectively, in real time, on real patients who do not yet know their diagnosis.

The next step for LungFlag, and for all the algorithms in Chapter 4 that have published validation data but no clinical deployment, is the prospective study: a health system that agrees to run the algorithm on its current patients, follow those patients over time, and measure in real clinical conditions whether the flagged patients do in fact develop cancer at higher rates, and whether earlier detection leads to better outcomes.

This is not a small step. It requires institutional commitment, regulatory consideration, clinical protocol development, and the willingness of physicians to act on an algorithmic flag for a disease as serious as lung cancer. But it is a knowable, achievable step. ColonFlag has already walked this path. The template exists.

The Template

What ColonFlag's deployments in Israel, Pennsylvania, and the United Kingdom demonstrate, collectively, is that the path from algorithm to clinical practice is navigable. It is not quick, and it is not without friction. But it has been walked, and walking it produced real results for real patients.

The template has five steps.

First, build the algorithm on a large, representative patient population with complete longitudinal blood records and linked cancer diagnosis data. The larger and more diverse the training population, the more robust the algorithm. ColonFlag was trained on 600,000 patients. LungFlag was validated on nearly 200,000. The algorithms for the other cancers need training populations of comparable scale.

Second, validate the algorithm externally, meaning test it on a different patient population from the one used to build it, to confirm that the patterns it learned are generalizable rather than specific to one health system or one demographic group. ColonFlag was validated in Israel, then in the United Kingdom, then in the United States. Each validation produced comparable results, confirming that the blood signatures it reads are consistent across populations.

Third, integrate the algorithm into existing clinical workflows without adding burden to physicians or patients. The ColonFlag integration at Geisinger and Maccabi worked because it required nothing new from anyone. The blood was already being drawn. The lab was already running the analysis. The algorithm simply read the output and sent an alert when the pattern warranted attention. The physician received a note in the electronic health record. The patient received a phone call. No new equipment. No new appointments. No new anything.

Fourth, develop clear clinical protocols for what happens when a patient is flagged. What follow-up test is indicated? Who orders it? How quickly? What happens if the patient declines? These protocols do not need to be invented from scratch. They can be adapted from the clinical guidelines that already govern colonoscopy referral, low-dose CT referral, and other established screening pathways.

Fifth, measure the outcomes and report them. The deployments at Maccabi and Geisinger were valuable not only for the patients whose cancers were found but for the medical literature they generated. Published deployment data creates the evidence base that persuades the next health system to implement, the next professional society to endorse, the next insurer to cover. Each deployment makes the next one easier.

This five-step template is not theoretical. It has been executed, across three countries, for colorectal cancer. The same template, applied to the validated algorithms for lung, liver, gastric, ovarian, and the other cancers described in Chapter 4, would produce the same kind of results. The biology is consistent. The methodology is proven. The infrastructure -- the blood draw, the laboratory, the electronic health record -- is already in place in every major health system in the country. What is required is the decision to begin.¹⁰

The Patients Who Were Not Found

I want to close this chapter with a thought that I find both clarifying and difficult.

The deployments described in this chapter found real patients whose cancers were real and whose outcomes were changed by being found earlier. That is the story that is easy to tell, because it has a human face and a clear result.

The harder story to tell is the one about the patients who were not found. At Geisinger, 706 patients were flagged as high-risk. Of those, 104 had colonoscopies. Eight percent of those 104 had cancer. That leaves 602 flagged patients who did not have a colonoscopy, either because they declined, because their physician did not refer them, or because the follow-up system did not reach them. Among those 602, some certainly had cancer. How many? The study cannot tell us precisely. But if the detection rate in the flagged population was eight percent, and if even half of the patients who did not complete follow-up had the same elevated risk, then roughly two dozen cancers in that population went undetected at a stage when they might have been curable.
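The arithmetic behind that estimate fits in a few lines. What follows is a back-of-the-envelope sketch, not a result from the Geisinger study: the three input figures are the ones quoted above, the one-half share is the assumption stated above, and the variable names are illustrative.

```python
# Figures quoted in the text for the Geisinger deployment.
flagged = 706            # patients flagged as high-risk
scoped = 104             # flagged patients who had a colonoscopy
detection_rate = 0.08    # share of those colonoscopies that found cancer

not_scoped = flagged - scoped              # flagged patients with no colonoscopy
cancers_found = detection_rate * scoped    # cancers found among the 104

# Assumption from the text: half of the unscoped patients carried the same risk.
share_same_risk = 0.5
cancers_missed = not_scoped * share_same_risk * detection_rate

# Upper bound if every unscoped patient carried the same risk.
cancers_missed_upper = not_scoped * detection_rate

print(not_scoped, round(cancers_found), round(cancers_missed), round(cancers_missed_upper))
# prints: 602 8 24 48
```

Under the stated assumption the estimate is about two dozen missed cancers, and about four dozen if every patient who missed follow-up carried the same risk as those who completed it.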

This is not a criticism of Geisinger. It is a description of the real-world limitations of any clinical program -- the gap between identifying patients at risk and ensuring that every one of them receives the follow-up they need. Closing that gap is a separate and equally important challenge, one that involves patient navigation, health literacy, cultural barriers to care, and the structural inequities in healthcare access that this book addresses directly in Chapter 12.

But the gap also tells us something important about what full deployment of these algorithms could achieve. If the detection rate in the Geisinger flagged population was eight percent among those who completed follow-up, and if a more complete follow-up system had reached all 706 flagged patients, the number of cancers found would have been substantially higher. Full deployment, with robust follow-up infrastructure, does not merely find more cancers. It finds the cancers in the patients who are hardest to reach -- the patients in underserved communities, the patients who have fallen off the conventional screening radar, the patients for whom early detection has always been most elusive and most needed.

That is the full promise of what ColonFlag and LungFlag have demonstrated: not just that the algorithm works, but that it works precisely for the patients the current system fails most consistently. Getting that promise to every patient is the work that remains.

Chapter 6

Building the Thirteen Algorithms

A General Reader’s Version

This book has a technical chapter by the same name, written for clinicians, researchers, and consortium planners who will actually do this work. This shorter version is for everyone else. It covers the same ground in plainer language, so a reader who simply wants to follow the argument can do so without the sample sizes, the acronyms, and the names of competing machine learning methods.

Anyone who reads this version and wants the full detail should turn to the companion chapter. The two are designed as alternatives rather than as a pair to be read together.

The question this chapter answers is how you would actually build a cancer-detection algorithm for each of the thirteen cancers that Chapter 4 described. Not the biology. Not the clinical payoff. The practical decisions, the unglamorous spadework, the choices that separate a useful tool from a disappointing one.

What an Algorithm Actually Does

Start with a picture. Imagine two stacks of blood test results on a long table. The left stack comes from people who were later found to have cancer. The right stack comes from people who were not. A computer sits at the table and spends months studying both stacks. Its job is to figure out what makes the left stack different from the right stack.

Once the computer has learned that, a new blood test result can be dropped on the table. The computer looks at it, compares it to everything it has learned, and sorts it onto one side or the other. That is the whole operation. Every cancer algorithm in this book, and every one in the published literature, is a version of this picture.

The computer is not doing anything mysterious. It is looking for patterns across the dozens of values in a standard blood test, patterns that a human doctor could never hold in mind all at once. The mystery, if there is one, is in how the two stacks get built. That is where almost everything that matters in this field is decided.

Six Rules for Building the Stacks

The technical chapter lists nine rules. For a general reader, six will do, because three of them are variations on the same idea. Here they are, in plain language.

Rule One: The Computer Is Only As Good As the Two Stacks

This is the master rule. Give the computer clean, honest, well-chosen stacks and you will get a useful tool. Give it sloppy or biased stacks and no amount of sophistication in the computer itself will save you. The whole game is in the stacks.

Rule Two: Make Sure the Cancer Cases Really Had Cancer

This sounds obvious. It is not. Electronic medical records are full of incorrect diagnosis codes. A doctor who suspects pancreatic cancer and orders a test to check often enters a pancreatic cancer code when ordering the test, and that code looks exactly like a code for a confirmed case. A researcher pulling the left stack by diagnosis code alone will include hundreds of patients who never actually had cancer. The rule is simple: every patient in the cancer stack must have both a diagnosis and evidence of actual treatment. Chemotherapy. Radiation. A surgical note describing the tumor. Without that, the stack is contaminated.

Rule Three: Include Plenty of Early-Stage Cancers

Hospitals see advanced cancer more often than early cancer, for an understandable reason. Patients with late-stage cancer are visibly sick and come through the door. Patients with early-stage cancer often feel fine and are found only by accident. If you build your left stack from whichever cancer patients the hospital happens to have, you will get mostly advanced cases. The computer will then learn what advanced cancer looks like, and it will be nearly useless at spotting the early cases we most want to catch. Every recipe in this book deliberately overweights stage I and stage II cases, even though they are harder to assemble.

Rule Four: Use Blood Tests Drawn Before Diagnosis, Not On the Day Of

A blood test drawn the morning of a colonoscopy is not much use for early detection. The patient was about to be diagnosed anyway. What matters is the blood test drawn eighteen months earlier, at a routine physical, when no one suspected anything. Those are the tests the algorithm will actually see when it is deployed. Every cancer has its own useful window. For fast-moving cancers like pancreatic, the window is maybe two years. For slow ones like multiple myeloma, it can reach five years. The recipe for each cancer specifies how far back the blood tests need to come from.

Rule Five: The Right Stack Should Not Be Too Clean

This one is counterintuitive. The instinct is to fill the right stack with healthy people, because healthy people clearly do not have cancer. The problem with that instinct was explained in the previous chapter. If you train the computer to tell cancer apart from picture-book health, the computer will flag everyone who has a cold, everyone on a medication, everyone with a chronic condition. The right stack should look like the real world, full of ordinary noisy human beings with ordinary noisy biology. An algorithm trained on noisy, real-life non-cancer patients is the one that will still work when it meets real patients in a real clinic.

Rule Six: The Algorithm Must Be Tested Somewhere It Has Never Seen

After the master rule, this is the one that matters most. An algorithm built on patients at one hospital, then tested on more patients at the same hospital, proves only that the algorithm works at that hospital. Nothing more. To know whether the algorithm actually learned something about cancer, rather than something about the quirks of one institution, you have to test it on a completely different patient population at a different hospital, ideally in a different country. If it still works there, you have a tool. If it does not, you have a lab curiosity. This test, called external validation, is the difference between an algorithm that gets deployed and one that quietly disappears.

What Makes a Good Testing Partner

External testing requires a partner. Not any partner will do. The hospital or health system that tests the algorithm has to have certain properties. The technical chapter lists seven; the general reader can get by with four.

First, the testing partner’s patients should be genuinely different from the patients the algorithm was built on. A hospital twenty miles away, sharing doctors and referral patterns, is not different enough. A hospital in California, or Israel, or Taiwan is.

Second, the partner’s medical records should reach back far enough to contain the blood tests the algorithm needs to see. Five years is the minimum for fast-moving cancers. Fifteen years is better for the slow ones.

Third, the partner should have a complete picture of what happened to each patient over time. Health systems that lose track of patients when they move or switch insurance cannot tell the researchers which patients really did or did not develop cancer.

Fourth, the partner should use laboratory equipment slightly different from that of the hospital that built the algorithm. If the training labs and the testing labs are identical, the test only checks whether the algorithm works on that equipment, not whether it generalizes.

No single partner has all four properties at once for every cancer. That is why the recipes below often name two or three institutions per cancer rather than one.

A Note Before the Recipes

Each of the thirteen recipes below answers four questions. How strong is the signal this cancer leaves in the blood? Roughly how many cancer patients and non-cancer patients does the algorithm need to learn from? How far back do the blood tests need to reach? And which hospitals or health systems should test the finished algorithm?

The numbers are approximations. The technical chapter has the precise targets. What matters for the general reader is the shape of each recipe, not the digits.

One framing assumption runs through all thirteen. A consortium of New York and Northeast Corridor hospital systems, spanning roughly 50 million people across New York, New Jersey, and Pennsylvania, provides the training data. This consortium is not imaginary. It is built from hospital systems the author has professional relationships with: Memorial Sloan Kettering and its regional locations, Weill Cornell and New York-Presbyterian, Montefiore in the Bronx, Northwell on Long Island, NYC Health and Hospitals across the five boroughs, and Jefferson Health’s twenty-three hospitals across metropolitan Philadelphia. The demographic breadth of this consortium, which includes essentially every population group in the United States, is central to why the algorithms it could produce would be trustworthy from the start.

The Thirteen Recipes

1. Colorectal Cancer

The signal in the blood: strong. Colorectal tumors bleed microscopically into the intestine for years before diagnosis, and that slow bleeding shows up as gradually falling hemoglobin and changes in red blood cell size. ColonFlag, the algorithm that already exists for this cancer, is the most validated tool in the field.

What the consortium would do: collect about 10,000 confirmed colorectal cancer patients and around 80,000 non-cancer adults, using three to four years of blood test history. Because ColonFlag already works, the consortium’s job is not to invent a new algorithm but to deploy it at scale and prove it works across the full demographic range of the Northeast Corridor.

Where to test it: Maccabi Healthcare Services in Israel, which built the original ColonFlag and has been running it in routine clinical use since 2015. Kaiser Permanente in California, which has validated it on a separate American population. Geisinger Health System in rural Pennsylvania, which deployed it on an underserved rural population. All three have already done this work; the consortium’s contribution is scale and demographic breadth.

2. Lung Cancer

The signal in the blood: moderate to strong, mostly from inflammation. As a lung tumor grows, it disturbs the immune system, pushing certain white blood cell counts up and others down. The ratio between them climbs over time. LungFlag, the existing algorithm, has been validated on nearly 200,000 patients but is not yet in routine clinical use.

What the consortium would do: collect about 12,000 lung cancer cases, deliberately overrepresenting women, Asian patients, and lighter former smokers. These are the patients current CT-scan screening misses most often, and they are the ones a blood-test algorithm could most help. Add around 120,000 non-cancer adults of similar age.

Where to test it: Kaiser Permanente in California, where LungFlag was originally built. The Veterans Affairs hospital system, which serves an older male population where lung cancer rates are high, a demographic the algorithm has to handle reliably. Clalit Health Services in Israel, for the population that falls outside current screening rules, particularly non-smoking women.

3. Liver Cancer

The signal in the blood: strong. The liver cannot hide dysfunction from a standard blood test. A calculation doctors already use called the FIB-4 index (named for its four inputs: age, two liver enzymes, and platelet count) tracks liver damage well. Watching how that number changes over successive blood draws is one of the most powerful predictors in the field.
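For readers who want to see the calculation, here is a minimal sketch of the FIB-4 index and of how its change between blood draws might be tracked. The formula is the standard one (age times AST, divided by platelet count times the square root of ALT). The patient values are invented for illustration, and nothing here is taken from any deployed algorithm.

```python
from math import sqrt

def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_10e9_per_l * sqrt(alt_u_per_l))

# Hypothetical draws for one patient, oldest first: (age, AST, ALT, platelets).
draws = [(58, 30, 36, 240), (60, 42, 36, 200), (62, 60, 49, 150)]

scores = [fib4(*d) for d in draws]
# Change in the index from each draw to the next.
changes = [later - earlier for earlier, later in zip(scores, scores[1:])]

print([round(s, 2) for s in scores])
print([round(c, 2) for c in changes])
```

In this made-up example the index rises at every draw, and the rise itself gets steeper. That kind of trajectory is what the text means by watching how the number changes over time.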

What the consortium would do: collect around 4,000 liver cancer cases, drawing especially on the large hepatitis B patient populations in NYC Health and Hospitals, Northwell’s Asian American communities, and immigrant communities in Queens and Brooklyn. Add about 40,000 non-cancer adults with chronic liver disease as the comparison group. Healthy non-cancer patients would teach the algorithm the wrong lesson; the comparison must be with people who have liver problems but not cancer.

Where to test it: Taiwan’s national health database and South Korea’s national health insurance records, because hepatitis B is far more common in Asian populations than Western ones, and algorithm performance has to be proven there. Clalit Health Services in Israel, which has unusual experience with hepatitis C in Middle Eastern populations.

4. Gastric Cancer

The signal in the blood: moderate. Gastric tumors bleed similarly to colorectal ones, though less consistently. An algorithm from Shanghai, trained on twenty routine blood tests, works particularly well at finding gastric cancer in patients whose traditional tumor markers come back normal. Those are exactly the patients current screening misses.

What the consortium would do: collect around 3,000 gastric cancer cases, drawing particularly on Asian American populations in Queens and on Long Island, and Jewish populations at Sloan Kettering and Weill Cornell, both of which have higher incidence than the general American population. The comparison group, roughly 30,000 patients, should not be healthy volunteers. It should be patients with precancerous stomach conditions, who are the patients doctors actually have to distinguish from gastric cancer.

Where to test it: Taiwan’s and South Korea’s national databases, where gastric cancer is much more common and well-documented. Maccabi Healthcare Services in Israel for additional validation. A Japanese partner if available, because Japan’s national gastric cancer screening program has produced the best stage-balanced cohorts in the world.

5. Pancreatic Cancer

The signal in the blood: moderate, and present only in a narrow window of about two years before diagnosis. Pancreatic cancer damages insulin-producing cells, causing fasting blood sugar to drift upward, and it compresses the bile duct, raising liver enzymes. Both changes appear on standard blood panels before the tumor is visible on any scan. This is the hardest recipe in the chapter, but the potential payoff is larger than almost any other. Catching pancreatic cancer two years earlier would change survival more than any drug discovery of the past decade.

What the consortium would do: collect about 5,000 pancreatic cancer cases, anchored at Sloan Kettering, which has one of the largest pancreatic cancer programs in the world. Early-stage cases will be scarce, because this cancer rarely presents early. About 50,000 non-cancer adults, deliberately including patients with chronic pancreatitis, type 2 diabetes, and benign pancreatic cysts.

Where to test it: Maccabi Healthcare Services and Medial EarlySign in Israel, the team and the company that built ColonFlag and LungFlag. For pancreatic cancer specifically, the combination of deep Israeli medical records and the algorithm-building expertise of Medial is probably essential. Kaiser Permanente in Northern California for American cross-validation. The international Cancer of the Pancreas Screening Consortium, led from Johns Hopkins, which focuses specifically on high-risk early-stage patients.

6. Ovarian Cancer

The signal in the blood: surprisingly strong, given that there is no ovarian cancer screening test in current use. Ovarian tumors produce a signaling molecule that tells the bone marrow to make more platelets. The platelet count begins climbing slowly about eighteen months before diagnosis, then climbs faster in the final six months. Algorithms reading this pattern have reached performance levels that would be considered extraordinary for any cancer screening test.

What the consortium would do: collect around 2,500 ovarian cancer cases. This is difficult, because ovarian cancer is usually found late and stage I cases are genuinely rare. The consortium would need Sloan Kettering’s gynecologic oncology referrals plus the women’s health programs at Weill Cornell and NYU, with deliberate overrepresentation of BRCA-positive women whose earlier-stage cancers are more commonly caught. About 25,000 non-cancer women for the comparison group.

Where to test it: the United Kingdom’s primary care research database, which covers about ten million British patients and accumulates enough rare early-stage cases to support validation. The Danish national cancer registry, which has near-perfect record-keeping on every cancer diagnosis in the country. Maccabi Healthcare Services for additional validation, particularly in Ashkenazi Jewish women whose higher BRCA rates produce more early-stage cases.

7. Kidney Cancer

The signal in the blood: moderate. Kidney tumors gradually reduce the kidney’s production of a hormone that tells the body to make red blood cells, so hemoglobin slowly drops. Inflammation markers rise, and kidney-function values drift. An algorithm using just eight routine blood tests performed well at one Chinese hospital but has never been validated elsewhere.

What the consortium would do: collect around 6,000 kidney cancer cases, concentrating early-stage cases (which are often found by accident on scans done for other reasons) through Sloan Kettering, Weill Cornell, Jefferson, and Northwell urology programs. About 60,000 non-cancer adults, with deliberate inclusion of patients with chronic kidney disease, kidney stones, and benign cysts, the conditions the algorithm has to tell apart from cancer.

Where to test it: Kaiser Permanente for American external validation. The Veterans Affairs hospital system, where kidney cancer rates are elevated due to age and sex distribution. Maccabi Healthcare Services in Israel. The goal would be the first rigorously externally validated kidney cancer algorithm in the literature.

8. Multiple Myeloma

The signal in the blood: strong, distinctive, and unusually long. Multiple myeloma leaves its fingerprint in the blood three to five years before diagnosis, one of the longest warning windows of any cancer in this group. The fingerprint includes rising calcium, rising total protein, rising creatinine, and falling hemoglobin. A patient with elevated calcium is more than ten times as likely to develop myeloma as a patient without.

What the consortium would do: collect around 4,000 myeloma cases. This recipe has a specific requirement that the others do not. Myeloma incidence in African Americans is more than twice the rate in whites, and no prior myeloma algorithm has had adequate representation of Black patients. The consortium’s access to the Bronx, central Brooklyn, and Philadelphia makes Black-population validation possible at a scale no other research group can match. About 40,000 non-cancer adults in the comparison group, including patients with a pre-myeloma condition called MGUS that the algorithm has to learn to distinguish from actual cancer.

Where to test it: Howard University Hospital, Morehouse School of Medicine, and Meharry Medical College, all historically Black medical institutions with strong research infrastructure and the patient populations that existing myeloma algorithms have undersampled. The United Kingdom’s primary care database, because foundational myeloma research has been done there. Maccabi Healthcare Services for additional international validation.

9. Leukemia

The signal in the blood: strong, because leukemia originates in the blood itself. For chronic lymphocytic leukemia, the most common adult leukemia, lymphocyte counts rise gradually for years before diagnosis. Recent work on more than a million patient records has shown that the rate of rise is more informative than any single reading.

What the consortium would do: collect around 8,000 leukemia cases across the different types. Each type has a different blood signature and should be modeled separately. CLL has the longest warning window and should form the largest share. About 80,000 non-cancer adults in the comparison group, including patients with viral infections that temporarily raise lymphocyte counts.

Where to test it: the Danish national cancer registry, which has the deepest longitudinal records in the world. Maccabi Healthcare Services in Israel. Kaiser Permanente in California. Dana-Farber Cancer Institute in Boston for acute leukemia specifically.

10. Lymphoma

The signal in the blood: moderate. Lymphomas disturb the immune system and produce inflammation, which shows up in blood tests. A value called LDH, which reflects cell turnover, is abnormal in more than half of non-Hodgkin lymphoma patients at diagnosis. Research on 663,000 Danish patients has shown that five years of blood test history improves prediction accuracy.

What the consortium would do: collect around 6,000 lymphoma cases, modeled separately for Hodgkin, non-Hodgkin, aggressive, and indolent subtypes. About 60,000 non-cancer adults in the comparison group.

Where to test it: the Danish national cancer registry, where the foundational research in this area was done. Maccabi Healthcare Services, which has experience with Mediterranean-specific lymphoma subtypes rare in the United States. Kaiser Permanente in Northern California, whose Asian American population includes lymphoma subtypes almost absent from European research.

11. Bladder Cancer

The signal in the blood: moderate. Chronic inflammation from bladder tumors produces rising inflammatory markers and falling hemoglobin. A 2022 study using eight routine laboratory values, including some from urinalysis, reached strong performance with patients who had other urological conditions (rather than healthy people) as the comparison group.

What the consortium would do: collect around 5,000 bladder cancer cases, most of them early-stage since the majority of bladder cancers are caught before spreading. About 50,000 non-cancer patients with urological symptoms, not healthy volunteers. This is the control group that mirrors the real diagnostic question.

Where to test it: the Veterans Affairs hospital system, which has better documentation of occupational chemical exposures than most civilian systems. That matters because bladder cancer rates are elevated in workers with certain exposure histories. Kaiser Permanente. Maccabi Healthcare Services.

12. Esophageal Cancer

The signal in the blood: moderate, similar to colorectal and gastric cancers. But unlike those, no large-population machine learning study exists for esophageal cancer. This recipe describes what that study should look like.

What the consortium would do: collect around 3,000 esophageal cancer cases. This is tight, because the cancer is not common. The two subtypes, squamous cell (associated with smoking and alcohol) and adenocarcinoma (associated with acid reflux and Barrett’s esophagus), should be modeled separately because their risk populations differ. About 30,000 patients with Barrett’s esophagus or chronic reflux in the comparison group.

Where to test it: Taiwan’s national database, where squamous cell esophageal cancer is far more common than in the West. The Mayo Clinic’s comprehensive esophageal program. Maccabi Healthcare Services. Because this algorithm does not yet exist, the consortium’s work here is to build it first and validate it second.

13. Thyroid Cancer

The signal in the blood: moderate. A hormone called TSH is already measured routinely in many patients, and its trajectory over time is informative. Thyroid cancer has the smallest mortality burden of the thirteen, but the recipe matters because the aggressive subtypes have much worse prognoses and early detection of any form allows simpler treatment.

What the consortium would do: collect around 20,000 thyroid cancer cases, overrepresenting the aggressive subtypes. About 175,000 adults on routine TSH monitoring in the comparison group. This cancer’s hardest problem is overdiagnosis: an overly sensitive algorithm would flag many patients with small, indolent cancers that would never have harmed them, leading to unnecessary surgeries. The recipe has to be calibrated to avoid this trap.

Where to test it: South Korea’s national health insurance records are essential here. The entire global conversation about thyroid cancer overdiagnosis was defined by Korean data after a screening program there produced a massive rise in thyroid cancer diagnoses without any corresponding drop in mortality. An algorithm not cross-checked in Korea should be regarded with skepticism. The Mayo Clinic’s endocrinology program has extensive overdiagnosis research. Maccabi Healthcare Services for additional international validation.

What This Adds Up To

Thirteen cancers. Thirteen recipes. Each one specifies the same handful of things: who goes in the cancer stack, who goes in the non-cancer stack, how far back the blood tests have to reach, where the finished algorithm should be tested. The particulars vary by cancer. The structure does not.

A reader who has followed this chapter now knows enough to evaluate any cancer algorithm study that appears in the news. When a company announces impressive results, the questions are these. How did they pick their cancer patients? How did they pick their non-cancer patients? Did they include early-stage cases? Did they test the algorithm at a different hospital? If those questions are answered clearly, the study is probably sound. If they are skimmed over, the results probably will not hold up in the clinic.

The consortium described here is not theoretical. The hospital systems it draws on are real, the relationships between them are real, and the patient records already exist. What does not yet exist is the institutional commitment to use them in this way. The last chapters of the book are about how that commitment could be built, and what happens if it is.

Three hundred ninety-three thousand Americans die of these thirteen cancers every year. Published estimates suggest that a consortium deployment at the scale described here could prevent one hundred thousand to one hundred seventy-five thousand of those deaths annually. Not eventually. Within a few years of commitment. That is the stake.

Chapter 7

Building the Thirteen Algorithms

A Recipe for Each Cancer, Anchored in the Northeast Corridor Consortium

Chapter 4 described the thirteen cancers and the evidence that each of them leaves a readable fingerprint in routine blood work. Chapter 5 showed what happens when one of those algorithms, ColonFlag, is actually deployed in a clinic. This chapter takes the next step. It lays out, cancer by cancer, what it would take to build a credible, deployable algorithm for each disease, using a consortium drawn from five New York hospital systems, among them Northwell on Long Island and in the outer boroughs and Sloan Kettering with its regional locations, together with Jefferson Health across the Philadelphia metropolitan region.

The consortium pool represents roughly 50 million covered lives across three states, with demographic breadth that no single-institution study has ever matched. For most of the thirteen cancers, this is enough to build the algorithm. For external validation, which is a separate and equally important step, the consortium needs partners outside its own pool. This chapter specifies, for each cancer, what that external validation should look like.

Before the thirteen recipes, a set of rules. These apply to every cancer. They are the methodological spine of the chapter, and reading them first makes the recipes that follow considerably shorter than they would otherwise be.

The Nine Rules

Rule One: The Algorithm Is Only As Good As the Two Stacks

Every blood-test cancer algorithm is a comparison engine. It learns the patterns that distinguish two stacks of blood records: one from patients with the disease, one from patients without. The mathematical choice of model matters less than the composition of the two stacks. A mediocre algorithm trained on well-built stacks will outperform a sophisticated algorithm trained on poorly built ones. This rule governs every other rule.
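A toy version of the comparison engine makes the point concrete. The sketch below uses a nearest-centroid rule, chosen only because it is about the simplest classifier there is. It is not the method used by ColonFlag, LungFlag, or any other published algorithm. The lab values, the two-value record, and the function names are all invented for illustration.

```python
from statistics import mean

# Hypothetical toy records: (hemoglobin in g/dL, platelet count in 10^9/L).
# These numbers are invented for illustration only.
cancer_stack = [(11.8, 410), (12.1, 390), (11.5, 430), (12.4, 400)]
control_stack = [(14.2, 250), (13.8, 270), (14.6, 240), (13.5, 300)]

def centroid(stack):
    """Average each lab value across one stack."""
    return tuple(mean(column) for column in zip(*stack))

def spread(stacks):
    """Range of each lab value across all stacks, used to put values on one scale."""
    columns = list(zip(*[row for stack in stacks for row in stack]))
    return tuple(max(c) - min(c) for c in columns)

def classify(record, cancer_center, control_center, scale):
    """Label a record by whichever stack's centroid it sits closer to."""
    def distance(center):
        return sum(((r - c) / s) ** 2 for r, c, s in zip(record, center, scale))
    if distance(cancer_center) < distance(control_center):
        return "cancer-like"
    return "control-like"

scale = spread([cancer_stack, control_stack])
cancer_center = centroid(cancer_stack)
control_center = centroid(control_stack)

print(classify((12.0, 405), cancer_center, control_center, scale))  # cancer-like
print(classify((14.0, 260), cancer_center, control_center, scale))  # control-like
```

Everything the classifier knows comes from the two lists at the top. Swap in a control stack of unusually healthy people, or a cancer stack contaminated with rule-outs, and the same code learns a different and less useful boundary.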

Rule Two: Cancer Cases Must Be Confirmed by Both Diagnosis and Treatment

Electronic medical records are full of provisional diagnosis codes. A researcher who pulls the cancer stack by diagnosis code alone will contaminate it with rule-outs and administrative errors. The working rule is that every patient in the cancer stack must have both a cancer diagnosis code and evidence of actual cancer treatment, meaning chemotherapy orders, radiation records, or a surgical note describing the tumor. Studies that skip this step overstate their performance. The Singh research team applied this standard explicitly in their 2025 multi-cancer model, and it should be the consortium’s default for every cancer recipe that follows.
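In code, the working rule is a two-condition filter. This is a minimal sketch under stated assumptions: the `Patient` structure and the evidence labels are hypothetical stand-ins for whatever a real record system provides, and the four patients are invented.

```python
from dataclasses import dataclass, field

# Hypothetical evidence labels; a real record system would use its own codes.
TREATMENT_EVIDENCE = {"chemotherapy_order", "radiation_record", "surgical_tumor_note"}

@dataclass
class Patient:
    patient_id: str
    has_cancer_dx_code: bool
    evidence: set = field(default_factory=set)

def confirmed_cases(patients):
    """Keep only patients with a diagnosis code AND at least one treatment record."""
    return [
        p for p in patients
        if p.has_cancer_dx_code and p.evidence & TREATMENT_EVIDENCE
    ]

patients = [
    Patient("A", True, {"chemotherapy_order"}),   # confirmed
    Patient("B", True, set()),                    # diagnosis code only, likely a rule-out
    Patient("C", False, {"radiation_record"}),    # treatment record but no cancer code
    Patient("D", True, {"surgical_tumor_note"}),  # confirmed
]

print([p.patient_id for p in confirmed_cases(patients)])  # ['A', 'D']
```

Patient B is the case the rule exists to catch: a code entered when a test was ordered, with no treatment ever following it.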

Rule Three: Stage Distribution Must Favor Early Disease

A hospital tumor registry, left alone, skews toward advanced cases. Patients with late-stage cancer arrive visibly ill. Patients with early-stage cancer are often walking around feeling fine, found only incidentally on scans done for other reasons. An algorithm trained on a registry-weighted target group learns to recognize late cancer, which is biologically loud. It will fail on early cancer, which is biologically quiet, and early cancer is what we need to catch. For every cancer in this chapter, the recipe specifies a target group deliberately skewed toward stage I and stage II disease, often requiring referral-pattern aggregation from Sloan Kettering and Jefferson to reach those case counts.

Rule Four: Blood Tests Must Predate Diagnosis by a Meaningful Window

A blood test drawn the day of a biopsy carries little early-detection value. The patient was going to be diagnosed anyway. What matters is the blood test drawn eighteen months, three years, or five years before diagnosis, during routine physicals when no one suspected anything. Those are the tests a deployed algorithm will actually see. Each cancer has its own biological window, short for fast-progressing diseases like pancreatic cancer, long for indolent ones like myeloma or chronic lymphocytic leukemia. Every recipe specifies the window, and the recipe fails if the available blood tests do not reach back far enough.
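A minimal sketch of the window test follows. The 6-to-36-month bounds are placeholders chosen for illustration, since each recipe sets its own window, and the function names are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # average month length; precise enough for a sketch


def months_before(draw_date: date, diagnosis_date: date) -> float:
    """How many months before diagnosis a blood test was drawn."""
    return (diagnosis_date - draw_date).days / DAYS_PER_MONTH


def usable_draws(draw_dates, diagnosis_date, min_months, max_months):
    """Keep only draws that fall inside the pre-diagnostic window."""
    return [
        d for d in draw_dates
        if min_months <= months_before(d, diagnosis_date) <= max_months
    ]


diagnosis = date(2024, 6, 1)
draws = [
    date(2020, 1, 10),   # too old for this illustrative window
    date(2022, 11, 15),  # about eighteen months before diagnosis
    date(2024, 5, 30),   # drawn two days before diagnosis: no early-detection value
]

usable = usable_draws(draws, diagnosis, min_months=6, max_months=36)
print(usable)
```

A patient whose list comes back empty has no blood tests that a deployed algorithm would ever have seen in time, and under this rule cannot contribute to training.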

Rule Five: Controls Must Resemble Cancer Patients in Every Way Except the Cancer

Healthy controls produce spectacular performance numbers and useless tools. The best control group for any cancer algorithm includes people with infections, medications, chronic conditions, coexisting illnesses, and the ordinary noise of ordinary lives. For some cancers, the harder and more useful comparison is not against healthy people at all but against patients with the precancerous or clinically confusable conditions that actually present to physicians. Every recipe specifies the composition of the control group, not just its size.

Rule Six: The Ratio of Cancer to Non-Cancer Should Be Between One to Five and One to Twenty

With fewer than about three controls per cancer case, the algorithm has not seen enough normal blood variation. Beyond about twenty controls per case, additional controls provide diminishing information. The practical sweet spot is between five and twenty controls per case. Feeding the algorithm two million controls against fifteen thousand cancer cases, a ratio of more than one hundred thirty to one, does not improve performance. It slows computation. Every recipe specifies a control count in this range.
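The ratio check is simple arithmetic, sketched below. The function names and verdict strings are hypothetical; the bounds of five and twenty controls per case are the sweet spot this rule names.

```python
def controls_per_case(n_cases: int, n_controls: int) -> float:
    """Number of controls available for each cancer case."""
    return n_controls / n_cases


def ratio_verdict(n_cases: int, n_controls: int, low: float = 5.0, high: float = 20.0) -> str:
    """Classify a study design against the five-to-twenty sweet spot."""
    ratio = controls_per_case(n_cases, n_controls)
    if ratio < low:
        return "too few controls"
    if ratio > high:
        return "more controls than needed"
    return "within range"


# The oversized design described in this rule: 15,000 cases, 2,000,000 controls.
print(ratio_verdict(15_000, 2_000_000))

# A design with ten controls per case.
print(ratio_verdict(10_000, 100_000))
```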

Rule Seven: External Validation Is Not Optional

An algorithm tested only on patients from the hospital that built it tells a reader what the algorithm does at that hospital. Nothing more. Cross-population validation is the only way to distinguish an algorithm that has learned cancer biology from one that has memorized a local patient population’s quirks. The kind of validation partner needed depends on the cancer, and the section that follows these rules explains what properties make a good validation partner for this kind of work. Every recipe ends with specific validation guidance.

Rule Eight: Subgroup Performance Matters More Than Overall Performance

An algorithm that performs well overall but fails in the subgroups where current medicine most needs help is not a useful addition to medicine. For gastric cancer, the subgroup that matters is patients with negative tumor markers. For lung cancer, patients who do not meet smoking-history criteria for low-dose CT. For kidney cancer, patients whose tumors are too small to show incidentally. Every recipe identifies the subgroup where the algorithm must perform, not just the overall population.
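The gap between overall and subgroup performance is easy to demonstrate. The sketch below computes AUC directly from its definition (the chance that a randomly chosen cancer patient scores higher than a randomly chosen control) on eight invented patients. The scores and subgroup labels are made up for illustration and come from no study.

```python
def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# (algorithm score, has cancer, subgroup) -- invented toy data
rows = [
    (0.90, 1, "marker_positive"), (0.80, 1, "marker_positive"),
    (0.20, 0, "marker_positive"), (0.10, 0, "marker_positive"),
    (0.40, 1, "marker_negative"), (0.30, 1, "marker_negative"),
    (0.50, 0, "marker_negative"), (0.35, 0, "marker_negative"),
]


def auc_for(rows, subgroup=None):
    """AUC over all rows, or over one subgroup when a name is given."""
    kept = [r for r in rows if subgroup is None or r[2] == subgroup]
    return auc([r[0] for r in kept], [r[1] for r in kept])


overall = auc_for(rows)
easy = auc_for(rows, "marker_positive")
hard = auc_for(rows, "marker_negative")
print(overall, easy, hard)
```

In this toy example the overall figure looks respectable while the algorithm does worse than a coin flip in the marker-negative subgroup, which is exactly the group an overall number can hide.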

Rule Nine: Demographic Breadth Is Not a Luxury

Every published cancer algorithm in routine clinical use was trained on a narrower population than it was later deployed in. Algorithm performance degrades, sometimes severely, when moved to populations that differ from the training set by ethnicity, socioeconomic status, or local laboratory practices. The New York consortium’s composition, which spans Bronx and Queens neighborhoods with populations from every continent, Long Island suburbs, and the Philadelphia urban and rural catchment, addresses this at the training stage rather than hoping for it at the validation stage. Every recipe specifies which consortium sites contribute which demographic segments of the target and control groups.

What Makes a Good External Validation Partner

External validation is the single most important test any blood-test cancer algorithm must pass. An algorithm tested only at the hospital where it was built tells us what it does at that hospital and nothing more. An algorithm tested on an independent patient population at a different institution tells us whether it learned something about cancer biology or something about the local patient mix. Every algorithm that has reached clinical use or come close to it, including ColonFlag, LungFlag, and the cardiovascular risk models now embedded in primary care software, passed through external validation first. The ones that skipped this step either never reached the clinic or failed once they got there.

The right validation partner is not a single thing. It is a set of properties, and the right partner for any given cancer depends on which properties matter most for that cancer’s biology and its current evidence gaps. The consortium’s validation planning rests on seven properties a partner institution may have. A good validation partner for a given cancer will typically have three or four of these. A single institution almost never has all seven, which is why the recipes later in this chapter often name more than one partner.

Population Independence

The partner must draw patients from a source that does not overlap with the consortium’s training data. A hospital twenty miles from a consortium site, sharing referral patterns and physician networks, provides little validation value no matter how rigorous its records are. The test is whether the partner’s population is genetically, environmentally, and demographically distinct enough that an algorithm performing well on both populations has almost certainly learned cancer biology rather than local patterns. This is why partners on the West Coast, in Europe, or in Israel provide stronger external validation than partners in adjoining states.

Depth of Longitudinal Records

For cancers where the pre-diagnostic blood window extends years before diagnosis, the partner’s records must reach back far enough to contain those early blood tests. A hospital system with five years of complete electronic history is useful for fast-moving cancers like pancreatic. A system with fifteen or twenty years of complete history is essential for cancers with long indolent phases like multiple myeloma and chronic lymphocytic leukemia. Without this temporal depth, the validation tests only what the algorithm does in the period immediately before diagnosis, which is not the period of clinical interest.

Completeness of Outcome Ascertainment

The partner must be able to confirm which patients did and did not develop cancer over the follow-up window, without losing track of anyone. Health systems that lose track of patients when they switch insurance, move, or seek care elsewhere produce incomplete outcome data. Incomplete outcomes systematically bias algorithm performance numbers upward because the missed cancers become invisible. National registries, closed integrated health systems, and universal-coverage health plans all provide complete ascertainment. Loose networks of independent hospitals usually do not.

Different Laboratory Standards

Blood test values drift between instrument manufacturers, reagent lots, and calibration standards. If the consortium trains on Northeast Corridor lab equipment and validates on lab equipment calibrated to the same standards, the validation tests only biology, not the mechanical robustness of the algorithm to laboratory variation. A partner whose laboratory infrastructure differs from the training set provides a test that catches deployment failures the consortium would otherwise miss until patients were already being harmed.

Demographic Breadth Suited to the Specific Cancer

This is where partner choice varies most by disease. For cancers with sharp ethnic incidence differences (multiple myeloma in African Americans, gastric and liver cancer in Asian populations, thyroid cancer in Korean populations), the right partner serves those populations at scale. For cancers with socioeconomic gradients (lung, cervical, most gastrointestinal cancers), the right partner covers the full socioeconomic range rather than a narrow slice. A single partner rarely serves every demographic question at once, which is why the chapter specifies different partners for different cancers rather than one universal validator.

Existing Experience with Blood-Test Cancer Algorithms

For cancers where the consortium’s work is building on an existing algorithm, a partner with direct experience running similar algorithms in clinical use shortens the deployment pathway substantially. The protocols for physician notification, patient follow-up, and clinical integration already exist in working form. Health systems with this experience are rare globally; they should be prioritized as partners for the cancers where the consortium’s role is deployment validation rather than new algorithm development.

Institutional Willingness to Participate

A partner that meets every methodological criterion but will not commit research infrastructure, clinician time, and governance support cannot actually serve as a validation site. This is less a methodological criterion than a practical one, but it is the one that most often determines which partners join the work.

With those properties in mind, the thirteen recipes that follow each name the partners most likely to meet them for that specific cancer, and explain why. The named institutions are not the only possibilities. They are the ones whose public track records, demographic compositions, or research infrastructure most closely match what each cancer’s validation requires.

A Note on the Numbers

The case counts that follow are calibrated to United States incidence rates, applied to the consortium’s approximately 50 million covered lives across the Northeast Corridor, accrued over a practical five-year window of electronic medical record history. For most cancers, this window yields enough confirmed cases to build a deployable-scale study without pulling from outside the consortium. For three of the thirteen (pancreatic, ovarian, thyroid), the recipe is tighter than ideal and is strengthened substantially by international partnerships. For one (esophageal), the recipe specifies a smaller development study followed by validation-at-scale, because the literature is thin and initial algorithm development should precede full consortium deployment.
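The arithmetic behind these counts is a single multiplication, sketched below. The function name is hypothetical. The three incidence rates are the ones quoted in the recipes for pancreatic cancer, kidney cancer, and multiple myeloma, and the sketch ignores the case confirmation and blood-history requirements that shrink the usable pool.

```python
COVERED_LIVES = 50_000_000  # the consortium's approximate pool
YEARS = 5                   # the practical accrual window


def expected_cases(incidence_per_100k: float, covered_lives: int, years: int = 1) -> float:
    """Expected new diagnoses: annual incidence rate x population x years."""
    return incidence_per_100k * covered_lives / 100_000 * years


# Annual incidence per 100,000, as quoted in the recipes that follow.
rates = {"pancreatic": 13.8, "kidney": 17.0, "multiple myeloma": 7.0}

for cancer, rate in rates.items():
    per_year = round(expected_cases(rate, COVERED_LIVES))
    over_window = round(expected_cases(rate, COVERED_LIVES, YEARS))
    print(f"{cancer}: {per_year:,} per year, {over_window:,} over {YEARS} years")
```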

The blood signal strength classification, strong, moderate, or weak, carries through from Chapter 3’s biological discussion. Strong-signal cancers require smaller target groups because the biological fingerprint is obvious once the algorithm knows where to look. Weak-signal cancers require much larger target groups and deeper pre-diagnostic histories, because the algorithm must find subtler patterns against ordinary biological noise.

The Thirteen Recipes

1. Colorectal Cancer

Blood signal strength: strong. The iron depletion from chronic microscopic bleeding produces one of the clearest signatures in the full library, with measurable changes in hemoglobin, red cell indices, and platelets across the three to four years preceding diagnosis. ColonFlag already exists, validated across three continents, so the consortium’s role is deployment-scale prospective validation rather than new algorithm development.¹

Target group: 8,000 to 12,000 confirmed colorectal cancer cases across the consortium, drawn from the five New York systems plus Jefferson and Northwell over five years. Sloan Kettering and its regional network should contribute the heaviest share of stage I and stage II cases, which should constitute at least 60% of the target group. Confirmation standard: diagnosis code plus colonoscopy pathology report plus either surgical resection or documented chemotherapy.

Control group: 80,000 to 100,000 adults aged 40 and over, drawn across all consortium sites with deliberate representation from Montefiore, NYC Health and Hospitals, Northwell community sites, and Jefferson community hospitals. Controls must include patients with inflammatory bowel disease, iron deficiency from other causes, and common non-cancer conditions that perturb the complete blood count.

Pre-diagnostic blood window: three to four years of CBC history per patient, minimum. Consortium EMR access across Jefferson and the New York systems must be linked to support this.

The hard part: not the algorithm, which exists, but the follow-up infrastructure. The Geisinger deployment found cancers in only 104 of 706 flagged patients because follow-up was uneven. The consortium’s recipe must include patient navigator capacity adequate to reach every flagged patient.

Validation partners

For colorectal cancer, the two properties that matter most are existing experience with blood-test cancer algorithms and population independence. ColonFlag is already deployed in at least three health systems globally, and the consortium’s work here is about scaling what works rather than inventing what is missing. The strongest candidates are Maccabi Healthcare Services in Israel, where the original ColonFlag was developed and which has been running it clinically since 2015 with protocols and physician education already in place; Kaiser Permanente in California, which has validated the algorithm on an entirely separate American population with different laboratory practices from the Northeast Corridor; and Geisinger Health System in Pennsylvania, which deployed ColonFlag on an underserved rural population and is close enough geographically to share clinical protocols while distant enough in demographics and referral patterns to count as independent validation.

2. Lung Cancer

Blood signal strength: moderate to strong, primarily inflammatory. LungFlag exists and has been validated at Kaiser Permanente Southern California on nearly 200,000 patients, but has not yet been deployed in routine clinical use. The consortium’s role is prospective validation and deployment, plus extension to patients who fall outside current screening eligibility.²

Target group: 10,000 to 15,000 confirmed non-small cell lung cancer cases across the consortium over five years. Critical design point: deliberately over-represent women, Asian patients, and never-smokers or light former smokers. These are the patients current low-dose CT screening misses most reliably, and they are the ones the consortium’s algorithm must detect. At least 40% of the target group should come from demographics currently excluded by the USPSTF smoking-history rule.

Control group: 100,000 to 150,000 adults aged 50 and over, with deliberate sampling from Queens, the Bronx, and Long Island to span the full smoking-history range from lifelong non-smokers to heavy former smokers. Controls must include patients with COPD, pneumonia histories, and other non-cancer lung conditions.

Pre-diagnostic blood window: two years of CBC history, with attention to neutrophil-to-lymphocyte ratio trajectory. Shorter windows are acceptable if necessary but reduce sensitivity.
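A trajectory here simply means how fast the ratio is changing. The sketch below divides neutrophils by lymphocytes at each draw and fits a straight line through the results. The five draws are invented values for one imaginary patient, and a least-squares slope is one reasonable way to summarize a trajectory, not necessarily the feature LungFlag itself uses.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented draws for one patient: years since first draw, and cell counts in 10^9/L.
years = [0.0, 0.5, 1.0, 1.5, 2.0]
neutrophils = [4.0, 4.2, 4.5, 5.0, 5.6]
lymphocytes = [2.0, 2.0, 1.8, 1.6, 1.4]

# Neutrophil-to-lymphocyte ratio at each draw.
nlr = [n / l for n, l in zip(neutrophils, lymphocytes)]

# Change in the ratio per year across the two-year window.
nlr_slope = least_squares_slope(years, nlr)
print(nlr, nlr_slope)
```

Each individual count in this example would look unremarkable on a single lab report. The slope is what carries the signal.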

The hard part: ensuring that the flagged patients actually receive low-dose CT. Lung cancer diagnosis requires imaging, not blood work alone, and many flagged patients will be in populations with limited CT access.

Validation partners

For lung cancer, the priority properties are existing algorithm experience and demographic breadth across smoking-history subgroups. Kaiser Permanente Southern California is the natural primary partner because LungFlag was developed there, which means every methodological choice in the consortium’s extension of LungFlag can be directly compared against Kaiser’s original results. The Veterans Affairs system (particularly the Philadelphia, Bronx, and Northport VAs) provides the cross-check in a population heavily weighted toward older male patients with high smoking prevalence, the demographic where current screening works best and where an algorithm must at minimum match it. For validation in the populations current screening misses, Clalit Health Services in Israel serves a broader mix of smoking patterns, including substantial non-smoking lung cancer cases in women of Middle Eastern ancestry, a group that is almost entirely absent from American validation cohorts.

3. Liver Cancer

Blood signal strength: strong. The liver cannot hide dysfunction from a metabolic panel. One particular combination of four routine inputs, the FIB-4 index (named for those four inputs: age, two liver enzymes called AST and ALT, and platelet count), already carries substantial predictive weight for liver damage and cancer risk, and physicians have been using it for years without realizing they could run it automatically on every blood panel. Tracking how the FIB-4 score changes across successive blood draws has proven to be one of the strongest predictive signals in the liver cancer literature. Multiple strong machine learning algorithms now exist. A Hong Kong health-system study reached an AUC of 0.89, and a 2024 model from Kevin Kwok and colleagues that combined routine blood tests with electronic health records, lifestyle, genetics, and metabolism data, drawing on both the UK Biobank and the US All of Us research program (covering more than 900,000 people together), reached 0.88 across two separate populations. Neither algorithm is yet deployed in routine clinical practice.³

Target group: 3,000 to 5,000 confirmed hepatocellular carcinoma cases across the consortium over five years. Hepatitis B and hepatitis C cohort overlap is critical: the consortium should deliberately over-sample patients with prior hepatitis diagnoses, since liver cancer arises predominantly in that population, and since NYC Health and Hospitals serves one of the largest hepatitis B populations in the country among its Asian American and West African patient groups.

Control group: 30,000 to 50,000 adults with chronic liver disease or hepatitis infection but no liver cancer. This is the clinically realistic comparator. Using healthy controls would inflate performance artificially and teach the algorithm nothing useful about distinguishing hepatocellular carcinoma from its precursor conditions.

Pre-diagnostic blood window: two to three years, with longitudinal FIB-4 trajectory as a primary feature.
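The FIB-4 index is the patient’s age multiplied by AST, divided by the platelet count multiplied by the square root of ALT. The sketch below applies that standard formula to two draws from one imaginary patient and reports the change per year. The lab values are invented for illustration, and the simple two-point difference is an assumption of this sketch rather than the trajectory feature any published model uses.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Invented draws for one patient: (age, AST in U/L, ALT in U/L, platelets in 10^9/L).
draws = [
    (60, 40, 36, 200),
    (62, 60, 49, 150),
]

scores = [fib4(*draw) for draw in draws]

# Change in FIB-4 per year between the first and last draw.
years_elapsed = draws[-1][0] - draws[0][0]
change_per_year = (scores[-1] - scores[0]) / years_elapsed
print(scores, change_per_year)
```

Every input is already on a routine panel or in the chart, which is why the score can be computed automatically on every draw.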

The hard part: coordinating hepatology practices across the consortium to ensure consistent staging criteria. Liver cancer staging is heterogeneous and each system’s criteria must be harmonized before training.

Validation partners

Liver cancer validation is genuinely different from the other cancers in this chapter because hepatitis B is the dominant risk factor globally, and hepatitis B epidemiology differs sharply between Western and Asian populations. The priority validation properties are demographic breadth in populations with high hepatitis B prevalence and laboratory standards distinct from American practices. Taiwan’s National Health Insurance Research Database, which covers essentially the entire Taiwanese population with complete longitudinal records, is probably the single strongest partner in the world for liver cancer algorithm validation. South Korea’s National Health Insurance Service provides a comparable asset. Within the consortium’s reach, Clalit Health Services in Israel adds cross-population breadth and has unusually detailed experience with hepatitis C epidemiology in Middle Eastern populations, important because the Kwok model drew heavily from British and American cohorts and needs testing on populations whose hepatitis risk profiles differ.

4. Gastric Cancer

Blood signal strength: moderate. Gastrointestinal bleeding produces iron depletion similar to colorectal cancer but less consistent. The Ke XHGC20 model from Shanghai achieved AUC 0.90 using a precancerous-lesion control group, and performed exceptionally well (AUC 0.97) in patients with negative traditional tumor markers. The consortium’s role is Western-population validation of this approach.⁴

Target group: 2,500 to 4,000 confirmed gastric cancer cases across the consortium over five years. The consortium’s Asian American patient populations in Queens and on Long Island, and Jewish populations served by Sloan Kettering and Weill Cornell, cover two of the higher-incidence subgroups. Early-stage cases should constitute at least 50% of the target group, which requires Sloan Kettering’s referral pattern.

Control group: 25,000 to 40,000 patients with confirmed precancerous stomach conditions (atrophic gastritis, intestinal metaplasia, Helicobacter pylori-associated gastritis). This mirrors the Shanghai design and produces a clinically realistic comparator.

Pre-diagnostic blood window: eighteen months to two years. Gastric cancer’s window is shorter than colorectal, because the disease progresses faster once established.

The hard part: obtaining confirmed precancerous pathology across the consortium. Many New York patients with gastritis are managed without biopsy. The recipe may require active recruitment of endoscopy patients to build the control stack.

Validation partners

Gastric cancer’s validation requires the same kind of demographic breadth that liver cancer does, for the same reason: incidence and underlying biology differ sharply between Western and Asian populations, and most published algorithms were built on Asian cohorts that do not necessarily generalize to American patients. Taiwan’s National Health Insurance Research Database is again a prime partner, because Taiwanese gastric cancer incidence is among the highest in the world and case density is exceptional. South Korea’s National Health Insurance Service has produced some of the largest gastric cancer cohorts ever studied. For within-consortium-reach partnership, Maccabi Healthcare Services in Israel provides a Jewish and Middle Eastern population mix that differs meaningfully from Chinese and Korean patients, making it a useful second validation layer. The consortium should also pursue a Japanese partner if available; Japan’s nationwide endoscopic screening program has produced the most stage-balanced gastric cancer cohorts in the literature.

5. Pancreatic Cancer

Blood signal strength: moderate, but with a critical constraint: the biological window before diagnosis is short, typically two years at most. This is the hardest recipe in the chapter. The disease kills 52,000 Americans annually and catches almost all of them too late. Existing machine learning models have identified more than half of future late-stage patients two years before diagnosis, at a stage when surgery was still possible. Deploying such a model at scale could change the natural history of pancreatic cancer more than any drug discovery of the past decade.⁵

Target group: 4,000 to 6,000 confirmed pancreatic adenocarcinoma cases across the consortium over five years. At the incidence rate of 13.8 per 100,000 annually, a 50-million-life consortium yields roughly 6,900 cases per year, making this feasible in principle but requiring every case available. Sloan Kettering’s pancreatic program is among the largest in the world and must anchor the target group. Early-stage cases will be scarce, perhaps 15% of the total, which is below the target this chapter usually recommends but reflects the natural distribution of this cancer.

Control group: 40,000 to 60,000 adults aged 50 and over, with explicit representation of patients with chronic pancreatitis, type 2 diabetes, and pancreatic cysts, each of which is either a risk factor for or a clinical mimic of pancreatic cancer.

Pre-diagnostic blood window: two years, with particular attention to the fasting glucose trajectory (pancreatic cancer damages insulin-producing cells before the tumor is visible on any scan) and liver enzyme elevations from early bile duct compression.

The hard part: the window is narrow and the cases are scattered across the consortium. Aggregating records fast enough to capture the pre-diagnostic window requires real-time EMR linkage, not retrospective batch pulls.

Validation partners

For pancreatic cancer, the priority validation properties are depth of longitudinal records and existing experience building blood-test cancer algorithms. The window is too narrow to tolerate sparse data, and the algorithm development itself is complex enough to benefit from partners who have done this work before. Maccabi Healthcare Services, working with Medial EarlySign (the Israeli company that built ColonFlag and LungFlag), combines both properties in a way no other partner does. Kaiser Permanente in Northern California is the strongest American candidate because of its integrated electronic record and its willingness to engage with algorithm validation work. For the stage I case scarcity problem specifically, the Cancer of the Pancreas Screening Consortium, an international research group led from Johns Hopkins that focuses on high-risk screening, provides a supplementary cohort of early-stage cases that no single health system can match. The consortium’s pancreatic cancer work may need to draw on all three.

6. Ovarian Cancer

Blood signal strength: moderate to strong, surprisingly so given the absence of any current population screening test. The platelet count rises substantially in the eighteen months preceding diagnosis. This happens because ovarian tumors produce a signaling molecule that tells the bone marrow to manufacture more platelets, a process the lab report simply records as a climbing platelet count, gradual at first, then accelerating in the six months before diagnosis. Machine learning models using routine blood values have achieved AUC 0.95 to 0.97, numbers that would be considered extraordinary for any screening test in medicine.⁶

Target group: 2,000 to 3,500 confirmed ovarian cancer cases across the consortium over five years. At the incidence rate of roughly 11 per 100,000 annually in women, the consortium yields perhaps 2,500 cases per year. Early-stage cases will be particularly hard to assemble because most ovarian cancers are diagnosed at stage III or IV. The recipe requires Sloan Kettering’s gynecologic oncology referrals plus Weill Cornell’s and NYU’s populations to reach the target, with explicit over-sampling of BRCA-positive women for the earlier-stage end of the distribution.

Control group: 25,000 to 35,000 women aged 40 and over, with representation from obstetrics and gynecology practices across the consortium. Controls must include women with benign ovarian cysts, endometriosis, and uterine fibroids, each of which presents with abdominal symptoms that overlap with ovarian cancer.

Pre-diagnostic blood window: eighteen months to two years, anchored on platelet count trajectory.

The hard part: stage I and II cases are genuinely rare. A consortium with 2,500 ovarian cancers per year will see perhaps 500 stage I cases annually, across all sites combined. The five-year accumulation is tight.

Validation partners

Ovarian cancer’s validation problem is different from every other cancer in this chapter. Stage I cases are so scarce that no single health system has enough of them to validate an algorithm rigorously. The priority validation property is therefore population size, particularly national-registry-scale cohorts that accumulate rare early cases over large populations. The United Kingdom’s Clinical Practice Research Datalink, which covers about ten million British patients with complete primary care records, is probably the single best partner for ovarian cancer algorithm validation in the English-speaking world. The Danish national cancer registry, which achieves essentially complete capture of every cancer diagnosis in Denmark, provides complementary validation with even deeper longitudinal records. Within the consortium’s network, Maccabi Healthcare Services adds demographic breadth (particularly in Ashkenazi Jewish women, who have elevated BRCA prevalence and therefore an enriched early-stage case pool). Because stage I diagnoses are so rare, the consortium’s ovarian cancer work should plan for partnership with all three, not one.

7. Kidney Cancer

Blood signal strength: moderate. Declining hemoglobin (from reduced erythropoietin), rising inflammatory markers, and gradually rising creatinine produce a detectable pattern. The Li 2022 proof of concept reached AUC 0.93 at one Chinese hospital using eight routine markers, but has no external validation and a small target group.⁷

Target group: 5,000 to 8,000 confirmed clear cell renal cell carcinoma cases across the consortium over five years. At the incidence of 17 per 100,000 annually, the consortium yields roughly 8,500 cases per year, so the five-year accumulation is comfortably above the target. Early-stage cases, many of which were found incidentally on CT scans done for other reasons, can be concentrated through Sloan Kettering, Weill Cornell, Jefferson, and Northwell’s urology programs. At least 60% of the target group should be stage I, with incidental discovery flagged as a feature in the training data.

Control group: 50,000 to 80,000 adults aged 40 and over, with deliberate inclusion of patients with chronic kidney disease, kidney stones, benign renal cysts, and urinary tract infections. These are the conditions that produce overlapping blood signatures and that the algorithm must learn to distinguish from cancer.

Pre-diagnostic blood window: two to three years, with particular attention to hemoglobin decline even when within the normal range.

The hard part: incidentally discovered cases often lack multi-year pre-diagnostic blood history, because the patient was healthy enough not to be getting frequent labs. The recipe may require sub-sampling from routinely tested populations (patients on statins, patients with hypertension) to ensure adequate pre-diagnostic windows.

Validation partners

Kidney cancer has no established population screening, and no algorithm has yet been validated outside the institution that built it. The priority validation properties are therefore population independence (to establish that any algorithm works beyond a single hospital) and demographic breadth (because kidney cancer incidence varies substantially by race). Kaiser Permanente is the strongest American partner because its urology programs see large volumes of incidentally discovered kidney cancers and its records support the multi-year pre-diagnostic window this recipe requires. The Veterans Affairs system provides a validation population with different laboratory practices and an older, more male-skewed demographic where kidney cancer incidence is elevated. Maccabi Healthcare Services adds international cross-validation with unusually complete urologic workup records. The consortium should aim for publication of the first rigorously externally validated kidney cancer algorithm in the literature, which will require all three partners together.

8. Multiple Myeloma

Blood signal strength: strong and distinctive. The disease’s fingerprint, rising total protein, rising calcium, rising creatinine, and declining hemoglobin and albumin, appears in the standard blood panel two to five years before clinical diagnosis in many patients, a detection window that is longer than for almost any other cancer in this list. Elevated calcium alone carries an odds ratio above eleven for subsequent myeloma.⁸

Target group: 3,000 to 5,000 confirmed multiple myeloma cases across the consortium over five years. At the incidence of 7 per 100,000 annually, the consortium yields roughly 3,500 cases per year, and the five-year window puts the recipe in a comfortable range. Critically, the consortium’s Black patient population (from the Bronx, Brooklyn, and Philadelphia) is essential because myeloma incidence in African Americans is more than twice the rate in whites, and no prior myeloma algorithm has had adequate representation of this group.

Control group: 30,000 to 50,000 adults aged 50 and over, with inclusion of patients with monoclonal gammopathy of undetermined significance (MGUS), chronic kidney disease, and other conditions that elevate calcium or total protein.

Pre-diagnostic blood window: three to five years. Myeloma’s long indolent phase makes this one of the few cancers where deeper EMR history genuinely improves performance.

The hard part: distinguishing myeloma from MGUS, which is technically a precursor state rather than a malignancy and which most algorithms currently classify inconsistently. The recipe must specify whether MGUS cases are in the target group, control group, or excluded entirely.
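The case-yield estimate in the target-group paragraph above is simple arithmetic: incidence multiplied by the covered population. A minimal sketch follows. The covered population of 50 million is an assumption, back-calculated from the chapter's own figures (7 per 100,000 producing roughly 3,500 cases a year), and the function name is illustrative.

```python
def expected_annual_cases(incidence_per_100k: float, covered_population: int) -> float:
    """Expected new diagnoses per year for a given annual incidence rate."""
    return incidence_per_100k * covered_population / 100_000


# Assumption: a covered population of 50 million, inferred from the
# chapter's yield figures rather than stated in this section.
COVERED_POPULATION = 50_000_000

myeloma_per_year = expected_annual_cases(7, COVERED_POPULATION)
print(f"Myeloma cases per year: {myeloma_per_year:,.0f}")
print(f"Myeloma cases over five years: {5 * myeloma_per_year:,.0f}")
```

The same function reproduces the yield figures quoted in the other recipes when given their incidence rates, which makes it a quick check on whether a proposed target group is realistic before attrition from incomplete blood histories.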

Validation partners

Multiple myeloma validation is defined by the African American incidence disparity. The priority validation property is demographic breadth in Black populations at a scale that lets subgroup performance be tested rigorously, not merely reported. Howard University Hospital in Washington, Morehouse School of Medicine in Atlanta, and Meharry Medical College in Nashville together have unusually strong research capacity in Black-majority patient populations and would make the consortium’s myeloma work the first to validate adequately in this group. The United Kingdom’s Clinical Practice Research Datalink provides the second validation layer, because Koshiaris and colleagues have built foundational myeloma work on British primary care data and the comparison point is direct. Maccabi Healthcare Services adds international cross-validation. The consortium’s myeloma recipe should not publish without Black-majority population validation; doing so would perpetuate exactly the demographic gaps this book criticizes.

9. Leukemia

Blood signal strength: strong, because the disease originates in blood-forming cells. Chronic lymphocytic leukemia in particular produces a clear trajectory of rising lymphocyte counts years before clinical diagnosis. The Aoki 2025 study, trained on over one million patient records, reached an AUC of 0.92 using twelve CBC-derived features plus age and sex.⁹

Target group: 6,000 to 10,000 confirmed leukemia cases across the consortium over five years, spanning acute myeloid leukemia, acute lymphoblastic leukemia, chronic myeloid leukemia, and chronic lymphocytic leukemia. Each subtype should be modeled separately because their blood signatures differ. CLL, the most common adult leukemia and the one with the longest pre-diagnostic window, should form the largest share of the target group.

Control group: 60,000 to 100,000 adults aged 50 and over, with inclusion of patients with reactive lymphocytosis (from viral infections), mild anemia, and other common causes of abnormal CBC findings.

Pre-diagnostic blood window: five to seven years for CLL, two to three years for the acute leukemias, which progress faster.

The hard part: ground-truth diagnosis. Leukemia is often inferred from CBC patterns rather than formally coded, particularly indolent CLL. The recipe must use the blood pattern itself, not ICD coding, as the definitive case definition, as the Aoki team did.

Validation partners

For leukemia, the priority validation properties are depth of longitudinal records (for CLL’s multi-year pre-diagnostic window) and completeness of outcome ascertainment (because CLL is often recognized informally rather than coded). The Danish national cancer registry, combined with the Copenhagen primary care laboratory database that Christensen and colleagues used for their hematologic malignancy work, provides the gold standard for both properties. Maccabi Healthcare Services in Israel supplies comparable depth with a different population. Within the United States, Kaiser Permanente’s integrated record supports the required multi-year CBC history and its hematology programs have both the patient volume and the research appetite to serve as validation sites. For the acute leukemias specifically, which progress faster and have different case-finding patterns, Memorial Sloan Kettering’s own acute leukemia cohort may serve better as a development than a validation partner, and the consortium should look to Dana-Farber Cancer Institute in Boston for American external validation.

10. Lymphoma

Blood signal strength: moderate. LDH elevation in more than half of non-Hodgkin lymphoma patients at diagnosis provides a clear biochemical signature, and inflammatory markers plus lymphocyte shifts add to the pattern. A Danish study on 663,000 patients achieved AUC 0.85 at six months before diagnosis.¹⁰

Target group: 5,000 to 8,000 confirmed lymphoma cases across the consortium over five years, combining Hodgkin and non-Hodgkin, with subtypes modeled separately. At the combined incidence of 20 per 100,000 annually, the consortium yields roughly 10,000 cases per year, which is comfortably above the target even allowing for stage distribution concerns.

Control group: 50,000 to 80,000 adults across the full adult age range, with inclusion of patients with HIV, rheumatoid arthritis, and other conditions associated with altered lymphocyte counts or elevated LDH.

Pre-diagnostic blood window: three to five years. The Danish study demonstrated that five-year CBC history improves performance, so the consortium recipe should default to deeper history where available.

The hard part: lymphoma is not one disease. The algorithm must handle heterogeneity across Hodgkin versus non-Hodgkin, aggressive versus indolent, and B-cell versus T-cell origins. Subtype-specific algorithms are likely to outperform a single combined model.

Validation partners

Lymphoma is the cancer where Danish partnership is most valuable, because the foundational Christensen blood-test machine learning work was built on the Copenhagen primary care database and the direct comparison is methodologically important. The Danish national cancer registry also has unusually complete subtype-specific coding, which matters because lymphoma algorithms need to be built separately by subtype. Within the consortium’s reach, Maccabi Healthcare Services provides population independence and has substantial experience with Mediterranean-specific lymphoma subtypes (some of which are rare in the United States). Kaiser Permanente in Northern California serves a large Asian American population that includes patients with NK/T-cell lymphoma subtypes almost absent from European cohorts. The consortium’s lymphoma recipe benefits from all three partners because the disease itself is so heterogeneous that no single population covers the full subtype range.

11. Bladder Cancer

Blood signal strength: moderate. Chronic inflammation from the tumor produces rising inflammatory markers, declining hemoglobin from microhematuria, and shifts in metabolic panel values. The Tsai 2022 study achieved AUC 0.88 to 0.92 using eight routine laboratory values, with controls drawn from patients with other pelvic cancers and cystitis, a clinically realistic design.¹¹

Target group: 4,000 to 6,000 confirmed bladder cancer cases across the consortium over five years. At 17 per 100,000 annually, the consortium yields roughly 8,500 cases per year, so accumulation is straightforward. Early-stage non-muscle-invasive cancer (the majority of bladder cancer) should constitute at least 70% of the target group.

Control group: 40,000 to 60,000 patients with urologic symptoms but without bladder cancer, drawn from urology practices across the consortium. Inclusion of recurrent urinary tract infection patients, benign prostatic hyperplasia patients, and kidney stone patients is essential. A parallel control arm of cystitis patients (the most common clinical mimic) should be analyzed separately.

Pre-diagnostic blood window: eighteen months. The blood signal window is relatively short for bladder cancer compared to the gastrointestinal cancers.

The hard part: microhematuria itself is the most specific clinical sign of bladder cancer, and it is usually detected on urinalysis, not blood work. The consortium must decide whether urinalysis results are in scope for the algorithm. Tsai’s study included urine occult blood among its eight features, which strengthens performance but requires urinalysis integration.

Validation partners

Bladder cancer’s validation needs are conventional: population independence, different laboratory standards, and demographic breadth across the age range where bladder cancer incidence climbs. The Veterans Affairs system is a notably strong partner here because bladder cancer incidence is elevated in populations with occupational chemical exposures, and the VA’s occupational history documentation is more complete than most civilian health systems. Kaiser Permanente provides the integrated-record cross-validation with different laboratory infrastructure. Maccabi Healthcare Services adds international validation with a different population mix. The consortium’s bladder cancer recipe should prioritize the VA partnership because no existing bladder cancer algorithm has been validated on occupationally exposed populations, which is precisely where bladder cancer early detection has the highest value.

12. Esophageal Cancer

Blood signal strength: moderate, with a pattern similar to colorectal and gastric cancers (declining hemoglobin, rising red cell size variation, elevated inflammatory ratios). No large-population machine learning study of esophageal cancer in the ColonFlag or LungFlag style has been published. The recipe below specifies what that study should look like.¹²

Target group: 2,000 to 3,500 confirmed esophageal cancer cases across the consortium over five years. At 4 per 100,000 annually, the consortium yields roughly 2,000 cases per year, so the five-year accumulation is tight. The recipe requires every available case. Both squamous cell and adenocarcinoma subtypes should be modeled separately because their risk populations differ (squamous correlates with alcohol and tobacco, adenocarcinoma with reflux and Barrett’s esophagus).

Control group: 20,000 to 35,000 patients with confirmed Barrett’s esophagus (the precursor to adenocarcinoma) or chronic reflux disease. This mirrors the best practices established in Ke’s gastric cancer study.

Pre-diagnostic blood window: two years, with Barrett’s surveillance pathology as an additional input where available.

The hard part: esophageal cancer is relatively uncommon and case accumulation is the primary bottleneck. Unlike colorectal or gastric cancer, there is no existing validated algorithm to build on. This recipe specifies initial algorithm development, not deployment-scale validation.

Validation partners

Because esophageal cancer has no existing large-population blood-test algorithm, the consortium’s work here is developmental rather than confirmatory, and the validation partners matter slightly less than they do for the other twelve cancers. Publication of the initial development study should precede consortium-wide deployment. When validation does come, the priority properties are demographic breadth across both major esophageal cancer subtypes (squamous and adenocarcinoma) and different geographic cancer profiles. Taiwan’s National Health Insurance Research Database provides validation in a population where squamous cell esophageal cancer is much more common than in the West. The Mayo Clinic’s comprehensive esophageal cancer program, combined with its integrated electronic record, provides American external validation. Maccabi Healthcare Services supplies population independence. The consortium may ultimately need to partner with all three to publish a credibly validated esophageal cancer algorithm, because the disease’s subtype heterogeneity requires each subtype to be validated in the population where it predominates.

13. Thyroid Cancer

Blood signal strength: moderate. TSH is already a standard value on many annual blood panels, and its trajectory over time combined with cholesterol and CBC values has achieved AUC around 0.91 in published work. Thyroid cancer has the smallest mortality burden of the thirteen (roughly 2,000 deaths annually), but the recipe is relevant because the aggressive subtypes have much worse prognoses and early detection of any form enables less invasive treatment.¹³

Target group: 15,000 to 25,000 confirmed thyroid cancer cases across the consortium over five years. Thyroid cancer incidence is high (14 per 100,000 annually) so case accumulation is straightforward; the challenge is avoiding overdiagnosis of indolent papillary cancers. The recipe should deliberately over-represent aggressive subtypes (anaplastic, medullary, poorly differentiated papillary) since these are where early detection matters clinically.

Control group: 150,000 to 200,000 adults on routine TSH monitoring, with inclusion of patients with Hashimoto thyroiditis, subclinical hypothyroidism, and benign thyroid nodules.

Pre-diagnostic blood window: three to five years of TSH trajectory.

The hard part: this is the cancer where overdiagnosis is most likely to produce harm. A sensitive algorithm will flag many patients with small, indolent cancers that would never have harmed them. The recipe must include an explicit subgroup analysis for aggressive-subtype sensitivity, not overall sensitivity, and the clinical protocol must avoid defaulting to total thyroidectomy for every flagged patient.

Validation partners

Thyroid cancer validation is dominated by one specific methodological concern: overdiagnosis. The priority property in a validation partner is therefore not size or diversity but prior experience with the overdiagnosis question. South Korea is the essential partner here, because the global conversation about thyroid cancer overdiagnosis was defined by Korean data after the country’s screening program produced a massive rise in papillary thyroid cancer diagnoses without a corresponding drop in mortality. The South Korean National Health Insurance Service has the longest-running national dataset for studying this phenomenon, and any algorithm validated without Korean cross-checking should be regarded with skepticism. Within the consortium’s network, Maccabi Healthcare Services provides population independence. The Mayo Clinic’s endocrinology program has the American infrastructure for long-term thyroid cancer follow-up and has published extensively on overdiagnosis. The consortium’s thyroid cancer recipe is unusual in that its hardest methodological test is not whether the algorithm works but whether its deployment would cause net harm through overdetection; Korean partnership is the safeguard against that.

What This Chapter Adds

The preceding chapter described what the algorithms can do. This chapter describes what it would take to build or validate one for each of the thirteen cancers, using the specific consortium structure that the New York and Northeast Corridor relationships make possible.

A reader who has come this far should be able to open any one of these thirteen recipes, pair it with the nine rules that opened the chapter, and estimate whether a proposed study anywhere in the world is designed rigorously. The variables are limited. The rules are consistent. The hard part, as always, is not deciding what to do. It is finding the institutional commitment to do it.

The consortium described here is not a hypothesis. It is a description of relationships that already exist, assembled at a scale and with a demographic breadth that the field has been waiting for. The recipes in this chapter are the practical instructions for turning those relationships into evidence. The next chapter turns to the consortium itself: how it would work, what it would cost, and what it would produce.

Chapter 8

Beyond Cancer

The Same Blood Draw, the Same Logic, and 300,000 More Lives a Year

Here is a man we will call David. He is 58 years old, works in insurance, and thinks of himself as basically healthy. He has gained some weight over the past decade. He gets winded on stairs. His doctor has mentioned, more than once, that his blood pressure is on the high side and that he should watch what he eats. David agrees, intends to do something about it, and mostly does not.

At his annual physical, his blood work comes back. His fasting blood glucose reads 96. The level at which doctors start to worry is 100. His cholesterol is slightly elevated but not alarming. His kidney function marker, a value called creatinine, reads 1.1, comfortably within the normal range of up to 1.2. His doctor reviews everything, tells David it all looks fine, and suggests he try to lose a few pounds. David goes home.

In ten years, David has his first heart attack. It costs $40,000 to treat, leaves him with permanently reduced heart function, and begins a decade of mounting medical bills and declining quality of life. In fifteen years, he is diagnosed with Type 2 diabetes, a disease that was already developing quietly in his blood the morning his doctor told him everything was fine. In twenty years, his kidneys begin to fail.

None of this had to happen. Every one of these conditions was leaving traces in David's blood years before any diagnosis. His fasting glucose of 96 was not reassurance. It was the opening note of a trajectory that, read alongside his cholesterol trend and his creatinine, told a story that machine learning algorithms can now recognize and act on, years before the crisis arrives.

This chapter is about David's story, and about the millions of Americans like him whose blood has been trying to tell their doctors something for years, in a language medicine has only recently learned to read.

Cancer is not the only killer hiding in your blood work. When you look at what actually kills Americans at scale, it may not even be the biggest one.

The Same Method, an Even Larger Problem

The idea behind the cancer detection algorithms in the previous chapters is not complicated. Chronic disease changes blood chemistry gradually, over months and years, in patterns too subtle for any human reviewing individual test values to notice but entirely readable by a computer trained to recognize them. Cancer does this. So does virtually every other major chronic disease.

Heart disease, Type 2 diabetes, kidney failure, heart failure, sepsis, liver disease: each of these conditions progresses slowly through the body, leaving its mark on the blood at every stage. Each of them produces changes that appear in the same routine blood panels drawn at every annual physical. And each of them, if caught earlier, responds to interventions that are simpler, cheaper, and more effective than the treatments available once the disease has advanced.

Heart disease kills approximately 700,000 Americans every year, more than any other single cause of death. More than half of all people who have a first heart attack had no prior symptoms, meaning their cardiovascular system was failing silently for years before the event that finally announced it. Type 2 diabetes kills 89,000 Americans annually through its direct effects and contributes to hundreds of thousands of additional deaths through its complications, including heart attack, stroke, kidney failure, nerve damage, and limb amputation. Chronic kidney disease sends 130,000 patients onto dialysis every year at a cost of $90,000 per patient per year, and kills 57,000. Heart failure kills 68,000, and only about half of patients survive five years after diagnosis, a rate comparable to many cancers. Sepsis, an overwhelming infection response that spirals out of control and shuts down organ systems, kills 270,000 Americans annually and is one of the leading causes of death in hospital intensive care units.¹

Together, these diseases kill roughly 1.2 million Americans every year. The combined death toll from the thirteen cancers described in Chapter 4 is approximately 393,000. The non-cancer diseases kill about three times as many people.

And for every one of them, the blood has been trying to tell us something.

Why the Blood Speaks So Early

To understand why these diseases show up in blood tests years before a doctor diagnoses them, it helps to think about what the blood actually is. Blood is not simply a delivery system for oxygen. It is the body's internal communication network, carrying hormones, immune signals, waste products, proteins, and metabolic byproducts between every organ and every tissue. When something goes wrong in any part of the body, the blood reflects it, because the blood is connected to everything.

Consider what happens when the kidneys begin to lose function. Healthy kidneys filter waste products from the blood and excrete them in urine. As kidney function declines, waste products begin to accumulate. Creatinine, one of those waste products, rises in the bloodstream. But here is the critical point: creatinine rises gradually, over years, through values that a doctor reviewing a single annual result would consider entirely normal, before it ever crosses the threshold that triggers clinical concern. A creatinine of 0.9 one year, 1.0 the next, 1.1 the year after: each reading looks fine in isolation. The upward trend, read by an algorithm across multiple years of data, is the early warning that kidney disease is developing.
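The idea of reading a trend rather than a single value can be made concrete with a short sketch. It fits a least-squares line through the three annual creatinine readings from the paragraph above and flags the series when every value is in range but the slope is rising. The upper limit of 1.2 comes from the normal range quoted earlier in this chapter; the slope cutoff of 0.05 per year and the function names are illustrative assumptions, not clinical thresholds or the method of any published algorithm.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def flag_rising_trend(years, values, upper_limit, min_slope):
    """True when every value is within range yet the trend is rising."""
    all_in_range = all(v <= upper_limit for v in values)
    return all_in_range and least_squares_slope(years, values) >= min_slope


years = [0, 1, 2]
creatinine = [0.9, 1.0, 1.1]   # readings from the text, in mg/dL
UPPER_LIMIT = 1.2              # normal range quoted earlier in the chapter
MIN_SLOPE = 0.05               # hypothetical cutoff, mg/dL per year

slope = least_squares_slope(years, creatinine)
print(f"Slope: {slope:.2f} mg/dL per year")
print("Flagged:", flag_rising_trend(years, creatinine, UPPER_LIMIT, MIN_SLOPE))
```

A real model would weigh many values together and account for age, sex, and measurement noise. The sketch isolates the single point the paragraph makes: three readings that each pass a threshold check can still fail a trend check.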

The same logic applies to every disease in this chapter. Heart disease narrows the arteries and strains the heart, producing changes in inflammatory markers and lipid values that accumulate in the blood years before a heart attack. Diabetes begins with gradual insulin resistance that pushes fasting glucose and a marker called hemoglobin A1c upward by fractions each year, staying below the diagnostic threshold for a decade before the disease is finally named. Fatty liver disease alters the ratio of liver enzymes in the comprehensive metabolic panel long before the liver itself shows structural damage on any scan.

What machine learning adds to this picture is not new data. The data has always been there. What it adds is the ability to read the data the way it deserves to be read: as a system, as a trend, as a conversation between values that, taken together, describes what is happening in the body with a clarity that no single number can provide.

The Diseases Already in the Electronic Medical Record

Let us go through the major non-cancer diseases where algorithms trained on routine blood work have demonstrated the ability to detect disease years before conventional diagnosis. For each one, the structure is the same: the disease, what it costs in human lives and dollars when found late, what the blood says in the years before diagnosis, and what algorithms trained on those blood signals have already demonstrated.

Heart Disease and Cardiovascular Risk

Cardiovascular disease is the story of a slow, decades-long process of arterial damage and cardiac strain that medicine has traditionally tried to assess through simple risk scores based on age, blood pressure, cholesterol, and smoking history. Those scores, the Framingham Risk Score being the most widely used, have been the standard tool for decades. They are blunt instruments. They classify tens of millions of people as moderate risk without distinguishing who in that group is actually heading toward a heart attack in the next five years.²

Machine learning changes this picture substantially. Algorithms trained on complete blood panel data, including not just cholesterol levels but the inflammatory markers in the white blood cell count, the metabolic values in the comprehensive panel, and the trajectory of all of these over time, predict cardiovascular events five to ten years before they occur with accuracy that substantially outperforms traditional risk scoring. A neural network trained on the records of more than 378,000 patients in UK primary care correctly identified patients who would have a heart attack with accuracy equivalent to a strong cancer screening test, and did so from the same blood values that those patients had already received at their annual physicals.

The blood markers driving these predictions are not exotic. Red cell distribution width, the same measure of red blood cell size variation that appears in the colorectal cancer signature, independently predicts cardiovascular mortality even in patients with otherwise normal blood counts. The neutrophil-to-lymphocyte ratio, elevated in lung cancer patients, also tracks chronic low-grade inflammation that damages arterial walls over years. Lipid trends across multiple blood draws reveal patterns of risk that a single cholesterol reading cannot capture. None of these values individually alarms anyone. Together, across time, they describe a cardiovascular system under stress years before the stress produces a clinical event.³
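The neutrophil-to-lymphocyte ratio mentioned above is nothing more than a quotient of two values already printed on a standard complete blood count. A minimal sketch, with hypothetical counts standing in for three annual draws, shows how a ratio can climb while each underlying count looks unremarkable. The function name and the sample values are assumptions made for illustration.

```python
def neutrophil_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """Ratio of absolute neutrophil count to absolute lymphocyte count.

    Both counts must share the same units (for example, 10^9 cells per liter).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Hypothetical counts from three annual blood draws, in 10^9 cells per liter,
# given as (neutrophils, lymphocytes).
draws = [(3.8, 2.2), (4.4, 2.0), (5.1, 1.7)]
ratios = [neutrophil_lymphocyte_ratio(n, l) for n, l in draws]
print([round(r, 2) for r in ratios])
```

Because the ratio is derived from values the laboratory has already reported, computing it across years of stored results requires no new test and no new blood draw.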

The clinical stakes of earlier identification are enormous. Statin therapy reduces the risk of heart attack by 25 to 35 percent in high-risk patients. Blood pressure control reduces the risk of stroke and heart failure by 30 to 50 percent over a decade. These interventions are most effective when started early, before the arterial damage has accumulated to the point where the cardiovascular system has limited capacity for recovery. Identifying a patient as high-risk at 50 rather than waiting for a heart attack at 60 does not merely give that patient ten more years of treatment. It gives them ten more years in which the interventions available to them are actually capable of preventing the catastrophe.

Type 2 Diabetes

Type 2 diabetes is one of the clearest examples of a disease that announces itself in the blood years before diagnosis and where earlier intervention has been proven, in a landmark clinical trial, to prevent the disease entirely. The disease develops over five to ten years through a gradual process of insulin resistance, during which the pancreas struggles to keep blood glucose in the normal range. Fasting blood glucose rises slowly, from 85 to 90 to 95 to 98 milligrams per deciliter, staying below the threshold of 100 milligrams per deciliter that technically defines prediabetes. Hemoglobin A1c, the three-month average blood glucose marker that appears on many standard panels, drifts upward by fractions of a percent annually. Triglycerides climb. HDL cholesterol, the protective kind, falls.⁴

No single value triggers a clinical alarm. Every individual reading looks acceptable. But the pattern across all of them, tracked over several years, describes a metabolic system in slow-motion failure that a machine learning algorithm can recognize and flag three to five years before the diagnosis of diabetes is made.

The reason this matters so profoundly is that the window between prediabetes and diabetes is the only window in which the disease can be reliably prevented. The Diabetes Prevention Program, one of the most important clinical trials in the history of preventive medicine, enrolled more than 3,000 prediabetic patients and randomly assigned them to lifestyle intervention, metformin medication, or a placebo. The lifestyle intervention group, which achieved a modest average weight loss of seven percent through diet and 150 minutes of weekly exercise, reduced their rate of developing diabetes by 58 percent. Metformin reduced it by 31 percent. These are not marginal improvements. In the lifestyle intervention group, more than half the people who would otherwise have developed diabetes did not. The intervention costs roughly $3,500 per person. The lifetime cost of treating diabetes and its complications averages $250,000.⁵
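The cost comparison above can be turned into a back-of-the-envelope calculation. The sketch below uses only the three figures quoted in the paragraph (the $3,500 intervention cost, the $250,000 lifetime cost, and the 58 percent reduction) and asks how high the untreated risk of diabetes must be before the intervention pays for itself. It ignores discounting, timing, and adherence, and the function names and the 10 percent example risk are assumptions for illustration.

```python
INTERVENTION_COST = 3_500          # dollars per person, from the text
LIFETIME_DIABETES_COST = 250_000   # dollars per case, from the text
RELATIVE_RISK_REDUCTION = 0.58     # lifestyle arm, from the text


def cases_prevented(people_treated: int, baseline_risk: float) -> float:
    """Expected diabetes cases avoided among the people treated."""
    return people_treated * baseline_risk * RELATIVE_RISK_REDUCTION


def net_savings(people_treated: int, baseline_risk: float) -> float:
    """Avoided lifetime treatment cost minus the cost of the intervention."""
    spent = people_treated * INTERVENTION_COST
    avoided = cases_prevented(people_treated, baseline_risk) * LIFETIME_DIABETES_COST
    return avoided - spent


def break_even_baseline_risk() -> float:
    """Untreated risk at which avoided cost exactly equals money spent."""
    return INTERVENTION_COST / (RELATIVE_RISK_REDUCTION * LIFETIME_DIABETES_COST)


print(f"Break-even baseline risk: {break_even_baseline_risk():.1%}")
# Hypothetical example: 1,000 people, 10 percent of whom would develop diabetes.
print(f"Net savings: ${net_savings(1_000, 0.10):,.0f}")
```

On these figures the break-even risk is low, which is why the accuracy of the flag matters less for the economics than simply reaching people while they are still in the preventable window.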

The promise of the algorithmic approach is to identify patients in the prediabetic window three to five years earlier than conventional screening catches them, delivering that $3,500 intervention at a point when it can still prevent the disease entirely rather than merely managing it after the fact.

Chronic Kidney Disease

The kidneys are among the most stoic organs in the human body. They can lose more than half their function before a standard blood test catches anything definitively wrong. This is because the kidney function marker creatinine rises only gradually as nephrons, the microscopic filtration units, are destroyed, and the remaining nephrons compensate by working harder. By the time creatinine crosses the diagnostic threshold for chronic kidney disease, much of the damage is already done.

But the trajectory of creatinine, tracked across multiple annual blood draws, tells the story earlier. A creatinine that moves from 0.9 to 1.0 to 1.1 over three years while remaining technically normal is not reassuring. It is a trend. An algorithm that reads that trend alongside declining estimated glomerular filtration rate, rising blood urea nitrogen, and falling albumin can identify patients heading toward kidney failure two to five years before they cross any diagnostic threshold, at a stage when interventions including blood pressure control, dietary modification, and SGLT2 inhibitor medications can meaningfully slow or halt the progression.⁶

The Klinrisk algorithm, built on a random survival forest model and validated on 4.8 million US adults across commercial, Medicare, and Medicaid insurance populations, has already achieved CE-mark regulatory approval in Europe, obtained by Roche for its navify clinical platform, making it the most advanced non-cancer blood algorithm in clinical deployment. It achieves an accuracy score in the good to very good range for predicting kidney failure two years before conventional diagnosis, outperforming the standard clinical risk tool at every time interval tested. The math behind deploying this algorithm at scale is not complicated: the average dialysis patient costs $90,000 per year to treat. Delaying dialysis by five years through earlier detection and intervention saves approximately $450,000 per patient.⁷

Heart Failure

Heart failure is a condition in which the heart muscle weakens over time until it can no longer pump blood efficiently enough to meet the body's needs. It affects more than six million Americans and kills 68,000 per year. Once diagnosed, the five-year survival rate is approximately 50 percent, comparable to several of the cancers described in Chapter 4. And like those cancers, it is a condition that develops slowly, leaving a trail in the blood years before the breathlessness, fluid retention, and fatigue that bring patients to the hospital.⁸

The blood markers that precede heart failure include a protein called NT-proBNP, released by the heart muscle under stress, alongside high-sensitivity troponin, fasting glucose, and kidney function markers, all values measurable at a routine annual physical. A machine learning model trained on 19,080 adults from four major US population studies predicted who would develop heart failure over the following ten years with an accuracy score of 0.88 to 0.89, substantially outperforming established clinical risk scores. The top predictors were blood values collected at a single routine visit.

The clinical significance of ten-year heart failure prediction is substantial because the interventions that prevent heart failure (SGLT2 inhibitors, blood pressure control, statin therapy) are most effective when started before the heart muscle has remodeled itself in response to years of strain. Starting these interventions a decade before heart failure develops, guided by an algorithmic blood test flag, offers the realistic possibility of preventing a disease that currently claims 68,000 American lives per year.⁹

Sepsis

Sepsis is different from the other conditions in this chapter in an important way. The others develop over years. Sepsis develops over hours. It is the body's catastrophic overreaction to infection, a cascade of inflammatory signals that, once triggered, can shut down organ systems within days. It kills 270,000 Americans per year, a higher toll than any single cancer on the list in Chapter 4, and it kills them fast.¹⁰

The algorithmic approach to sepsis is therefore not about detecting a developing disease years before it surfaces. It is about detecting the early warning signs of a life-threatening infection cascade hours before the patient's condition deteriorates to the point where survival becomes doubtful. In hospitalized patients, blood values including white blood cell count, lactate, creatinine, platelets, and bilirubin begin to shift measurably in the hours before sepsis is clinically recognized. An algorithm reading those shifts in real time can flag the patient before the attending physician has connected the dots.

The TREWS algorithm, developed at Johns Hopkins and deployed across five of its hospital campuses, reads electronic health record data in real time and alerts clinical staff when a patient's values match the pattern that precedes septic shock. In a prospective study of more than 6,000 patients, TREWS reduced sepsis mortality by a relative 18.7 percent compared to the standard clinical alert system. That is not a marginal improvement. In a disease that kills 270,000 Americans annually, an 18.7 percent relative mortality reduction means roughly 50,000 lives saved per year if deployed at scale across American hospitals. TREWS is not a hypothetical. It is running right now, in real hospitals, on real patients.¹¹
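The 50,000 figure follows from applying the study's relative reduction to the national death toll. The sketch below reproduces that arithmetic under the text's own assumption that the study result would carry over to every American sepsis death. The `coverage` parameter is a hypothetical addition, included only to show how the estimate shrinks if deployment is partial.

```python
# Inputs taken from the text above.
ANNUAL_SEPSIS_DEATHS = 270_000
RELATIVE_MORTALITY_REDUCTION = 0.187


def lives_saved(annual_deaths: int, relative_reduction: float,
                coverage: float = 1.0) -> int:
    """Deaths averted per year, rounded to the nearest whole person.

    `coverage` (hypothetical) is the share of sepsis patients treated in
    hospitals running the alert system; 1.0 means deployment everywhere.
    """
    return round(annual_deaths * relative_reduction * coverage)


full_deployment = lives_saved(ANNUAL_SEPSIS_DEATHS, RELATIVE_MORTALITY_REDUCTION)
half_deployment = lives_saved(ANNUAL_SEPSIS_DEATHS, RELATIVE_MORTALITY_REDUCTION,
                              coverage=0.5)
print(full_deployment, half_deployment)
```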

Figure 7. Detection lead time: how far in advance algorithms can identify disease before conventional diagnosis.

The Diseases Doctors Almost Never Catch in Time

Some of the most compelling opportunities in the beyond-cancer program involve conditions that are not only underdiagnosed but almost universally diagnosed too late: conditions where the blood carries a highly specific signature for years before the disease causes serious harm, and where the treatment, when given early, is simple and completely effective.

Hemochromatosis is a genetic disorder in which the body absorbs too much iron from food, gradually accumulating it in the liver, heart, pancreas, and joints until organ damage becomes irreversible. It affects roughly one in two hundred people of Northern European descent, making it one of the most common genetic disorders in the United States. It is almost always diagnosed after organ damage has occurred. And yet routine bloodwork tells the story five to ten years earlier: transferrin saturation and ferritin levels rise steadily, liver enzymes trend upward, and glucose climbs as iron deposits damage the insulin-producing cells of the pancreas. The treatment is phlebotomy, the therapeutic removal of blood, essentially the same procedure as blood donation, costing a few dollars per session. Started early, it prevents every complication of the disease. It is among the highest-value, lowest-cost interventions in all of medicine.¹²

Familial hypercholesterolemia is a genetic disorder affecting one in every 250 people that causes dramatically elevated LDL cholesterol from birth or early childhood. People with this condition suffer heart attacks in their thirties and forties at rates that are shocking by any standard. The disorder is massively underdiagnosed: most people who carry the gene are walking around with LDL levels consistently above 190 milligrams per deciliter, visible on every lipid panel they have ever received, and have never been told why. Aggressive statin therapy started in the twenties or thirties prevents cardiovascular disease almost entirely. An algorithm flagging the distinctive lipid trajectory of familial hypercholesterolemia across population-level blood data could prevent an estimated 10,000 to 15,000 premature cardiac deaths per year.¹³

Addison's disease is a failure of the adrenal glands to produce cortisol, the hormone that helps the body respond to stress. Without it, an infection, injury, or surgical procedure can trigger an adrenal crisis -- a sudden collapse into shock that kills without warning. The disease is rare, but its catastrophic presentation is almost entirely preventable: the comprehensive metabolic panel shows gradually declining sodium, rising potassium, and unstable glucose levels for months to years before the crisis. Hormone replacement therapy, started early, prevents every serious complication. The tragedy is that patients typically receive the diagnosis only after they collapse, because no one recognized the pattern in the blood tests they had already had.

Two Algorithms Already Deployed

The diseases described in this chapter are not waiting for science to catch up. Two algorithms targeting non-cancer conditions are already deployed in clinical practice, with results that parallel what ColonFlag and LungFlag have demonstrated for cancer.

Klinrisk for chronic kidney disease received CE-mark regulatory approval in October 2025 for deployment on Roche's navify clinical platform. It has been validated on 4.8 million US adults across commercial, Medicare, and Medicaid populations and achieves accuracy scores in the good to very good range for predicting kidney failure two years before conventional diagnosis. It is the most commercially advanced non-cancer blood algorithm in the world and represents exactly the kind of pathway -- development, validation at massive scale, regulatory approval, and commercial partnership -- that the remaining algorithms need to follow.¹⁴

TREWS for sepsis is deployed across five Johns Hopkins hospital campuses and has demonstrated an 18.7 percent relative mortality reduction in a prospective clinical study. It reads blood values in real time, alerts clinical staff to the early warning patterns of septic shock, and has been shown to change physician behavior in ways that save lives. These are not test deployments or pilot programs. They are operational clinical systems producing measurable patient outcomes.¹⁵

The fact that two non-cancer algorithms have already reached clinical deployment with demonstrated patient benefit is significant for a specific reason: it confirms that the methodology is not limited to cancer. The same approach -- train a machine learning algorithm on longitudinal blood records from a large patient population, validate it on an independent population, integrate it into clinical workflows, measure the outcomes -- works for any disease that leaves a blood signature. And virtually every major chronic killer does.

Figure 9. Estimated deaths preventable annually across 14 non-cancer conditions. Total: ~300,000 to 500,000 per year. Assumes 50% population penetration.

The Combined Opportunity

Now step back and look at the full picture.

The thirteen cancers in Chapter 4 kill approximately 393,000 Americans per year. Conservative estimates, assuming algorithmic blood test analysis reaches half the adult population and performs at the level already demonstrated by ColonFlag and LungFlag, project that 100,000 to 175,000 of those deaths could be prevented annually. The non-cancer diseases described in this chapter kill more than 1.2 million Americans per year. Applying the same conservative assumptions to those diseases, where the evidence for early intervention is in many cases even stronger than it is for cancer, projects that an additional 300,000 to 500,000 deaths could be prevented annually.¹⁶

Add those numbers together. Conservative estimates suggest that algorithmic analysis of routine blood tests could prevent 400,000 to 675,000 American deaths per year. That represents 13 to 22 percent of all annual deaths in the United States. To put that in perspective, the elimination of all deaths from motor vehicle accidents would save approximately 40,000 lives per year. The algorithmic blood test program, at conservative estimates, would save ten to seventeen times as many.
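The addition and the comparison can be checked in a few lines. In the sketch below the death ranges and the motor vehicle figure come from the text. The total number of annual U.S. deaths is not stated in the text, so it is back-calculated from the 13 and 22 percent figures and should be read as an implied value, not a sourced one.

```python
# Ranges taken from the text: (low estimate, high estimate) per year.
cancer_preventable = (100_000, 175_000)
non_cancer_preventable = (300_000, 500_000)
MOTOR_VEHICLE_DEATHS = 40_000

# Add the low ends together and the high ends together.
combined = (cancer_preventable[0] + non_cancer_preventable[0],
            cancer_preventable[1] + non_cancer_preventable[1])

# How many times larger than the motor vehicle toll each end is.
ratio_low = combined[0] / MOTOR_VEHICLE_DEATHS
ratio_high = combined[1] / MOTOR_VEHICLE_DEATHS

# Implied total annual deaths (an inference, not a figure from the text):
# the denominator that makes the low end 13% and the high end 22%.
implied_total_low = combined[0] / 0.13
implied_total_high = combined[1] / 0.22

print(combined, ratio_low, round(ratio_high, 1))
print(round(implied_total_low), round(implied_total_high))
```

Both implied totals land near 3.1 million, so the two percentages are consistent with each other.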

And it would do so using blood tests that are already being drawn. No new needles. No new equipment. No new patient behavior. No new clinical infrastructure. Just a smarter way of reading data that is already being collected, at a rate of approximately 200 million blood panels per year in this country.

Figure 10. Combined annual impact of cancer and non-cancer algorithmic early detection. Total: 400,000 to 675,000 deaths preventable per year (13–22% of all U.S. annual deaths). Assumes 50% population penetration.

David, the man we met at the beginning of this chapter, never had to have his heart attack. His trajectory was readable in his blood for years before it arrived. The glucose at 96, the creatinine at 1.1, the inflammatory markers in his white blood cell count: together they described a future that medicine had the tools to change but lacked the analytical framework to see.

We have that framework now. The algorithms for David's diseases, like the algorithms for the thirteen cancers, are built and validated. Some are already deployed and saving lives. What remains is the decision to use them at the scale the problem demands.

Chapter 9

From Flag to Finding

What Happens After the Blood Test Raises an Alert

Imagine you are a 57-year-old woman who went in for a routine annual physical. You had your blood drawn, as you always do. A week later, your doctor calls. The routine blood test result was flagged by an algorithm. Your platelet count has been climbing slowly for more than a year. Your inflammatory markers have shifted. Together, these changes match a pattern seen in patients who later developed ovarian cancer.

You feel perfectly fine. You have no symptoms. Nothing hurts.

Your doctor has an alert. Now what?

That question -- what happens after the flag -- is what this chapter answers. A flag is not a diagnosis. It is a signal that something in your blood warrants a closer look. But knowing that something warrants a closer look is only useful if medicine has a specific next step to offer. For each of the thirteen cancers, that next step exists. This chapter walks through it in plain terms.

Two tools appear repeatedly across the thirteen pathways, and both are worth understanding before we begin.

Two Tools That Appear Throughout

The first is a targeted MRI scan. Most people have had an MRI of a specific body part at some point -- a knee, a spine, a brain. A targeted cancer confirmation MRI works the same way: powerful magnets and radio waves produce detailed images of internal organs, with no radiation involved. You lie still for thirty to sixty minutes. You walk in healthy and walk out the same way, ideally with answers.

What makes modern MRI so valuable for early cancer confirmation is its precision. Today's high-powered scanners can detect solid tumors as small as a pea, roughly five to ten millimeters across. At that size, for most cancers, surgery is straightforward and cure rates are high. The same tumor, found a year later when it has grown to three or four centimeters, may have already spread to nearby tissue, and the situation is far more complicated. Getting the right scan at the moment the blood flag appears can be the difference between a cancer that is curable and one that is not.

Figure 5. Full-body MRI screening performance compared to mammography. Source: Prenuvo Polaris study (n=1,011).

The second tool is called a liquid biopsy. A conventional biopsy means a physician inserts a needle or performs a small surgical procedure to remove a piece of tissue from a suspected tumor. A liquid biopsy requires nothing more than a blood draw. It works by detecting tiny fragments of tumor DNA that break off from cancer cells and float in the bloodstream. These fragments carry the specific genetic fingerprint of the cancer, allowing a laboratory to identify what kind of tumor is present. Liquid biopsy cannot yet find very small tumors on its own, but it plays a vital supporting role: when a scan result is uncertain, a positive liquid biopsy can confirm that cancer is genuinely present before any more invasive step is taken.¹

With these two tools in mind, here is what happens after a blood flag for each of the thirteen cancers.

The Thirteen Pathways

1. Colorectal Cancer

The path from blood flag to confirmation for colorectal cancer is the simplest and most established of any cancer on this list. The next step is a colonoscopy, a procedure in which a physician uses a flexible camera to look directly inside the colon, identify any abnormal growths, and remove them on the spot. What makes this pathway particularly powerful is that the confirmation tool is also the treatment tool. A colonoscopy that finds an early-stage cancer or a precancerous polyp can deal with it immediately, in the same visit. No second appointment. No surgery. The ColonFlag algorithm, when deployed at a Pennsylvania health system, produced eight times the cancer detection rate of standard screening. Eight times, from the same blood test patients already had.²

2. Lung Cancer

For lung cancer, the standard first imaging step is a low-dose CT scan of the chest, a quick scan that uses far less radiation than a conventional CT and can detect nodules as small as a grain of rice. When the CT shows a shadow that is uncertain, a targeted MRI adds clarity without more radiation. A blood-based liquid biopsy can then look for tumor DNA carrying the genetic mutations most common in lung cancer. If tissue is needed for a definitive answer, a thin needle guided by the CT scan, or a small camera passed through the airway, collects a sample. The LungFlag algorithm identifies forty percent of future lung cancer patients nine to twelve months before their conventional diagnosis, a detection lead time during which surgery is still curative for most.³

3. Liver Cancer

Liver cancer confirmation uses a specialized MRI protocol with a contrast agent designed specifically to highlight liver tissue. Radiologists read the result using a standardized scoring system, and for patients with known liver disease, certain MRI findings are considered definitive for liver cancer without any biopsy at all. For findings that are less clear-cut, liquid biopsy adds a blood-based confirmation layer. The blood algorithm detects abnormal liver chemistry three to eighteen months before a doctor would normally notice anything, meaning the MRI is looking at lesions the size of a small pea rather than a golf ball. At that size, a range of local treatment options, from targeted heat ablation to surgical removal, remain available.⁴

4. Gastric Cancer

Stomach cancer confirmation mirrors colorectal cancer in its directness. A physician passes a flexible camera through the mouth and into the stomach, examines the lining with high-resolution specialized lighting, and takes a targeted biopsy of any suspicious area. Like colonoscopy, upper endoscopy is both the confirmation and, for early lesions, sometimes the cure: the abnormal tissue can be removed in the same procedure. For the small number of patients who cannot safely undergo endoscopy, liquid biopsy carrying specific genetic markers associated with gastric cancer provides a supplementary blood-based path to confirmation.⁵

5. Pancreatic Cancer

Pancreatic cancer is the most challenging on this list, and the reason the blood algorithm matters so much for this disease is the exceptionally long warning it provides. The pancreas sits deep in the abdomen, surrounded by major blood vessels and other organs, making it difficult to reach from outside the body. But the blood algorithm detects the glucose disruption that a developing pancreatic tumor causes two to three years before a conventional diagnosis. A specialized MRI that visualizes the pancreatic duct can detect subtle duct changes before the tumor itself is clearly visible. A physician can then pass an ultrasound device through the stomach wall to sit right next to the pancreas and sample a lesion as small as five to eight millimeters. A liquid biopsy looking for the KRAS genetic mutation, present in more than ninety percent of pancreatic cancers, can add blood-based confirmation when imaging findings are subtle. Two to three years of warning is a long time. It is time that could, if used well, turn one of medicine's most feared diagnoses into a curable one.⁶

6. Multiple Myeloma

Myeloma is a cancer of the plasma cells, the antibody-producing cells of the immune system. Because it originates in the blood rather than a solid organ, its confirmation pathway looks different from all the others. The blood algorithm detects the protein changes and elevated calcium that precede myeloma by two to five years. The first confirmation step is not a scan but a blood test called serum protein electrophoresis, which looks for the abnormal protein that myeloma cells produce. If that test is positive, a whole-body MRI identifies focal areas of abnormal bone marrow activity as small as five millimeters. A bone marrow biopsy at the most active sites provides the final pathological diagnosis. The myeloma pathway is one of the most favorable on this list because of how early and how clearly the blood speaks.⁷

7. Leukemia

Leukemia originates in the blood, which makes its confirmation pathway the most direct of any cancer in this group and the least invasive. No organ-targeted MRI is needed. For the most common adult leukemias, a blood test called flow cytometry, which analyzes the specific types and proportions of blood cells in a standard draw, confirms the diagnosis. For certain leukemia types, a specific genetic test of the blood identifies the defining mutation without any surgical procedure whatsoever. Only the more aggressive acute leukemias require a bone marrow biopsy for precise subtype classification. In a chapter full of needles, cameras, and scans, leukemia stands apart: a blood flag leads to a blood test that leads to a diagnosis.⁸

8. Lymphoma

Lymphoma develops in the lymph nodes, the small glands distributed throughout the body that help fight infection. A targeted MRI of the chest, abdomen, and pelvis using a special imaging sequence that highlights abnormally dense tissue can map lymph node involvement across the entire body without radiation, detecting suspicious nodes as small as five to seven millimeters. Liquid biopsy can confirm the presence of tumor DNA in the blood and track how much cancer is present. The definitive confirmation requires removing an entire lymph node rather than just a core needle sample, because the internal structure of the node is needed to classify the lymphoma correctly. This is a minor outpatient surgical procedure, far less significant than what advanced lymphoma requires.⁹

9. Ovarian Cancer

This is one of the two genuinely difficult pathways on this list, and it deserves honesty. A targeted pelvic MRI can detect ovarian masses as small as five to eight millimeters and is more accurate than standard ultrasound for complex findings. Serial CA-125 blood marker measurements and liquid biopsy add supplementary confirmation. The hard part is obtaining tissue. A small ovarian mass in an otherwise healthy woman cannot usually be safely reached with a needle through the skin. The ovary is not easily accessible from outside the body, and a definitive biopsy most often requires a minimally invasive laparoscopic surgical procedure performed under general anesthesia. That is a real burden for a patient who does not yet have a confirmed diagnosis. Medicine cannot pretend otherwise. What it can say is this: a small ovarian mass found early by an algorithm, requiring a laparoscopic procedure to confirm, is a far better situation than ovarian cancer found at Stage IV, which is where most women find it today because there is currently no other way to find it sooner.¹⁰

10. Kidney Cancer

Kidney cancer confirmation uses a targeted MRI that is particularly sensitive for small kidney masses, outperforming CT scans for lesions below one centimeter. The imaging result is read using a standardized classification system that guides the decision between proceeding immediately to biopsy or continuing with surveillance. A needle biopsy guided by imaging, in which a physician inserts a thin needle through the skin to collect a tissue sample, achieves diagnostic accuracy of ninety to ninety-five percent. Because the blood algorithm elevates the probability that any kidney abnormality is genuinely cancerous, physicians act at smaller lesion sizes than they would in unselected patients. Liquid biopsy carrying genetic markers specific to kidney cancer provides supplementary blood confirmation.¹¹

11. Bladder Cancer

Bladder cancer has a notably accessible confirmation pathway because the bladder can be directly examined without any incision. A targeted pelvic MRI assesses whether a tumor has invaded the muscle layer of the bladder wall, which is the critical decision point for treatment planning. A urine-based liquid biopsy, which detects tumor DNA shed directly into the urine, achieves sixty to eighty percent sensitivity at early stage and serves as a straightforward non-invasive first step. A cystoscopy, in which a physician passes a thin camera through the urethra, allows direct visualization and tissue sampling of lesions as small as one to two millimeters. For early-stage bladder cancer found this way, the same procedure that removes the tumor from the bladder lining is simultaneously the diagnosis and the treatment.¹²

12. Esophageal Cancer

Esophageal cancer confirmation uses an upper endoscopy with specialized lighting that can highlight subtle changes in the lining of the esophagus, combined with targeted biopsy of any suspicious area. A liquid biopsy carrying specific genetic mutations present in more than seventy percent of esophageal cancers provides supplementary blood confirmation. For early-stage lesions found in the window the blood algorithm opens, the endoscopic removal of the abnormal mucosal layer -- a procedure performed through the same camera -- can be both the diagnosis and the cure in a single session. The confirmation step is the treatment step. That is early detection working exactly as it should.¹³

13. Thyroid Cancer

The thyroid sits just below the skin at the front of the neck, which makes it one of the most accessible organs for imaging. Neck ultrasound provides excellent visualization, and a standardized scoring system guides the decision about which nodules are concerning enough to warrant further investigation. The biopsy is a fine-needle aspiration guided by the ultrasound image, a quick outpatient procedure that takes only minutes. For nodules whose cellular samples come back ambiguous, specialized genetic tests can reclassify them as benign or malignant with high accuracy, substantially reducing unnecessary thyroid surgery. Liquid biopsy provides supplementary blood confirmation for cases where the diagnosis remains uncertain.¹⁴

What Happens When the Scan Finds Nothing

Every physician who encounters this framework will ask a reasonable question: what do we do when the blood algorithm flags a patient as high risk and the subsequent scan comes back clear?

A clear scan does not mean the algorithm was wrong. It means the tumor, if one is developing, is too small to see on today's imaging. That is important and actionable information, not a dead end.

The protocol for a flagged patient with a clear scan is active surveillance. The blood algorithm runs again every three months. A liquid biopsy is checked for tumor DNA in the bloodstream. A repeat scan is scheduled at six months. If the liquid biopsy comes back positive in a patient whose scan is clear, that is strong evidence of a tumor too small to see yet, not evidence that the flag was a false alarm. If the liquid biopsy is also negative, the probability that cancer is present drops, and the surveillance interval can be extended. A large study of whole-body MRI in more than a thousand asymptomatic patients found that 99.8 percent of those with a clear scan remained cancer-free for at least a year, which tells us that a negative scan in a flagged patient provides real reassurance, even if it is not absolute certainty.¹⁵

The logic here is straightforward. An algorithm that flags a patient eighteen months before a conventional diagnosis is flagging a tumor that may be only five to ten millimeters right now. Modern MRI can see lesions at that size for most organs. A negative scan in a flagged patient means the cancer, if present, is either smaller than five millimeters or has not yet formed a solid mass that imaging can detect. Regular surveillance catches it as it grows. The patient is not left with a frightening alert and no follow-up plan. The patient is in a structured monitoring program designed to find the disease at the earliest possible stage.
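The surveillance logic described in this section can be written down as a small decision rule. What follows is a minimal sketch for illustration only, not a clinical protocol. The three-month and six-month intervals come from the text; the class, the function, and the field names are hypothetical. Because the text does not say how far the interval may be extended after a negative liquid biopsy, the sketch records only that an extension is permitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveillancePlan:
    algorithm_rerun_months: int   # how often the blood algorithm runs again
    repeat_scan_months: int       # when the repeat scan is scheduled
    may_extend_interval: bool     # whether the schedule may be relaxed
    reading: str                  # how the text interprets the result


def plan_after_clear_scan(liquid_biopsy_positive: bool) -> SurveillancePlan:
    """Follow-up for a flagged patient whose first scan came back clear."""
    if liquid_biopsy_positive:
        # Tumor DNA in the blood with a clear scan: keep the full schedule.
        return SurveillancePlan(
            algorithm_rerun_months=3,
            repeat_scan_months=6,
            may_extend_interval=False,
            reading="strong evidence of a tumor too small to see yet",
        )
    # Both scan and liquid biopsy negative: probability of cancer drops.
    return SurveillancePlan(
        algorithm_rerun_months=3,
        repeat_scan_months=6,
        may_extend_interval=True,
        reading="probability of cancer drops; interval can be extended",
    )


print(plan_after_clear_scan(liquid_biopsy_positive=True))
print(plan_after_clear_scan(liquid_biopsy_positive=False))
```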

Two Problems That Remain Hard

Two places in this framework remain genuinely difficult, and naming them clearly is part of what makes the rest credible.

Ovarian cancer, as described above, often requires laparoscopic surgery to obtain tissue confirmation when a suspicious mass is found. That is a real procedural burden. It demands careful patient communication, honest shared decision-making, and clear protocols for which patients should proceed to surgery versus continue under surveillance. Medicine is working to make this step less invasive, and progress is being made, but it is not there yet.

Pancreatic cancer in its earliest phase presents a different challenge. The blood algorithm can detect the metabolic signals of a developing pancreatic tumor two to three years before diagnosis, often when the tumor is smaller than five millimeters. At that size, tissue sampling through any route is technically demanding and not always definitive. In those cases, the best approach is a combination of a sustained blood algorithm flag, a rising liquid biopsy signal carrying the KRAS mutation that is almost universal in pancreatic cancer, and close surveillance imaging while waiting for the tumor to reach a size where confirmation is more reliable.

Neither of these problems is a reason to withhold the algorithms. They are well-defined clinical challenges, actively being worked on, and they are categorically better than the alternative: finding ovarian and pancreatic cancer at Stage IV, when no clever confirmation pathway is needed because the disease has already made itself unmistakably known and is usually beyond any realistic hope of cure.

The Pipeline Is Real

Return to the woman at the beginning of this chapter. She is 57, feels fine, and has just been told her blood test was flagged for ovarian cancer risk.

She is frightened. That is understandable. But consider what medicine now has to offer her. A targeted pelvic MRI, possibly combined with a liquid biopsy blood test, can determine within days whether there is a suspicious lesion in her ovaries. If there is, it can be characterized with precision and, if necessary, confirmed through a minimally invasive laparoscopic procedure. If there is not, surveillance continues on a regular schedule. Either way, she is not sent home with vague reassurance and no plan. She is in a clinical pipeline that is looking for her cancer at a stage when it can still be cured.

That is what the thirteen pathways in this chapter collectively represent. The blood algorithm is not the end of the story. It is the opening of a structured clinical sequence that, for most of the thirteen cancers, leads from a blood draw, to a scan, to a confirmation, to a treatment, at a point in the disease when treatment is most likely to work.

The blood test raises the flag. The pathway converts the flag into a finding. The finding leads to treatment at the stage where medicine can still make a difference.

That is the whole argument of this book, made concrete.

Chapter 10

The Consortium Model

How We Get From Here to There

The previous chapters have made a case. The case is that machine learning algorithms trained on routine blood tests can detect thirteen cancers and a dozen additional life-threatening diseases months or years before conventional medicine would find them, that two of those algorithms are already deployed and saving lives, that the clinical pathways from flag to confirmation to treatment are specified and workable, and that the combined opportunity could prevent 400,000 to 675,000 American deaths every year.

A reasonable reader arrives at this point and asks a reasonable question. If all of this is true, why isn't it already happening everywhere?

The answer is not that the science is unproven. It is proven. The answer is not that the technology doesn't exist. It exists. The answer is that deploying a new clinical tool at scale, across a healthcare system as large and fragmented as America's, requires more than a validated algorithm. It requires institutional commitment, regulatory navigation, workflow integration, clinical protocol development, and the willingness of major health systems to take on the organizational work of making something new a standard part of care.

None of that is easy. But all of it is knowable, and the path through it has already been walked. This chapter describes that path, and the specific model that can compress what might otherwise be another decade of delay into a period of two to three years.

The Template Already Exists

ColonFlag and LungFlag did not appear fully formed in clinical practice. They followed a specific pathway from idea to deployment, a pathway that every algorithm on the list in Chapter 4 needs to follow, and that is now well enough understood to be replicated deliberately rather than rediscovered from scratch.

The pathway has five steps, and it is worth naming them clearly before describing how the consortium model accelerates them.

Step one is retrospective development. Scientists take a large archive of blood records from patients, some of whom went on to develop a specific cancer, and train a machine learning algorithm to recognize the patterns that distinguished those patients from the ones who stayed healthy. This is the step that has already been completed, in peer-reviewed published studies, for all thirteen cancers and most of the non-cancer diseases in Chapter 8.

Step two is external validation. The algorithm is tested on a completely different patient population from the one used to build it, to confirm that the patterns it learned are real and not specific to one health system or one demographic group. ColonFlag was validated in Israel, then in the United Kingdom, then in the United States, with comparable results each time. LungFlag was validated across nearly 200,000 patients at Kaiser Permanente. For the algorithms that have not yet reached this step, this is where the consortium model begins its work.

Step three is clinical integration. The algorithm is embedded into the laboratory reporting workflow of a health system so that it runs automatically on every eligible blood result, without requiring any additional action from the physician or the patient. At Geisinger, ColonFlag generated an alert in the electronic medical record. The physician received a note. The patient received a phone call. No new equipment. No new appointments. No new anything except a smarter reading of data that was already there.

Step four is protocol development. Clinical teams agree on exactly what happens when a patient is flagged: what follow-up test is ordered, how quickly, by whom, and what happens if the patient declines. These protocols do not need to be invented from scratch. They can be adapted from the confirmation pathways described in Chapter 9.

Step five is outcome measurement and publication. The health system tracks what happens to flagged patients over time, compares those outcomes to what would have been expected without the algorithm, and publishes the results. This is the step that turns a local clinical program into evidence that persuades the next health system to implement, the next professional society to endorse, and the next insurer to cover. The published deployment data from Maccabi, Geisinger, and the United Kingdom are the reason the ColonFlag story has spread as far as it has.¹

Every algorithm on the list needs to walk these five steps. The question is how long each step takes and how many algorithms can advance simultaneously. Left to its own momentum, the medical system tends to advance one algorithm at a time, one health system at a time, sequentially. The consortium model changes that.

What a Consortium Actually Is

A consortium, in this context, is a formal partnership among four to five major health systems that agree to work together on the prospective validation and clinical deployment of blood-based cancer detection algorithms. Each health system contributes patient data, clinical expertise, and institutional infrastructure. The consortium shares the results across all partners simultaneously, so that a validation study conducted at one institution informs the protocols at all the others in real time rather than after a year's delay in the publication cycle.

Why four to five health systems rather than one? Because the central weakness of any single-institution study is generalizability. A result from Geisinger's rural Pennsylvania population may not apply in the same way to an urban academic medical center in New York or a community hospital network in the South. A consortium that spans different demographic groups, different geographic regions, and different patient populations produces evidence that is far harder to dismiss as specific to one context. It also produces the diversity of training data that makes the algorithms themselves more robust.

The data exists at exactly the right institutions for this work. Kaiser Permanente, with more than 12 million members and decades of linked electronic health records, is the kind of institution where LungFlag was validated precisely because the data infrastructure to do it existed. Maccabi Healthcare Services in Israel, with 2.5 million members, is where ColonFlag was born. Geisinger, with more than 500,000 patients and a culture of precision health research, is where it was deployed in the United States. These institutions have already demonstrated that they can do this work. A consortium that adds Weill Cornell Medicine, Memorial Sloan Kettering, and two or three comparable systems to the mix creates a partnership with the scale, the diversity, and the institutional credibility to move all of the remaining algorithms through validation and deployment simultaneously.²

I sit on the boards of Weill Cornell Medicine and Memorial Sloan Kettering. I have watched these institutions move slowly when left to their own momentum, and move quickly when the right leaders decide something matters enough to push. The question is not whether these institutions are capable of this work. They demonstrably are. The question is whether the decision to prioritize it has been made.

This book is, among other things, an argument for making that decision.

What the Consortium Needs to Do

A consortium of four to five health systems working in parallel rather than sequentially could advance multiple algorithms simultaneously. Here is what that work looks like in practice.

For the two algorithms furthest along -- ColonFlag and LungFlag -- the consortium's job is expansion. ColonFlag currently reaches a fraction of the eligible population in the health systems where it runs. LungFlag has completed validation but has not been widely integrated into clinical workflows. Expanding both to full coverage within consortium member health systems, and publishing the outcomes data from that expansion, creates the real-world evidence base that accelerates adoption at other institutions.

For the algorithms validated in large populations but not yet deployed -- primarily liver and gastric cancer detection -- the consortium's job is clinical integration. The science is done. The confirmation pathways are specified. What is needed is the organizational work of embedding the algorithms into laboratory reporting systems, training clinical staff on the protocols, and tracking outcomes. This is not a two-year randomized controlled trial. It is a six-to-twelve-month implementation project at institutions with the infrastructure to do it.

For the algorithms that have published peer-reviewed models but have not yet been validated in large, diverse populations, the consortium's job is the prospective validation study. This means running the algorithm on current patients in real time, following those patients forward, and measuring whether the flagged patients do in fact develop cancer at higher rates and whether earlier detection leads to better outcomes. The design for this kind of study is well established. The ColonFlag validation at Geisinger is the template. A consortium of five health systems conducting simultaneous prospective validations across different patient populations could complete the validation step for the remaining algorithms in two to three years rather than the decade or more that sequential single-institution development would require.³

The Regulatory Path

A question that health policy audiences always raise at this point is: what about the FDA? Don't these algorithms need regulatory approval before they can be deployed clinically?

The answer is nuanced, and the nuance matters.

The FDA regulates software as a medical device, a category that includes clinical decision support tools like cancer detection algorithms. The relevant regulatory pathway is the 510(k) clearance process, which requires demonstration that a new device is substantially equivalent in safety and effectiveness to a legally marketed predicate device. ColonFlag received FDA clearance through this pathway. The Klinrisk kidney disease algorithm, developed in partnership with Roche, received CE-mark approval in Europe in October 2025 and is advancing toward FDA clearance in the United States. These precedents are important: they demonstrate that the regulatory pathway exists, has been navigated successfully by algorithms in this family, and does not require the decade-long process that a new drug approval demands. A 510(k) clearance for a well-validated algorithm typically takes twelve to eighteen months.⁴

For consortium validation studies, the regulatory consideration is slightly different. A prospective study in which an algorithm is used for research purposes, with results reported to physicians but not yet constituting a clinical recommendation, can proceed under institutional review board oversight without 510(k) clearance. The clearance comes after the validation data is complete, as part of the submission to the FDA. This is the sequence ColonFlag and LungFlag followed, and it is the sequence the consortium should follow for the remaining algorithms.

The regulatory path is not a wall. It is a road with known mile markers. The consortium knows where those markers are because two algorithms in this family have already reached the end of the road.

The Cost of the Work

What does it actually cost to run the consortium, and where does the funding come from?

The computational cost of running a trained machine learning algorithm on a routine blood panel is measured in fractions of a cent per patient. That is not a misprint. The algorithm is a mathematical model. Once trained and validated, running it on a new blood result requires a small amount of computing power and a fraction of a second of processing time. For a health system running 500,000 blood panels a year, the computational cost of applying a cancer detection algorithm to each one is in the range of a few thousand dollars annually. That is less than the cost of a single MRI scan.⁵
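The arithmetic behind that figure can be sketched in a few lines of Python. The per-panel cost of half a cent is an illustrative assumption, chosen only because it falls inside the "fractions of a cent" range described above. It is not a measured figure from any deployment.

```python
# Illustrative sketch only. The panel volume comes from the text; the
# per-panel cost is an ASSUMED value within "fractions of a cent".
PANELS_PER_YEAR = 500_000
ASSUMED_COST_PER_PANEL_USD = 0.005  # hypothetical: half a cent per panel


def annual_compute_cost(panels: int, cost_per_panel: float) -> float:
    """Yearly computing cost of scoring every blood panel once."""
    return panels * cost_per_panel


annual_cost = annual_compute_cost(PANELS_PER_YEAR, ASSUMED_COST_PER_PANEL_USD)
print(f"Estimated computing cost: ${annual_cost:,.0f} per year")
```

Under that assumption the total lands in the low thousands of dollars a year. Any per-panel cost below one cent keeps it under $5,000.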

The costs that are not trivial are the upfront costs of validation and integration. A prospective validation study requires research staff, data management infrastructure, regulatory coordination, and clinical protocol development. A full validation study at a single large health system, comparable to the Geisinger ColonFlag deployment, costs in the range of two to five million dollars, depending on the cancer and the population size.

A consortium of five health systems validating ten algorithms simultaneously would require an investment in the range of $100 to $250 million over three years. That sounds large until it is placed alongside the numbers in the next chapter. Dialysis for a single kidney failure patient costs $90,000 per year. A heart attack costs $40,000 to treat acutely, plus ongoing care that can exceed $200,000 over a decade. Treating a late-stage pancreatic cancer costs hundreds of thousands of dollars, buys three percent survival, and generates months of suffering. The investment required to validate and deploy the algorithms that could prevent these outcomes is not large relative to what we currently spend managing the diseases we could have caught earlier.
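The investment range follows directly from the per-study cost given above. A minimal sketch, assuming every member health system runs a full validation study for every algorithm and that no costs are shared between sites (both are simplifying assumptions, not a budget):

```python
# Figures from the text. The multiplication ASSUMES one full study per
# health system per algorithm, with no shared infrastructure savings.
HEALTH_SYSTEMS = 5
ALGORITHMS = 10
COST_PER_STUDY_USD = (2_000_000, 5_000_000)  # (low, high) per study


def consortium_budget(systems: int, algorithms: int, cost_range: tuple) -> tuple:
    """Return the (low, high) total cost across all validation studies."""
    studies = systems * algorithms
    low, high = cost_range
    return studies * low, studies * high


budget_low, budget_high = consortium_budget(
    HEALTH_SYSTEMS, ALGORITHMS, COST_PER_STUDY_USD
)
print(f"${budget_low / 1e6:.0f} million to ${budget_high / 1e6:.0f} million")
```

Shared data infrastructure would likely push the real figure toward the lower end. The sketch shows only where the range comes from.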

The funding mechanisms available for this work are multiple. Federal research grants through the National Cancer Institute and the National Institutes of Health fund exactly this kind of prospective validation work. Philanthropic investment from individuals and foundations with the scale to make transformational commitments -- the kind that built cancer centers and endowed research programs -- is entirely appropriate for a program with this scope and these stakes. Industry partnerships with diagnostic companies including Roche, Siemens Healthineers, and others already engaged in this space can contribute both funding and technical expertise. Roche's partnership with the Klinrisk kidney disease algorithm, which led to CE-mark approval in 2025, is the model for what a commercial partnership in this space can achieve.⁶

What the Consortium Produces

A consortium of four to five major health systems working together for three years on the validation and deployment of blood-based cancer and disease detection algorithms would produce four things that do not currently exist.

First, it would produce prospective clinical validation data for the remaining algorithms in the thirteen-cancer program, in diverse patient populations across different geographic regions and demographic groups. That data would be publishable, persuasive, and sufficient to support FDA clearance applications for the algorithms that have not yet received them.

Second, it would produce operational deployment at scale. Instead of ColonFlag running in three health systems and LungFlag waiting in the queue, the consortium would have twelve to fifteen algorithms running simultaneously across institutions serving tens of millions of patients. The number of cancers caught early, and the number of patients whose outcomes are changed by earlier detection, would be measurable in the first year of full deployment.

Third, it would produce the clinical protocol infrastructure that every other health system in the country needs before it can implement. When a hospital administrator in Memphis or a health plan executive in Denver asks how to deploy ColonFlag, the answer today is: look at what Geisinger did. When the consortium completes its work, the answer will be: here is the validated protocol, here is the integration template, here is the training program for clinical staff, here is the outcome tracking system. Replication becomes dramatically easier when the first mover has done the hard work of figuring out how.

Fourth, it would produce the evidence needed to change coverage policy. The major drivers of healthcare adoption in the United States are insurance coverage decisions and professional society endorsements. When the U.S. Preventive Services Task Force gives a screening test an A or B recommendation, most private insurers are required under the Affordable Care Act to cover it. When a major professional society like the American Society of Clinical Oncology incorporates an algorithm into its clinical practice guidelines, oncologists across the country change their practice. These endorsements require prospective clinical evidence from large, diverse, well-conducted studies. The consortium produces exactly that evidence, for every algorithm in the program, simultaneously.⁷

The New York Opportunity

New York City is home to several of the finest medical institutions in the world. Memorial Sloan Kettering Cancer Center, the most specialized cancer hospital on earth. Weill Cornell Medicine, one of the top academic medical schools in the country. NewYork-Presbyterian Hospital, one of the largest and most comprehensive hospital networks in America. NYU Langone Health. Mount Sinai Health System. Northwell Health, the largest health system in the state, with 21 hospitals and more than 800 outpatient facilities serving millions of patients across New York, New Jersey, and Connecticut.

These institutions share a city, often share patients, and collectively possess a concentration of medical expertise, research infrastructure, electronic health record data, and philanthropic support that does not exist anywhere else in the country. They also share a common problem: the fragmentation of American healthcare means that each of them is, in important ways, competing with the others, which makes formal collaboration feel counterintuitive.

But the collaboration required for a cancer detection consortium is not the kind that threatens competitive position. No health system loses patients or revenue by participating in a shared validation study for a cancer detection algorithm. Every health system gains: validated algorithms, published evidence, improved patient outcomes, and the institutional reputation that comes from leading a program that saves tens of thousands of lives.

The New York institutions have collaborated before on research programs, clinical trials, and public health initiatives. The infrastructure for data sharing under appropriate privacy protections exists. The regulatory frameworks are understood. What has been missing is not capability but the specific proposal -- the concrete ask, backed by the evidence assembled in this book -- that makes the case for moving from collaboration as an aspiration to consortium as an operational reality.

That is what REDI is building. And New York is where it starts.

The Timeline

What does the consortium model look like as a timeline?

In year one, the consortium forms. Four to five health systems sign data use agreements and research collaboration frameworks. Technical teams integrate existing validated algorithms -- ColonFlag and LungFlag at minimum -- into the laboratory reporting workflows of each member institution. Prospective validation study designs are finalized for the next tier of algorithms, those with large published validation studies but not yet deployed, including liver and gastric cancer detection. Institutional review board approvals are obtained. The infrastructure is built.

In year two, the prospective validation studies run. Every blood panel drawn at every consortium member institution is analyzed by the deployed algorithms. Flagged patients enter the clinical confirmation pathways described in Chapter 7. Outcomes are tracked in real time. The data from ColonFlag and LungFlag deployments at full consortium scale produces the first published results from a multi-institutional prospective program, creating the evidence base for professional society review and FDA clearance applications for the next-tier algorithms.

In year three, the next-tier algorithms reach deployment. Liver and gastric cancer detection, validated by the prospective studies in year two, are integrated into the same laboratory reporting workflows. FDA clearance applications are filed for algorithms completing the validation process. The outcomes data from year two deployments is published, reviewed by professional societies, and begins the process of changing coverage policy at major insurers. The consortium presents its results at national oncology conferences. Other health systems begin the process of replication.

By year four, the consortium model has become the template for the country. The evidence exists. The protocols exist. The regulatory clearances exist. The question shifts from whether to deploy these algorithms to how quickly every health system in America can implement them.

This is not an optimistic fantasy. It is a timeline derived from the actual experience of ColonFlag and LungFlag, compressed by the parallel structure of a multi-institution consortium. ColonFlag went from initial publication in 2016 to clinical deployment in multiple countries by 2018. LungFlag went from validation publication in 2021 to the point of broad deployment readiness within three years. A consortium that runs five health systems simultaneously, sharing data and protocols in real time, compresses each of those timelines further. Three years from consortium formation to broad multi-algorithm deployment is a realistic target, not an aspirational one.⁸

The Decision

Every chapter in this book has described a gap between what the science makes possible and what the healthcare system is actually doing. The science is not the barrier. The tools are not the barrier. The data is not the barrier. The barrier, consistently and at every stage, is the institutional decision to act.

The consortium model is a way of making that decision concrete. It is not a policy proposal that requires Congress to act or a regulatory change that requires years of rulemaking. It is an agreement among institutions, the kind of agreement that hospital boards and health system leaders make every year about research programs, capital investments, and strategic priorities.

What makes this agreement different from most is the scale of what it produces. Most hospital board decisions affect thousands of patients. This one affects hundreds of thousands, eventually millions, because the algorithms validated and deployed through the consortium become the standard of care that diffuses through the entire healthcare system.

For more than two decades, I have sat on the boards of two of the institutions that could anchor this consortium. I have seen what those institutions can do when they decide to do it. The question this book asks of those institutions, and of the others who could join them, is a simple one.

The science is ready. The algorithms are built. The pathways are specified. The blood is already being drawn.

What are we waiting for?

Chapter 11

The Economics Are Overwhelming

Why Preventing Cancer Costs a Fraction of Treating It

Consider two patients with lung cancer.

The first patient's blood algorithm flags an elevated risk ten months before any tumor is detectable by conventional means. His doctor orders a low-dose CT scan of the chest. A small nodule is found, two centimeters, still confined to the lung. He undergoes surgery. The procedure costs approximately $40,000. He recovers over several weeks. His five-year survival probability is sixty percent.

The second patient has no algorithm running on his blood tests. He feels fine for another year. Then a persistent cough sends him to his doctor. Scans show the cancer has spread to his lymph nodes and liver. Surgery is no longer possible. He begins chemotherapy, then immunotherapy, then a targeted therapy. His oncologist prescribes Keytruda, an immunotherapy drug that costs more than $150,000 per year. He receives approximately $350,000 in medical care over the next fourteen months. His five-year survival probability is eight percent. He dies.

The difference in cost between these two patients is roughly $310,000. The difference in outcome is the difference between a realistic chance of survival and an almost certain death. The difference in what caused each outcome is ten months and one algorithm running on a blood test the first patient had already received.

This chapter is about the economics of that difference. The financial case for early detection through algorithmic blood analysis is not close. It is not even a reasonable debate. The numbers are so lopsided that the only honest question left is why the healthcare system has not already moved decisively in this direction.

The answer to that question is structural, and it tells us something important about why good economics alone are not sufficient to produce change, and what else is required.

The Cost of Late Detection: The Full Picture

The table below captures the essential economic comparison across the major diseases in this program. The pattern is the same every time: early detection costs dramatically less, and produces dramatically better outcomes. In no other area of medicine do we have interventions that simultaneously cost less and work better. The table is worth reading carefully.

Table 1. Cost and survival comparison: early versus late detection across major diseases in the REDI program. Early-stage cancer survival figures from Cancer Statistics 2024 (SEER database). Cost estimates from published healthcare expenditure analyses. DPP = Diabetes Prevention Program.
Disease / Cancer	Early Treatment Cost	Late Treatment Cost	5-Year Survival or Outcome (Early)	5-Year Survival or Outcome (Late)
Lung Cancer	$30K–$60K	$200K–$350K+	60%	8%
Colorectal Cancer	$20K–$40K	$200K–$400K	91%	14%
Pancreatic Cancer	$100K–$180K	$300K–$500K	50%	3%
Ovarian Cancer	$30K–$60K	$200K–$400K	93%	31%
Kidney Disease (dialysis)	~$5K/yr (medications)	$90K/yr (dialysis)	Delay or prevent dialysis	Ongoing dialysis dependency
Type 2 Diabetes (prevention)	$3,500 (DPP intervention)	$250K lifetime	58% prevention rate	Lifetime complications
Heart Attack	$5K/yr (statins + lifestyle)	$40K+ acute + $200K lifetime	Largely preventable	30–50% mortality reduction possible

Read across any row in that table and the argument makes itself. Lung cancer caught early costs $30,000 to $60,000 to treat with surgery and gives the patient a sixty percent chance of surviving five years. Lung cancer caught late costs $200,000 to $350,000 or more in chemotherapy and immunotherapy and gives the patient an eight percent chance of surviving five years. The expensive treatment produces far worse outcomes. The inexpensive treatment produces far better ones.
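The same comparison can be run across all four cancer rows of Table 1. The sketch below is illustrative: it copies the table's cost ranges and survival figures, then reports the most conservative and most favorable cost multiples for each cancer. The upper bound for late-stage lung cancer is open-ended in the table ("$350K+"), so $350K is treated here as a floor for the high estimate.

```python
# Cancer rows from Table 1. Costs are (low, high) in thousands of dollars;
# survival is (early, late) five-year survival in percent.
TABLE_1 = {
    "Lung": {"early": (30, 60), "late": (200, 350), "survival": (60, 8)},
    "Colorectal": {"early": (20, 40), "late": (200, 400), "survival": (91, 14)},
    "Pancreatic": {"early": (100, 180), "late": (300, 500), "survival": (50, 3)},
    "Ovarian": {"early": (30, 60), "late": (200, 400), "survival": (93, 31)},
}


def compare(row: dict) -> tuple:
    """Return (min cost multiple, max cost multiple, survival multiple)."""
    early_low, early_high = row["early"]
    late_low, late_high = row["late"]
    survival_early, survival_late = row["survival"]
    # Most conservative case: cheapest late treatment vs. costliest early one.
    min_multiple = late_low / early_high
    # Most favorable case: costliest late treatment vs. cheapest early one.
    max_multiple = late_high / early_low
    return min_multiple, max_multiple, survival_early / survival_late


summary = {name: compare(row) for name, row in TABLE_1.items()}
for name, (low, high, survival) in summary.items():
    print(
        f"{name}: late treatment costs {low:.1f}x to {high:.1f}x more; "
        f"early-stage survival is {survival:.1f}x higher"
    )
```

Even on the most conservative pairing, every cancer row shows late treatment costing more than early treatment while delivering several times lower survival.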

This pattern holds for every disease in the program. Late-stage disease costs more to treat and produces worse outcomes. The earlier we detect and intervene, the lower the cost and the better the result. There is no tradeoff here, no tension between affordability and effectiveness. Early detection delivers both.

Figure 8. Per-patient cost comparison: treating disease after late detection vs. intervening after early algorithmic detection.

Disease by Disease: What the Numbers Mean

Lung cancer is the country's deadliest cancer, killing approximately 127,000 Americans every year, eighty-five percent of them diagnosed after the disease has spread. The immunotherapy drug Keytruda, most commonly prescribed for advanced lung cancer, generated more than $25 billion in annual revenue for Merck in 2023. Its median survival benefit for advanced lung cancer patients is approximately ten months. Ten months of life bought with $150,000 in drug costs alone, plus hospitalization, supportive care, and the considerable physical suffering of treatment. Against that, LungFlag's published budget impact model projects net savings of $2.87 million per health system over five years, seventeen additional early-stage diagnoses annually, and twenty-two fewer deaths. The economics of early detection for lung cancer alone justify the entire investment in the program.¹

Kidney failure is perhaps the starkest economic case in the non-cancer program. Dialysis costs $90,000 per patient per year, and 130,000 Americans start dialysis every year. Most remain on dialysis until they die, four to five years on average, at a total lifetime cost of $360,000 to $450,000 per patient. The Klinrisk algorithm for chronic kidney disease, now CE-marked through its partnership with Roche and validated on 4.8 million US adults, detects declining kidney function years before dialysis becomes necessary. The cost of the medications that slow kidney disease progression when caught early, SGLT2 inhibitors and ACE inhibitors, runs to a few thousand dollars per year. If algorithmic detection delays dialysis by five years in even one third of new dialysis patients annually, the savings to Medicare alone exceed $15 billion per year.⁵
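The dialysis figure can be checked with a short calculation. This is a rough sketch under stated assumptions: the full avoided dialysis cost is treated as a Medicare saving, each year's cohort is counted once for all five delayed years, and the medication cost uses the approximate $5,000 per year shown in Table 1. None of these is an actuarial estimate.

```python
# Figures from the text, plus two labeled assumptions.
NEW_DIALYSIS_PATIENTS_PER_YEAR = 130_000
DIALYSIS_COST_PER_YEAR_USD = 90_000
YEARS_DELAYED = 5
# ASSUMPTION: medication cost taken from the "~$5K/yr" entry in Table 1.
ASSUMED_MEDICATION_COST_PER_YEAR_USD = 5_000
# ASSUMPTION: all avoided dialysis spending counts as a Medicare saving.

patients_delayed = NEW_DIALYSIS_PATIENTS_PER_YEAR // 3  # one third of the cohort

gross_savings = patients_delayed * YEARS_DELAYED * DIALYSIS_COST_PER_YEAR_USD
medication_cost = (
    patients_delayed * YEARS_DELAYED * ASSUMED_MEDICATION_COST_PER_YEAR_USD
)
net_savings = gross_savings - medication_cost

print(f"Patients with dialysis delayed: {patients_delayed:,}")
print(f"Gross avoided dialysis cost:    ${gross_savings / 1e9:.1f} billion")
print(f"Net of medication cost:         ${net_savings / 1e9:.1f} billion")
```

Both the gross and the net figure sit above the $15 billion threshold for each annual cohort, so the claim holds even after paying for five years of medication.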

The diabetes numbers are the most precisely calculated of any disease in this program, because the Diabetes Prevention Program produced a decade of cost-effectiveness data. The intervention, modest lifestyle changes achieving seven percent weight loss and 150 minutes of weekly exercise, costs approximately $3,500 per participant. It prevents diabetes in fifty-eight percent of prediabetic patients. The lifetime direct medical cost of Type 2 diabetes, including heart disease, kidney failure, blindness, and amputation, averages $250,000 per patient. A $3,500 intervention that prevents a $250,000 disease in more than half the people who receive it produces a return of more than forty to one. The National Diabetes Prevention Program has been authorized. It has not been funded at anything approaching the scale the economics justify.⁶
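The forty-to-one figure is an expected-value calculation: the avoided lifetime cost is weighted by the share of participants in whom diabetes is prevented, then divided by the cost of the intervention. A minimal sketch using only the numbers above (it ignores discounting and any delayed onset, which a full cost-effectiveness analysis would include):

```python
# All three inputs come from the text. No discounting is applied.
INTERVENTION_COST_USD = 3_500
PREVENTION_RATE = 0.58
LIFETIME_DIABETES_COST_USD = 250_000

# Expected lifetime cost avoided per participant enrolled.
expected_savings = PREVENTION_RATE * LIFETIME_DIABETES_COST_USD

# Dollars of avoided cost per dollar spent on the intervention.
return_ratio = expected_savings / INTERVENTION_COST_USD

print(f"Expected savings per participant: ${expected_savings:,.0f}")
print(f"Return: about {return_ratio:.1f} to 1")
```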

The Cost of the Algorithm

Against these treatment costs, what does the algorithmic detection program actually cost to run?

The blood draw costs nothing extra, because it is already happening. Two hundred million routine blood panels are drawn in the United States every year as part of standard medical care. The patient is already in the chair. The needle is already in the arm. The data is already flowing into the electronic medical record.

The computational cost of running a trained machine learning algorithm on a blood result is measured in fractions of a cent per patient. For a health system processing 500,000 blood panels per year, the entire computational cost of applying a cancer detection algorithm to each one is a few thousand dollars annually, less than the cost of a single MRI scan, and orders of magnitude less than the cost of treating a single late-stage cancer case.

The total cost of running the full algorithmic program across a population of one million patients, including computational costs, laboratory system integration, and clinical coordination for flagged patients, has been estimated at approximately $50 to $100 per patient per year. Against average late-stage cancer treatment costs of $200,000 to $500,000 per case, even a modest improvement in the fraction of cases caught at early stage produces a return that is not ten to one or twenty to one, but several hundred to one. The LungFlag budget impact model, the only published per-health-system analysis in this family, quantifies this concretely: $2.87 million in net savings per health system over five years, from a single algorithm applied to blood tests already being drawn.⁹

Why the Math Does Not Drive the System

Any thoughtful reader will ask: if the economics are this clear, why hasn't the healthcare system already moved?

The answer is that the healthcare system does not experience these economics the way a straightforward accounting analysis suggests it should. The costs and benefits accrue to different parties, over different time horizons, in ways that create systematic misalignment between what is financially rational for society and what individual institutions actually do.

Consider how a hospital generates revenue. It earns money from the procedures, treatments, and hospitalizations it provides. A hospital that catches a lung cancer early and removes it with a straightforward surgery earns the surgery fee, meaningful but modest. A hospital that treats a Stage IV lung cancer patient through two years of chemotherapy, immunotherapy, emergency admissions, and palliative care earns far more. The hospital does not benefit financially from early detection. It benefits financially from treatment. This is not a conspiracy. It is a structural feature of fee-for-service healthcare that rewards volume and complexity of treatment rather than prevention of disease.

The insurance industry faces a related but distinct version of this misalignment. An insurer that pays for algorithmic blood analysis and catches a lung cancer at an operable stage saves hundreds of thousands of dollars in future treatment costs. But most Americans change health insurance plans frequently through job changes, employer decisions, and annual open enrollment. The insurer who pays for the early detection program may not be the insurer who would have paid for the late-stage treatment. The savings accrue to a future insurer, or to Medicare when the patient ages into it, while the costs of the prevention program fall on the current insurer. This creates a rational but socially destructive incentive not to invest in prevention.¹⁰

Value-based care arrangements, in which health systems are paid for patient outcomes rather than volume of services, partially address this misalignment. Kaiser Permanente, which both insures and provides care to its twelve million members, has a direct financial incentive to keep those members healthy rather than treating them expensively when they become sick. This structural alignment is part of why Kaiser was the site of the LungFlag validation and why integrated health systems generally have stronger records in early detection than fee-for-service systems.

The Societal Calculation

Even if individual institutions face misaligned incentives, the societal economic calculation is unambiguous.

The combined annual direct medical cost of cancer in the United States exceeds $200 billion. Add the indirect costs of lost productivity from cancer deaths and disability, and the total economic burden exceeds $400 billion per year. The combined cost of the major non-cancer diseases in this program, heart disease, diabetes, kidney failure, heart failure, and sepsis, adds another $700 billion or more in annual direct medical costs. Counting direct medical costs alone, these diseases together impose a burden approaching $1 trillion per year.¹¹

Against that number, the cost of deploying the algorithmic early detection program at scale, including confirmatory imaging and follow-up for flagged patients, runs to $10 billion to $20 billion per year at full national scale. That is one to two percent of the economic burden the program would address.

The conservative estimate in this book projects that algorithmic early detection could prevent 400,000 to 675,000 American deaths per year. Using the standard value of a statistical life employed by federal regulatory agencies, approximately $11 million per life, the annual value of those prevented deaths is $4.4 trillion to $7.4 trillion. No medical intervention in history has approached this ratio of benefit to cost.
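The ratio implied by these figures can be worked out directly. The sketch below multiplies the projected deaths prevented by the value of a statistical life, then divides by the $10 billion to $20 billion annual program cost given earlier in this section. Pairing the low benefit with the high cost, and the high benefit with the low cost, is a choice made here to bracket the range.

```python
# All inputs come from the text.
DEATHS_PREVENTED = (400_000, 675_000)                      # (low, high) per year
VALUE_OF_STATISTICAL_LIFE_USD = 11_000_000
PROGRAM_COST_USD = (10_000_000_000, 20_000_000_000)        # (low, high) per year

benefit_low = DEATHS_PREVENTED[0] * VALUE_OF_STATISTICAL_LIFE_USD
benefit_high = DEATHS_PREVENTED[1] * VALUE_OF_STATISTICAL_LIFE_USD

# Bracket the ratio: worst case pairs low benefit with high cost,
# best case pairs high benefit with low cost.
ratio_low = benefit_low / PROGRAM_COST_USD[1]
ratio_high = benefit_high / PROGRAM_COST_USD[0]

print(f"Annual value: ${benefit_low / 1e12:.1f} to ${benefit_high / 1e12:.1f} trillion")
print(f"Benefit-to-cost ratio: {ratio_low:.0f} to 1 up to {ratio_high:.0f} to 1")
```

On these inputs even the least favorable pairing returns more than two hundred dollars of statistical value for every dollar spent on the program.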

The Diabetes Prevention Program offers the most precisely calculated example. A $3,500 intervention prevents a $250,000 disease in fifty-eight percent of the people who receive it, a return of more than forty to one. The ten-year cost-effectiveness analysis of the DPP confirmed net savings over a decade relative to treating diabetes and its complications. Federal agencies have acknowledged this math. The program has not been funded at anything approaching the scale the economics justify.¹²

What Changes the Calculus

Three things can change the structural misalignment that prevents the healthcare system from acting on the economics of early detection.

The first is insurance coverage mandates. When the Affordable Care Act required insurers to cover preventive services recommended by the U.S. Preventive Services Task Force, it changed the economics for insurers by requiring them to pay for prevention regardless of the patient turnover problem. A similar coverage mandate for algorithmic blood test analysis, once the algorithms have achieved validation and professional society endorsement through the consortium program, would extend coverage to every insured American.

The second is value-based contracting. Health systems paid for keeping their patient populations healthy rather than for volume of services have a direct financial incentive to deploy early detection algorithms. Expanding these arrangements accelerates deployment without requiring new legislation.

The third is the consortium itself. When major health systems publish outcomes data showing that algorithmic blood analysis catches hundreds of cancers per institution annually at early stages, the financial and reputational benefits become visible to every other health system in the country. The institutions that did not participate face a choice: deploy the validated algorithms and capture the benefits, or continue finding cancers late while their competitors find them early. That competitive dynamic is a powerful driver of adoption.

The Human Economy

There is a calculation that does not appear in any of the cost analyses above, because it cannot be monetized, but that is more important than all of them.

My mother died of colon cancer at fifty-six. She left behind a husband, two sons, and a life that had decades still in it. The economic cost of her death, in lost earnings and lost productivity, can be estimated. The cost in lost years with her family cannot be. The cost of the grandchildren she never met, the conversations that never happened, the presence that was simply gone, has no dollar value.

My brother Michael died of metastatic cancer at seventy-two. He was a lawyer who had spent his career fighting for human rights around the world. The value of the work he did not live to do is not calculable. The value of his company, his wisdom, his laugh, is not calculable.

Multiply these stories by 400,000 per year. That is the conservative estimate of what algorithmic early detection could prevent annually. Four hundred thousand people who would not die of diseases their blood was already announcing. Four hundred thousand families that would not lose a parent, a spouse, a sibling, a child, to a diagnosis that came too late.

The economics are overwhelming. But the human stakes are larger than the economics. They are what this book is actually about.

Chapter 12

The Equity Imperative

Why the Patients Who Need Early Detection Most Are the Ones Who Get It Least

There is a ZIP code in Memphis, Tennessee, where the cancer death rate is more than twice the national average. The people who live there are not biologically different from people who live in wealthier parts of the city. They do not smoke more, or eat worse, or carry more genetic cancer risk. What they lack is access. Access to a primary care doctor who knows their name. Access to a colonoscopy without a six-month wait. Access to a lung cancer screening program that, if it exists at all in their community, they have never been told about. Access to the kind of routine annual physical that generates the blood test that might, if someone were reading it the right way, catch a cancer before it kills them.

In America, where you are born, where you grow up, and where you live determines, to a degree that should disturb every one of us, whether your cancer is found early or late. Early-stage cancer is largely a privilege. Late-stage cancer is largely a consequence of not having that privilege.

This chapter is about why that is true, why it matters more than almost any other dimension of the cancer problem, and why the algorithmic blood test program described in this book is the first early detection approach in the history of medicine that has a genuine chance of closing the gap rather than widening it.

The Gap That Has Persisted for Decades

Black Americans die of cancer at higher rates than white Americans for virtually every major cancer type. The cancer death rate for Black men is approximately twenty percent higher than for white men. For cervical cancer, Black women die at rates roughly twice as high as white women, despite the fact that cervical cancer is one of the most preventable cancers in existence. For colorectal cancer, Black Americans are both more likely to develop the disease and more likely to die from it. For lung cancer, the pattern is similar. These disparities have been documented for decades. They have not meaningfully closed.¹

Income is an equally powerful predictor of cancer outcomes. Americans in the lowest income quartile are diagnosed with cancer at later stages, receive less aggressive treatment, and die at higher rates than Americans in the highest income quartile, across almost every cancer type. A study published in JAMA Oncology found that the cancer mortality rate for Americans living in the most deprived counties was nearly twice that of Americans living in the least deprived counties. The difference was not explained by tumor biology or genetic risk. It was explained by access to screening, timely diagnosis, and consistent treatment.²

Geography adds a third layer of inequity. Rural Americans face substantially worse cancer outcomes than urban Americans, and the gap has been growing. Between 2012 and 2015, cancer death rates were thirty-seven percent higher in the most rural counties than in the most urban ones. The reasons are not mysterious. Rural communities have fewer primary care physicians per capita, longer distances to imaging centers and specialists, lower rates of health insurance coverage, and less access to the kind of comprehensive cancer centers where early detection programs are most likely to be implemented.³

These three dimensions of inequity—race, income, and geography—are not independent. They overlap and compound. A low-income Black woman in a rural county in the Mississippi Delta carries all three disadvantages simultaneously. She is less likely to have health insurance. Less likely to have a primary care doctor. Less likely to be offered a colonoscopy or a low-dose CT scan. Less likely to complete follow-up if she is flagged for a suspicious finding. And if her cancer is found at Stage IV, she is less likely to receive the full range of treatment options available at a major cancer center.

This is not a description of a few exceptional cases. This is a description of the systematic experience of tens of millions of Americans, and it accounts for a substantial portion of the 400,000-plus cancer deaths this book argues could be prevented annually.

Why Every Previous Early Detection Approach Has Failed on Equity

Every major early detection advance in the history of cancer medicine has promised to help the most vulnerable patients. Every one has, in practice, helped the most advantaged patients first, and reached the most vulnerable patients last, if at all.

The Pap smear was developed in the 1920s and became standard of care by the late 1940s. Some sixty years after its development, when Congress finally established regulatory standards for cytology laboratories, the women most likely to die of cervical cancer were still those with the least access to quality screening: low-income women, rural women, uninsured women, women of color. Today, the cervical cancer death rate in the United States is roughly twice as high among Black women as among white women, forty years after the disease became almost entirely preventable through screening and vaccination. The tool existed. The gap persisted.⁴

Lung cancer screening with low-dose CT was validated in 2011 and endorsed by major professional societies by 2013. A decade later, only five to six percent of eligible Americans are being screened. The patients least likely to be screened are precisely the patients who most need it: lower-income patients, rural patients, patients without a primary care doctor who knows to order the scan, patients from communities where the smoking rates are highest and the healthcare access is lowest. A technology that could save 25,000 lives per year is reaching a tiny fraction of the people it could save, and the fraction it is missing is disproportionately the most disadvantaged.⁵

The pattern is not a coincidence. Every early detection program that requires a patient to proactively seek out a specific test, navigate a referral system, schedule an appointment at a specialized facility, and complete follow-up care reproduces the inequities already present in the healthcare system. Patients with resources navigate these systems. Patients without resources do not.

This is the fundamental structural flaw in every early detection approach that medicine has deployed before now. The programs are designed for the patients who are already getting good healthcare. They ask the most from the patients who have the least. And they generate their best results in the communities that already have the best outcomes.

Why the Blood Algorithm Is Different

The algorithmic blood test approach does not ask the patient to do anything. It does not require the patient to know that a screening test exists, to make a separate appointment, to navigate a referral, to travel to a specialized facility, or to complete a follow-up protocol that requires multiple visits and strong health literacy.

It requires only that the patient's blood be drawn, which is already happening at every annual physical and at countless other routine healthcare encounters. The algorithm runs in the background, automatically, on the same blood test that was already ordered, and flags the result if the pattern warrants a closer look.

Consider what this means for the equity problem. The ColonFlag algorithm, deployed at Geisinger Health System in rural Pennsylvania, serves a population that is older, poorer, and sicker than the national average. Thirty-two of the forty-five counties in Geisinger's service area are classified as rural. The average household income in Geisinger's service area is fifteen percent below the national average. This is precisely the population that conventional cancer screening programs consistently fail to reach. ColonFlag reached them not by building new clinical infrastructure or launching a community outreach campaign, but by reading the blood test they were already receiving in a smarter way. The patients who were flagged as high-risk and completed follow-up colonoscopy had an eight percent cancer detection rate. These were patients who had already been identified as overdue for conventional screening and had not responded to standard recall invitations. The algorithm reached patients that the conventional system had already given up on.⁶

This is the equity promise of the algorithmic approach, stated plainly. A patient who never seeks out a colonoscopy, who does not know that low-dose CT scanning exists, who has no specialist within fifty miles and no reliable transportation to reach one, still has their blood drawn at their annual physical or their diabetes check or their hypertension follow-up. That blood test, read by an algorithm, can find a cancer or flag a developing disease regardless of whether the patient knows to ask for it, regardless of whether their community has a cancer screening program, and regardless of whether they have the health literacy, the transportation, or the time off work to navigate the conventional early detection system.

The Limits: Where Equity Can Still Break Down

Honesty requires naming the places where equity can still break down even with the algorithmic approach, because identifying these failure points is the first step to preventing them.

The algorithm flag is equitable. What happens after the flag can reproduce existing inequities if the follow-up system is not designed with those inequities in mind.

A patient flagged as high-risk for colorectal cancer needs a colonoscopy. In rural communities, a colonoscopy may require traveling two hours to the nearest gastroenterologist, taking a day off work, arranging childcare, and paying for the preparation medication out of pocket before insurance kicks in. For a patient with a flexible job, a car, and savings, this is manageable. For a patient working an hourly job, without a car, with two children to care for, it is a substantial barrier. The Geisinger deployment data illustrated this: of the 706 patients flagged as high-risk, only 104 completed a colonoscopy. That is a follow-through rate of about fifteen percent. The patients who did not complete follow-up were not all being reckless with their health. Many of them faced real logistical obstacles that the algorithm flag did nothing to remove.⁷
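The follow-through figures can be laid out explicitly. The sketch below uses only numbers quoted in this chapter: the 706 flagged patients, the 104 completed colonoscopies, and the eight percent detection rate among completers. The count of cancers it derives is a back-of-envelope figure, not a reported result, and the variable names are illustrative.

```python
# Figures quoted in the text for the Geisinger ColonFlag deployment.
flagged = 706          # patients flagged as high-risk
completed = 104        # flagged patients who completed a colonoscopy
detection_rate = 0.08  # cancer detection rate among those who completed it

follow_through = completed / flagged        # share of flagged patients who followed up
not_completed = flagged - completed         # flagged patients with no colonoscopy
cancers_found = completed * detection_rate  # back-of-envelope estimate, not a reported count

print(f"Follow-through rate: {follow_through:.1%}")
print(f"Flagged patients without a colonoscopy: {not_completed}")
print(f"Approximate cancers found among completers: {cancers_found:.0f}")
```

The number that matters most for equity is the second one: roughly six hundred flagged patients for whom the signal was raised but never acted on.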

This is why the consortium model described in Chapter 8 must include robust patient navigation as part of its clinical infrastructure. Patient navigators are trained healthcare workers, often from the same communities as the patients they serve, who guide people through the healthcare system step by step: scheduling the follow-up appointment, arranging transportation, explaining the procedure, following up when appointments are missed, and connecting patients to financial assistance programs. The evidence for patient navigation is strong: navigation programs consistently improve screening completion rates, follow-up rates, and treatment initiation rates, particularly in underserved communities.

The equity argument for patient navigation is reinforced by the evidence from cancer screening programs that have successfully narrowed disparities. Patient navigator programs at major cancer centers have been shown to increase colonoscopy completion rates by thirty to fifty percent in low-income and minority populations. Navigation programs for lung cancer screening in underserved communities have achieved screening completion rates comparable to those in affluent populations. The tool works when the support infrastructure is built to serve the patients most likely to need it.⁸

A second equity failure point is algorithm training bias. A machine learning algorithm trained primarily on patients from one demographic group may perform less well on patients from other groups. This is a documented phenomenon in medical AI, and it demands active vigilance. The consortium model addresses this directly by requiring that validation studies include diverse patient populations from different racial, ethnic, income, and geographic groups. An algorithm that achieves excellent accuracy in one population must be validated in others before it is deployed to those populations. This is not merely a technical requirement. It is an ethical one.

The ColonFlag algorithm was originally trained on an Israeli population and validated in the United Kingdom and the United States, demonstrating cross-population generalizability. But the Israeli and UK populations are not demographically identical to the full diversity of the American patient population, including Black, Hispanic, Indigenous, and Asian Americans. The consortium's prospective validation studies must explicitly include these populations and publish demographic-stratified performance data. An algorithm that works equally well across all demographic groups is an algorithm that the equity argument can stand behind fully. An algorithm that works better for some groups than others must be improved before it is deployed universally.⁹

The Political Dimension

I have been a progressive my entire adult life. The inequities described in this chapter are not abstract to me. They are a matter of political conviction as much as scientific concern. A healthcare system that consistently produces better outcomes for wealthy white Americans than for poor Black Americans, for urban Americans than for rural Americans, for the insured than for the uninsured, is a healthcare system that has failed to live up to the basic moral commitment that medicine makes to every patient.

The Affordable Care Act was a meaningful step toward closing some of these gaps. Efforts to dismantle or severely limit it under subsequent administrations have reversed some of that progress and left millions of Americans without coverage they had previously received. This is not a political abstraction. It is a clinical reality. Uninsured patients are diagnosed with cancer at later stages and die at higher rates than insured patients, a fact that is documented in the literature as clearly as any biological finding in oncology.

The algorithmic blood test program cannot, on its own, fix the structural inequities of the American healthcare system. It cannot restore coverage to people who have lost it. It cannot build hospitals in communities that do not have them. It cannot train primary care physicians and deploy them to underserved rural counties. These are political and policy challenges that require political and policy solutions.

What the algorithmic program can do, within the healthcare system as it exists, is remove one of the most consequential barriers to equitable early detection: the requirement that a patient know to seek out a specific test, navigate a specialized referral system, and complete a multi-step screening protocol that asks the most from the patients who have the least.

Senator Hubert Humphrey said it clearly in the congressional hearings on the National Cancer Act in 1971: reducing cancer deaths does not require a great scientific breakthrough. It simply calls for a more equitable, just distribution of the resources already available to the more privileged members of our society. That statement was made more than fifty years ago. It remains true today. The resources available to the more privileged members of our society now include algorithmic blood test analysis. Making that resource available to every American who has a blood test drawn is not a technical challenge. It is a moral imperative and an organizational decision.¹⁰

What Equity Looks Like in Practice

What would it actually look like to deploy the algorithmic blood test program with equity as a design principle rather than an afterthought?

It would begin with the consortium selecting health systems that serve diverse and underserved populations as core members, not as afterthoughts or equity add-ons. Geisinger, which serves a predominantly rural, lower-income population in Pennsylvania, is already a model for this. A New York consortium that includes NewYork-Presbyterian's community health network, which serves some of the most diverse urban communities in the country, alongside Weill Cornell and Memorial Sloan Kettering, builds equity into the consortium's DNA rather than treating it as a secondary consideration.

It would require that the consortium's validation studies explicitly report outcomes stratified by race, ethnicity, income, insurance status, and geography, so that any disparities in algorithm performance or follow-up completion are identified and addressed before broad deployment rather than discovered after.

It would mean building patient navigation into the clinical protocol from day one, not adding it later when follow-through rates prove disappointing. Every patient flagged as high-risk should receive a proactive outreach call from a trained navigator, assistance with appointment scheduling, information about transportation resources, and follow-up contact if the appointment is missed.

It would mean designing the community benefit programs that major health systems are already required to provide under federal law around the algorithmic detection program, directing resources to the communities where the equity gap in cancer outcomes is widest.

And it would mean confronting, directly and without equivocation, the reality that the patients who will benefit most from this program are not the patients who are most likely to demand it. The patients who will benefit most are the ones who are currently falling through every gap in the conventional screening system: rural patients, uninsured patients, patients with limited health literacy, patients from communities where a doctor's office feels inaccessible and a cancer screening program feels like something that happens to other people. Reaching those patients requires deliberate design, active navigation, and the willingness to measure not just whether the algorithm works, but whether it works equitably.¹¹

The Promise

The history of early detection in America is, among other things, a history of tools that worked brilliantly for some and barely reached others. The Pap smear saved hundreds of thousands of lives and left a gap in cervical cancer mortality between Black and white women that persists today. Colonoscopy has cut colorectal cancer mortality dramatically in populations where it is regularly used and barely touched mortality in populations where access is limited. Low-dose CT for lung cancer has validated survival benefits that reach fewer than one in fifteen eligible patients, with the patients least likely to be reached being those with the highest smoking rates and the lowest healthcare access.

The algorithmic blood test program has an opportunity that none of these predecessors had: to be designed from the beginning with equity as a central goal, not a peripheral aspiration. The tool itself is inherently more equitable than anything that has come before it, because it does not require the patient to do anything beyond showing up for the blood draw they are already having. The question is whether we build the infrastructure around it that ensures its benefits reach every patient whose blood carries a signal worth reading.

The ZIP code where you were born should not determine whether your cancer is found in time to cure it. That is not a scientific statement. It is a moral one. And it is the statement that the deployment of this program, done right, can finally begin to make true.

Chapter 13

Conquer Cancer

What We Know, What We Need to Do, and Why the Time Is Now

In December 1799, the finest physicians in America stood around George Washington's bed and bled him to death. They were not negligent men. They were doing exactly what their training demanded and what two thousand years of medical consensus required. The problem was not their character. The problem was their paradigm.

We began this book with that story because it is the story we are living through right now. The paradigm we are trapped in is the paradigm of treatment. We have organized the most powerful medical system in human history around the task of fighting cancer after it has already spread, and spent fifty years and hundreds of billions of dollars doing it. The results, for late-stage cancer, remain grim. The biology of advanced disease sets limits that no drug, however ingenious, can reliably overcome.

The alternative is in front of us. It does not require new drugs, new equipment, or new patient behaviors. It requires a new way of reading data we already collect, using tools that already work, to find cancers at the stage when medicine can still cure them.

This chapter states what needs to happen, addresses the regulatory questions that matter most, and explains why the healthcare workforce crisis facing this country makes the case for early algorithmic detection more urgent, not less.

What the Evidence Establishes

Two machine learning algorithms trained on routine blood test results are already deployed in clinical practice. ColonFlag achieves an eightfold improvement in colorectal cancer detection compared to standard screening. LungFlag identifies forty percent of future lung cancer patients nine to twelve months before their clinical diagnosis. Two additional algorithms, for liver and gastric cancer, are validated in large independent patient populations and ready for clinical deployment. Nine more cancer algorithms have published peer-reviewed models awaiting prospective validation. Two non-cancer algorithms, for kidney disease and sepsis, are deployed in clinical settings and producing measurable mortality reductions.¹

Conservative projections, applying the performance levels already demonstrated by the deployed algorithms to half the adult population, estimate that 400,000 to 675,000 American deaths could be prevented annually. That is thirteen to twenty-two percent of all annual deaths in this country, from blood tests already being drawn two hundred million times a year.²
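The two percentages can be cross-checked against each other. The sketch below back-calculates the total number of annual deaths that each end of the range implies; if the figures are consistent, both ends should point to about the same total. The variable names are illustrative, and the total is derived from the text's own numbers rather than taken from a separate source.

```python
# Consistency check on the range quoted in the text.
deaths_low, deaths_high = 400_000, 675_000  # projected deaths prevented per year
share_low, share_high = 0.13, 0.22          # stated share of all annual deaths

# Total annual deaths implied by each end of the range.
implied_total_low = deaths_low / share_low
implied_total_high = deaths_high / share_high

print(f"Implied total from low end:  {implied_total_low:,.0f}")
print(f"Implied total from high end: {implied_total_high:,.0f}")
```

Both ends of the range imply a total of a little over three million deaths per year, so the two percentages are consistent with each other.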

A Healthcare System Under Pressure

There is a second argument for this program that receives too little attention: the crisis of healthcare capacity that is already arriving and will only deepen over the next twenty years.

The United States faces a projected shortage of up to 86,000 physicians by 2036, driven by an aging physician workforce, growing patient demand from an aging population, and decades of underinvestment in medical education. The nursing shortage is already severe: the Bureau of Labor Statistics projects a need for more than 190,000 additional registered nurses per year through 2031. Hospital bed capacity in many regions is operating at or near its limit, with intensive care and oncology units under particular strain.³

At the same time, cancer incidence in the United States is rising. The aging of the baby boom generation is producing the largest cohort of cancer-age Americans in the country's history, and cancer rates rise sharply after sixty. By 2040, the annual number of new cancer diagnoses is projected to reach 2.3 million, a thirty percent increase from current levels. More patients with more cancer, treated by a physician workforce that is falling short of demand, in hospitals with insufficient beds: that is the trajectory the current system is on.⁴

Late-stage cancer is one of the most resource-intensive conditions in all of medicine. A patient diagnosed with advanced lung cancer typically requires an oncologist, a pulmonologist, a radiologist, a palliative care team, frequent hospitalizations, and months of inpatient and outpatient care. The same patient, if found at an early operable stage, requires a surgeon, an anesthesiologist, a short hospitalization, and follow-up monitoring. The reduction in physician time, nursing hours, hospital bed days, and specialist capacity required to manage a localized cancer rather than a metastatic one is substantial.

This means that algorithmic early detection is not just a patient outcome intervention. It is a healthcare workforce intervention. Every cancer caught early is a cancer that does not consume the months of intensive specialist care that late-stage disease demands. Across hundreds of thousands of patients annually, that reduction in downstream care intensity translates directly into physician capacity, nursing capacity, and hospital bed availability freed up for other patients.

A healthcare system facing a physician shortage has every reason to embrace tools that reduce the burden of late-stage disease management. The algorithmic blood test program is exactly such a tool. It does not require additional physicians to run. It reads data that laboratories are already producing. The follow-up it generates, a targeted imaging study and a clinical consultation, is far less resource-intensive than the alternative, which is treating the same patient's cancer at Stage IV.

Does This Require FDA Approval?

The question of whether algorithmic blood test analysis requires FDA approval before a health system can deploy it is one of the most important practical questions this program faces, and the answer is more nuanced, and more encouraging, than most people assume.

The starting point is what the algorithm actually does. ColonFlag and LungFlag do not diagnose cancer. They do not tell a physician that a patient has cancer. They generate a risk score from blood test values, a number that tells the physician this patient's blood pattern places them in an elevated-risk category that warrants closer attention. The physician reviews the score, considers it alongside everything else known about the patient, and decides whether to order additional testing. The physician makes the clinical decision. The algorithm informs it.

This distinction matters enormously in the regulatory framework. Under the 21st Century Cures Act, clinical decision support software that meets four criteria is explicitly exempt from FDA device regulation. The four criteria are: it does not acquire, process, or analyze a medical image, a signal from an in vitro diagnostic device, or a pattern or signal from a signal acquisition system; it displays or analyzes medical information about a patient; it supports, rather than replaces, the judgment of a healthcare professional; and it enables the healthcare professional to independently review the basis for any recommendation it provides. The key test, articulated clearly in FDA's own 2026 guidance on clinical decision support software, is not whether the tool uses artificial intelligence. It is whether the clinician can independently evaluate the basis for the recommendation.⁵

ColonFlag and LungFlag generate risk scores from standard blood count values that a physician can examine directly. A physician who receives a high ColonFlag score can look at the underlying hemoglobin trend, the platelet trajectory, and the inflammatory markers that drove it, and make an independent clinical judgment about whether a colonoscopy is warranted. The algorithm is transparent. The physician is in control. The final decision to order a follow-up test belongs entirely to the clinician.
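What "independently reviewable" can mean in practice is easiest to see in a toy example. The sketch below is not ColonFlag, LungFlag, or any real model: it is a deliberately simple additive score with made-up weights and hypothetical feature names, shown only to illustrate a score that reports which blood values drove it.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    """One blood-value trend and its effect on the score."""
    feature: str
    value: float
    weight: float

    @property
    def points(self) -> float:
        return self.value * self.weight


# Hypothetical weights chosen only for illustration; NOT taken from any real algorithm.
WEIGHTS = {
    "hemoglobin_change_g_dl_per_year": -2.0,  # falling hemoglobin raises the score
    "platelet_change_k_per_year": 0.05,       # rising platelets raise the score
}


def explain_score(features: dict[str, float]) -> tuple[float, list[Contribution]]:
    """Return the total score plus each feature's contribution, largest first."""
    parts = [Contribution(name, features[name], w) for name, w in WEIGHTS.items()]
    parts.sort(key=lambda c: abs(c.points), reverse=True)
    return sum(c.points for c in parts), parts


# Made-up patient trends for demonstration.
score, parts = explain_score({
    "hemoglobin_change_g_dl_per_year": -0.8,
    "platelet_change_k_per_year": 30.0,
})
for c in parts:
    print(f"{c.feature}: value={c.value:+.2f}, contributes {c.points:+.2f}")
print(f"total score: {score:.2f}")
```

The design point is the second return value. A score delivered alongside the trends that produced it gives the physician something to examine and, if warranted, to overrule.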

This is precisely the model of risk stratification that medicine uses routinely without FDA oversight. A cardiologist uses a Framingham Risk Score to stratify a patient's ten-year cardiovascular risk and decide whether to prescribe a statin. A nephrologist uses an estimated glomerular filtration rate to stratify kidney disease progression and decide on management. These are algorithmic calculations applied to laboratory values, generating risk estimates that inform physician decisions. No FDA approval is required for a physician to use a risk calculator in clinical practice, because the physician, not the calculator, is making the treatment decision. The FDA's own guidance acknowledges that software providing a risk probability or risk score for a disease can fall within enforcement discretion policies for software performing calculations routinely used in clinical practice.⁶

The strongest reading of the regulatory framework, supported by the 21st Century Cures Act and by FDA's clinical decision support guidance, is that a blood-based risk stratification algorithm that generates a score for physician review, without acquiring imaging data, without replacing clinical judgment, and with full transparency into the underlying blood values, qualifies as non-device clinical decision support. It is risk stratification informing physician judgment, not autonomous diagnosis.

Health systems that wish to take the most conservative regulatory approach can deploy these algorithms as laboratory-developed tests, which are subject to certification under the Clinical Laboratory Improvement Amendments rather than FDA device clearance. This is a well-established pathway that clinical laboratories use for complex diagnostic calculations every day. It does not require FDA approval. It requires laboratory quality standards that any major health system's laboratory already meets.

None of this means that formal FDA clearance is undesirable. Clearance through the 510(k) pathway, which ColonFlag has already obtained, strengthens the legal standing of the tool, facilitates insurance coverage, and removes any regulatory ambiguity. The consortium should pursue clearance for each algorithm as validation evidence accumulates. But clearance is not a prerequisite for deployment, and treating it as one would add years of unnecessary delay to a program whose potential benefit is measured in hundreds of thousands of lives annually.

What Needs to Happen

The path from here to broad deployment has been walked before by ColonFlag and LungFlag. What it requires is institutional will applied to a specific sequence of steps.

A consortium of four to five major health systems commits to simultaneous prospective validation and clinical deployment of the remaining algorithms. The consortium shares data, protocols, and results across member institutions in real time. Each consortium member integrates the deployed algorithms into its laboratory reporting workflow so that every eligible blood result is automatically analyzed. High-risk patients receive a proactive outreach call, assistance with scheduling the appropriate follow-up study, and navigation support through the confirmation process.

The validation evidence that accumulates through the consortium feeds two parallel tracks: professional society guideline review, which changes physician behavior across the country, and FDA clearance applications for algorithms completing prospective validation, which facilitates insurance coverage. Neither track needs to be complete before deployment begins. They run alongside deployment, not ahead of it.

The World This Creates

A primary care physician in Memphis sees a 54-year-old woman for her annual physical. Her blood is drawn. The algorithm runs automatically. Her platelet count has been trending upward for fourteen months. A flag appears in the electronic health record. The physician reviews the underlying blood values, considers the patient's history, and refers her for a pelvic MRI. A small ovarian mass is found, still confined to the ovary. It is surgically removed. Her five-year survival probability is ninety-three percent.

Without the algorithm, she feels fine for another eighteen months. Then abdominal pain brings her to the emergency room. The cancer has spread. She dies fourteen months later. Her five-year survival probability was thirty-one percent.

That is one patient. Multiply her by the thousands of Americans who die of ovarian cancer every year, almost all at late stage because there is no screening test. Then multiply across all thirteen cancers and all the non-cancer diseases in this program. Each of those late-stage patients also represents months of intensive specialist care, hospital admissions, and nursing hours that a strained healthcare system must absorb. Catching them early does not only save their lives. It conserves the capacity of the system to care for everyone else.

This is what conquering cancer actually looks like. Not the elimination of the disease. The systematic removal of late-stage diagnosis as its default trajectory.

The Blood Is Already Telling Us

Pierre Louis published his statistical evidence against bloodletting in 1828. It took fifty years for the practice to be fully abandoned. The physicians who continued bleeding patients during those fifty years were not bad doctors. They were operating within a paradigm that medicine had not yet decided to leave. The tools exist now to find cancer years before it becomes the thing that is killing people. The regulatory framework permits their deployment. The workforce crisis makes their deployment urgent. The only thing that determines whether this happens in the next three years or the next thirty is whether the people with the authority and the resources to act decide to act.⁷

The blood is already telling us. We just need to start listening.

Chapter 14

The Last Mile

Why Finding the Patient Is Only Half the Work

Imagine the consortium described in this book is fully operational. The algorithms are running. The blood of millions of patients is being analyzed automatically in the background of routine care. For colorectal cancer alone, the algorithm flags roughly ten thousand high-risk patients a year across the Northeast Corridor consortium, each with a blood pattern that suggests a roughly one in ten chance of harboring an undetected cancer.

Now imagine what happens if only a small fraction of those flagged patients actually complete their follow-up colonoscopy. That has been the historical pattern across American cancer screening: a majority of eligible patients never complete the pathway, for reasons that have nothing to do with whether they want to be screened. Apply that pattern to the consortium’s colorectal algorithm and the system catches a small fraction of the cancers it could have caught. The rest remain undetected long enough to reach the stages where treatment becomes much harder and survival much lower.

That gap, between what the algorithms can find and what the clinical system delivers on, is where this book’s argument either succeeds or fails. Everything in the preceding chapters has described how to find the patients. None of it matters if the patients do not come in.

Adherence, the clinical word for whether patients actually follow through on a recommendation, is where cancer screening programs have historically gone to die. It is the reason the Pap smear took sixty years to reach its current coverage. It is the reason low-dose CT scanning has plateaued at about five percent of eligible patients more than a decade after validation. A screening test that works brilliantly in a clinical trial can still fail in the real world for one reason: patients do not show up.

This is the chapter without which the rest of the book is worthless.

Why Adherence Has Always Been the Problem

Cancer screening programs succeed or fail on simple arithmetic. A test with ninety percent sensitivity reaching fifty percent of the eligible population catches forty-five percent of the cancers that could be caught. A test with seventy percent sensitivity reaching ninety percent catches sixty-three percent. Coverage matters more than precision. Every major screening program in American history has been held back by coverage, not precision.
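
The arithmetic in that paragraph can be checked in a few lines. This is a minimal sketch, not a model of any real program: the function name fraction_caught is introduced here for illustration, and it assumes, as the paragraph does, that the share of cancers caught is simply sensitivity multiplied by coverage.

```python
def fraction_caught(sensitivity: float, coverage: float) -> float:
    """Share of catchable cancers a program actually catches.

    Simplifying assumption (from the text): sensitivity times coverage.
    """
    return sensitivity * coverage


# The two scenarios described in the paragraph.
precise_but_narrow = fraction_caught(sensitivity=0.90, coverage=0.50)
rough_but_broad = fraction_caught(sensitivity=0.70, coverage=0.90)

print(f"Precise test, half the population reached: {precise_but_narrow:.0%}")
print(f"Rougher test, most of the population reached: {rough_but_broad:.0%}")
```

The less sensitive test wins because it reaches far more of the people who have the disease.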

Pap smears, after six decades of effort, reach about seventy-five percent of eligible women in a given year. Mammography reaches roughly seventy-two percent of women in the recommended age range. Colonoscopy reaches about sixty percent of eligible adults. Low-dose CT for lung cancer reaches five percent. The best-performing program on that list misses a quarter of its target population. The worst misses ninety-five percent.

The obstacles are cumulative. Patients have never heard of the test, or they have heard of it but think it does not apply to them, or they understand and are afraid, or embarrassed, or busy, or uninsured, or working multiple shifts, or caring for a sick parent, or living in a neighborhood where the screening site is an hour away on two buses. Each obstacle screens out a fraction. Stacked together, they screen out the majority.

The obstacles concentrate in the populations that need screening most. Black women have lower mammography rates than white women despite higher breast cancer mortality. Low-income adults of every race have lower screening rates than higher-income adults. This is not a failure of individual patients. It is a failure of design. A program that depends on the patient to initiate contact, schedule the appointment, arrange transportation, take time off work, and navigate the clinical pathway will systematically lose the patients with the least capacity to do those things.

Why Blood-Test Algorithms Start Ahead

Every adherence problem comes down to the same hidden assumption. The test is something the patient must add to their life. An extra appointment, an extra procedure, an extra trip, an extra decision. Every extra is a chance to opt out.

Blood-test algorithms break this assumption. The test is not an extra anything. The blood has already been drawn. The patient has already come in. The visit has already happened. The inconvenience has already been absorbed. All the friction that prevents half the population from getting a colonoscopy has been pre-paid for reasons having nothing to do with cancer screening. The algorithm runs on data that already exists. This is the single most important structural difference between this generation of cancer screening tools and every generation that came before. The burden shifts from the patient, where it has always sat, to the system, where it belongs.

One limit needs to be named. Algorithms only reach patients who already interact with the medical system for some reason. People who never see a doctor at all are not reached by this approach. But within the population that does interact with medicine in any form, whether at annual physicals, chronic disease management visits, emergency department visits, pre-operative workups, or occupational health screenings, the blood-test algorithm reaches essentially everyone who has blood drawn.

The Multiplier Effect of Risk Stratification

There is a second structural advantage that most discussions of these algorithms underweight, and it may be the most important single argument in this chapter. It concerns what happens after a patient is flagged.

Traditional cancer screening operates on broad populations. Every adult fifty to seventy-five is eligible for colorectal screening. An eligible patient has roughly a one percent chance of harboring the cancer being screened for. The test casts a wide net and catches a small proportion of genuine cases. The downstream clinical system cannot possibly absorb every eligible patient. Endoscopy suites are already near capacity. So screening programs end up triaging by patient persistence, which correlates with education, income, and all the other factors that drive health disparities.

Blood-test algorithms change this arithmetic. When an algorithm flags a patient as high-risk for colorectal cancer, that patient’s likelihood of actually harboring the disease is not one percent. It is roughly one in ten, a figure consistently reported across the algorithm’s published validation studies. Instead of sending a hundred average-risk patients for colonoscopy to find one cancer, the system sends ten high-risk patients to find the same cancer. The yield per procedure goes up by roughly a factor of ten. The pressure on endoscopy capacity goes down by roughly a factor of ten. The patients who get scoped are, on average, the patients who most needed to be scoped.
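
The factor of ten follows directly from the two prevalence figures. The sketch below is illustrative only: the function name procedures_per_cancer is invented for this example, and it assumes for simplicity that a colonoscopy finds every cancer that is present.

```python
def procedures_per_cancer(prevalence: float) -> float:
    """Procedures needed, on average, to find one cancer.

    Simplifying assumption: the procedure detects every cancer present,
    so the answer is just one divided by the prevalence.
    """
    return 1 / prevalence


average_risk = procedures_per_cancer(0.01)  # eligible adults, about one percent
flagged = procedures_per_cancer(0.10)       # algorithm-flagged, about one in ten
yield_gain = average_risk / flagged         # how much further each procedure goes

print(f"Average-risk patients scoped per cancer found: {average_risk:.0f}")
print(f"Flagged patients scoped per cancer found: {flagged:.0f}")
print(f"Improvement in yield per procedure: about {yield_gain:.0f}x")
```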

This is the multiplier effect, and it deserves its own moment. A cancer screening pathway that has always been overwhelmed by volume can suddenly focus scarce resources on the patients most likely to benefit. Endoscopy suites that were gatekeeping by patient persistence can start gatekeeping by clinical risk. Persistence selects for wealth. Blood-test algorithms select for disease. Those are different populations.

Apply the same math to the other cancers. An ovarian cancer algorithm narrowing the population that needs ultrasound from one hundred thousand women to ten thousand makes the downstream imaging capacity feasible for the first time. A pancreatic cancer algorithm narrowing the population needing MRI from one hundred thousand adults to five thousand converts a clinical impossibility into a manageable referral volume. The algorithms do not just find more cancers. They make it possible, for the first time, to give every flagged patient a thorough workup, because the flagged population is a tenth of the eligible population or smaller.

What a Good Adherence Plan Looks Like

A patient whose algorithm result places them in the high-risk category has roughly a one in ten chance of harboring cancer. If that patient does not complete follow-up, there is roughly a one in ten chance that a cancer goes undetected, about ten times the risk carried by an average-risk patient who skips routine screening. This is not cancer screening anymore. It is cancer diagnosis in a patient already flagged. The urgency, the resources, and the communication style should all reflect that distinction.

The plan has five essential pieces.

First, clinical handoff. When the algorithm flags a patient, the result should not land as one more electronic alert in a busy primary care inbox. It should trigger a direct handoff to a dedicated nurse navigator whose job is managing flagged patients through the pathway. The navigator contacts the primary care physician, contacts the patient, schedules the follow-up, and tracks the patient through every step. This is already the standard of care for diagnosed cancer patients at major cancer centers. The consortium extends it upstream to flagged patients who have a ten percent chance of cancer. The cost is trivial compared to one advanced cancer case treated over two years.

Second, patient communication. The initial contact should come from the patient’s own physician, explain that routine blood tests showed a pattern warranting a closer look, include a specific next step rather than a generic recommendation, and be available in the patient’s preferred language. The consortium’s New York geography makes language a first-day requirement: Spanish, Chinese, Russian, Haitian Creole, Arabic, and Bengali at minimum. A flagged patient who cannot understand the message is not an adherence problem. It is a health-system failure.

Third, removing logistical barriers. Decades of research show that missed appointments are caused more by logistics than by reluctance. The plan should build in, as defaults rather than exceptions: transportation to the follow-up appointment through rideshare services at no cost to the patient; evening and weekend appointments for patients who cannot miss work; on-site child care or reimbursement; waived copays for diagnostic workup of flagged patients, using charity care mechanisms already in place; and nurse-led preparation support with a phone check-in the night before the procedure.

Fourth, an equity-first deployment. The adherence infrastructure should be deployed most intensively in the populations that have historically received the least support. More navigators at Montefiore, NYC Health and Hospitals, Jamaica Hospital, and Jefferson’s underserved communities. More transportation where transportation is scarcest. More language services where English is least common. In most American cancer programs, the best-resourced patients get the best adherence support. The plan should do the opposite. Patients with the most obstacles should receive the most support, because they are the patients for whom missed follow-up is most likely to result in an undetected cancer.

Fifth, honest measurement. A cancer screening program that does not measure its adherence cannot improve it. Every reporting metric should track the full flagged population, not just the completed population. If the algorithm flagged ten thousand patients, the report should say so, and should track each one through every step: contact attempted, contact successful, appointment scheduled, appointment completed. Reporting should be disaggregated by demographic group from the start. If completion rates at one consortium site are below those at another, the consortium needs to know. If Black patient completion rates are ten points below white patient completion rates, someone in leadership needs to own the problem until the gap closes.
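
A report of this kind can be sketched in a few lines. Everything below is hypothetical: the stage names mirror the four steps listed above, but the record layout, the function name funnel_report, and the sample patients are invented for illustration. The one design choice that matters is the denominator, which is always the full flagged population for each group and never just the patients who completed.

```python
from collections import Counter

# The four steps named in the text, in pathway order.
STAGES = [
    "contact_attempted",
    "contact_successful",
    "appointment_scheduled",
    "appointment_completed",
]


def funnel_report(patients):
    """Per-group share of ALL flagged patients who reached each stage.

    Each patient is a dict with a 'group' label and a 'furthest_stage'
    (one of STAGES, or None if no contact was ever attempted).
    """
    flagged = Counter()
    reached = {}
    for patient in patients:
        group = patient["group"]
        flagged[group] += 1
        counts = reached.setdefault(group, Counter())
        furthest = patient["furthest_stage"]
        if furthest is not None:
            # Reaching a stage implies passing through every earlier stage.
            for stage in STAGES[: STAGES.index(furthest) + 1]:
                counts[stage] += 1
    return {
        group: {stage: reached[group][stage] / flagged[group] for stage in STAGES}
        for group in flagged
    }


# Invented sample records, for illustration only.
records = [
    {"group": "site_a", "furthest_stage": "appointment_completed"},
    {"group": "site_a", "furthest_stage": "appointment_scheduled"},
    {"group": "site_a", "furthest_stage": "contact_attempted"},
    {"group": "site_a", "furthest_stage": None},
    {"group": "site_b", "furthest_stage": "appointment_completed"},
    {"group": "site_b", "furthest_stage": "appointment_completed"},
]

report = funnel_report(records)
for group, rates in report.items():
    print(group, {stage: f"{rate:.0%}" for stage, rate in rates.items()})
```

The group label could be a consortium site, a language, or a demographic category. The same function disaggregates by any of them.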

The Patients Who Do Not Come In

Honest accounting requires addressing the limit of this approach. Patients who never interact with the medical system are not reached by blood-test algorithms. They are concentrated in the populations screening programs have always missed most: the uninsured, undocumented immigrants, unhoused individuals, people with severe mental illness. Even among these populations, most adults eventually touch some part of the medical system. Emergency department visits, urgent care clinics, pre-employment physicals, pregnancy care, workplace screenings, and community health fairs all generate blood tests. The consortium’s algorithm deployment should extend to every one of these touchpoints, not just primary care. Emergency departments matter particularly, because patients without primary care use them as their default source of medical contact.

The remaining gap, the population that does not access any part of the medical system, needs community-based outreach of the kind Percac-Lima and colleagues demonstrated at Massachusetts General Hospital: trained community health workers embedded in the neighborhoods they serve, making direct contact with patients who have fallen off the medical system entirely. That work is slow. It is also the only way to reach the population the blood-test algorithm approach cannot reach directly.

Why a Consortium, and Why None of This Works Without One

The adherence plan cannot be built by a single hospital. It has to be built by a consortium. This is an operational requirement, not a preference.

Scale. A single flagship system generates a few thousand cancer cases per year across all thirteen diseases. For common cancers this is enough. For rarer ones like pancreatic and ovarian, it is not. The target group either takes a decade to accumulate at one institution or it reaches into the records of multiple institutions working together. The patients currently dying of pancreatic cancer do not have ten years.

Demographic breadth. Every cancer algorithm published to date has been built on a population narrower than the one it would eventually serve. A consortium that spans from Sloan Kettering’s Manhattan referrals to Montefiore’s Bronx patients to NYC Health and Hospitals’ safety-net clinics to Northwell’s Long Island suburbs to Jefferson’s Philadelphia communities covers every demographic segment that matters for American cancer medicine in a single unified training set. Validation across diverse populations happens at the development stage rather than being hoped for at deployment.

Infrastructure cost. Navigators, transportation, language services, child care, financial support: the cost structure only works at consortium scale. A single hospital trying to build these services for its own flagged patients would be duplicating infrastructure a consortium could build once and share. At any smaller scale, the per-patient cost is too high to sustain.

Cross-institutional data sharing. Flagged patients often receive follow-up at a different institution from the one that ran the algorithm. A patient flagged at NYC Health and Hospitals may be scoped at Sloan Kettering. A patient flagged at Jamaica Hospital, which has no oncology department, may be referred to Jefferson or Northwell. If those institutions cannot share records in both directions, the pipeline breaks at exactly the point where the cancer is supposed to be caught.

Accountability through comparative visibility. When Massachusetts hospitals began publicly reporting central-line infection rates in a coordinated fashion, rates fell dramatically across the state, not because the medicine changed but because the visibility did. A consortium’s adherence reporting, disaggregated by demographic group and visible across member institutions, creates exactly that kind of accountability.

Regulatory leverage and public trust. A coordinated consortium speaks to the FDA, CMS, and professional societies with one voice. It also carries a kind of institutional credibility that no single member can generate alone. Patients are more likely to follow through on a recommendation from a system they trust, and trust is built institutionally.

Why this consortium, in this region, at this moment? The relationships already exist. Board service at Sloan Kettering and Weill Cornell going back two decades. Northwell’s cancer leadership coming from Memorial Sloan Kettering. Connections to Jefferson, Montefiore, NYC Health and Hospitals, and the Israeli partners who built the first validated blood-test cancer algorithm all in place. Consortia are made out of people who trust each other enough to share institutional resources on a difficult, long-term project. Those relationships cannot be created on short notice. Here, they exist. That is the one thing most of the country lacks.

The consortium is not a nice-to-have detail. It is the only structure within which the adherence plan becomes operational. The algorithms can exist without it. The patients cannot be reached without it.

What Happens If We Get This Right

Return to the consortium’s colorectal cancer deployment. Ten thousand flagged patients in a year. Each with a roughly one in ten chance of harboring an undetected cancer. If the adherence infrastructure described in this chapter is built, and completion rates reach eighty or ninety percent across the flagged population instead of the fifteen percent that has been typical for traditional screening follow-up, the arithmetic transforms. Eight or nine thousand patients complete their colonoscopies. Roughly eight hundred to nine hundred cancers are found, the vast majority at stage I or II when surgery alone is often curative. The alternative, where most flagged patients never complete follow-up, leaves the great majority of those cancers to be found later, at stages where five-year survival has collapsed from above ninety percent to well under twenty.
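
The same calculation can be written out explicitly. This is a rough sketch under stated assumptions: the function name cancers_found is invented here, the inputs are the round figures used in this chapter, and it assumes that every completed colonoscopy finds the cancer if one is present.

```python
def cancers_found(flagged: int, completion_rate: float, chance_of_cancer: float) -> float:
    """Expected cancers detected among flagged patients who complete follow-up.

    Simplifying assumption: a completed colonoscopy detects every cancer present.
    """
    return flagged * completion_rate * chance_of_cancer


FLAGGED_PER_YEAR = 10_000  # flagged patients per year, from the text
CHANCE_OF_CANCER = 0.10    # roughly one in ten, from the text

# Historical completion rate versus the two targets named in the text.
results = {
    rate: cancers_found(FLAGGED_PER_YEAR, rate, CHANCE_OF_CANCER)
    for rate in (0.15, 0.80, 0.90)
}

for rate, found in results.items():
    print(f"{rate:.0%} completion: about {found:.0f} cancers found")
```

At fifteen percent completion, the same ten thousand flags yield about one hundred fifty detected cancers instead of eight or nine hundred.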

Scale that across the Northeast Corridor consortium’s approximately fifty million covered lives, across all thirteen cancers. Tens of thousands of additional early-stage diagnoses per year. Each of them represents a patient whose five-year survival probability changes from something grim to something good. The survival curves for pancreatic, ovarian, and advanced colorectal cancer, which have barely moved in forty years, start to move.

This is not hypothetical. The algorithms exist. The blood tests already happen. The patients are already in the system. What is missing is the commitment to the adherence infrastructure, the boring, unglamorous, administratively intensive work of making sure each flagged patient actually completes follow-up.

The Quiet Work

The rest of this book has been about the loud work: the machine learning, the biology, the consortium, the thirteen cancers and their recipes. That work is what grabs attention and draws funding.

The work this chapter describes is quiet. Navigators making phone calls. Rideshares arranged. Preparation questions answered at nine in the evening. Appointment reminders in seven languages. Adherence dashboards reviewed monthly. Community health workers walking the streets of the Bronx and North Philadelphia.

This quiet work is where the actual lives are saved. The algorithms make it possible. The algorithms do not do it. A flagged patient who never gets a colonoscopy has gained nothing from being flagged.

The whole enterprise of this book rests on whether the adherence infrastructure gets built with the same rigor as the algorithms themselves. If it does, American cancer mortality can change meaningfully within a decade. If it does not, the algorithms become another line of validation papers that never reach the patients whose lives they could have saved. That is the last mile. It is where the cancer either gets caught in time or does not.

Endnotes

Introduction

1. Wang GJ, Jackson BM, Foley PJ, et al. Epidemiology of fatal ruptured aortic aneurysms in the United States (1999-2016). J Vasc Surg. 2018;68(6):1701-1707. https://doi.org/10.1016/j.jvs.2018.04.053 National death-certificate analysis documenting more than 100,000 US deaths from ruptured aortic aneurysms over 18 years. When deaths listing aneurysm as any contributing cause are included, annual US figures run to roughly 25,000. Forty-three percent of ruptured abdominal aneurysm deaths occurred in people not eligible for current screening guidelines.

2. US Preventive Services Task Force; Owens DK, Davidson KW, Krist AH, et al. Screening for abdominal aortic aneurysm: US Preventive Services Task Force recommendation statement. JAMA. 2019;322(22):2211-2218. https://doi.org/10.1001/jama.2019.18928 The most recent of the three US screening statements on abdominal aortic aneurysm, issued in 2005, 2014, and 2019. Each update reissued substantially the same criteria.

3. Ventoruzzo G, Zil-E-Ali A, Maldonado TS, et al. Clinical and financial impact of a machine learning powered screening program for abdominal aortic aneurysms. JVS Vasc Insights. 2024;2:100098. https://doi.org/10.1016/j.jvsvi.2024.100098 Reports on the first six months of Geisinger’s live monthly run of the Medial EarlySign AAA-Flag algorithm against its patient database. Documents that the algorithm is running in real clinical workflow. The underlying validation is in Zil-E-Ali et al., J Vasc Surg 2024. Full treatment appears in the aortic aneurysm chapter later in this book.

Chapter 1

1. Galen. De Methodo Medendi [On the Method of Medicine]. Translated by Ian Johnston and G.H.R. Horsley. Cambridge: Harvard University Press, 2011. The foundational text of humoral medicine, in which Galen codified the theory of four humors and prescribed bloodletting as the primary treatment for imbalances of blood. Establishes the theoretical basis that dominated Western medicine for fifteen centuries and contextualizes the physicians’ actions at Washington’s deathbed.

2. Louis PC. Recherches sur les effets de la saignée dans quelques maladies inflammatoires [Researches on the Effects of Bloodletting in Some Inflammatory Diseases]. Paris: de Mignaret, 1835. Summarized in: Warner JH. ‘Therapeutic Explanation and the Edinburgh Bloodletting Controversy.’ Medical History. 1980;24(3):241-258. https://doi.org/10.1017/S0025727300040539 Pierre Louis’s statistical analysis of patient outcomes demonstrating that bled patients died at higher rates than unbled patients. Documents the hostile reception his findings received from the medical establishment and establishes the fifty-year lag between evidence and practice change.

3. National Cancer Act of 1971, Pub. L. No. 92-218, 85 Stat. 778 (1971). History and context in: Rettig RA. Cancer Crusade: The Story of the National Cancer Act of 1971. Princeton: Princeton University Press, 1977. The legislative text and legislative history of the National Cancer Act of 1971, which authorized $1.6 billion in cancer research funding and reorganized the National Cancer Institute. Establishes the scale and intent of the ‘war on cancer’ framing that this chapter evaluates.

4. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 The definitive annual compilation of U.S. cancer incidence, mortality, and survival statistics by stage. Source for all stage-specific five-year survival rates cited in this chapter, including lung cancer survival of 8 percent at distant stage versus 60 percent at localized stage.

5. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for stage-specific survival rates across colorectal, pancreatic, and ovarian cancers. Establishes that late-stage survival rates for these cancers have remained largely static over four decades despite massive investment in treatment research.

6. Mukherjee S. The Emperor of All Maladies: A Biography of Cancer. New York: Scribner, 2010. Pulitzer Prize-winning account of cancer’s history, biology, and treatment. Mukherjee’s analysis of cancer’s biological complexity, including his description of cancer cells as ‘more perfect versions of ourselves,’ establishes the fundamental case for why treating metastatic disease is an inherently limited strategy.

7. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for the survival rate comparisons between early and late stage across all thirteen target cancers. The magnitude of the survival gap, in some cases exceeding tenfold, is the foundational quantitative argument for redirecting resources toward early detection.

8. Reck M, Rodríguez-Abreu D, Robinson AG, et al. Pembrolizumab versus Chemotherapy for PD-L1-Positive Non-Small-Cell Lung Cancer. New England Journal of Medicine. 2016;375(19):1823-1833. https://doi.org/10.1056/NEJMoa1606774 The KEYNOTE-024 pivotal trial establishing pembrolizumab as first-line therapy for high PD-L1 expressing non-small cell lung cancer. Overall survival benefit of approximately 10.4 months versus chemotherapy in the trial population. Revenue figures from Merck 2023 Annual Report.

9. Hwang TJ, Kesselheim AS, Vokinger KN. Lifecycle Regulation of Artificial Intelligence- and Machine Learning-Based Software Devices in Medicine. JAMA. 2019;322(23):2285-2286. For the drug approval analysis: Kim C, Prasad V. Cancer Drugs Approved on the Basis of a Surrogate End Point and Subsequent Overall Survival: An Analysis of 5 Years of US Food and Drug Administration Approvals. JAMA Internal Medicine. 2015;175(12):1992-1994. https://doi.org/10.1001/jamainternmed.2015.5868 Analysis of FDA cancer drug approvals demonstrating that the majority were based on surrogate endpoints rather than overall survival, and that the median overall survival benefit for drugs showing survival improvement was approximately 2.4 months. Establishes the quantitative basis for the chapter’s argument about misallocated research priorities.

10. Louis PC. Researches on the Effects of Bloodletting in Some Inflammatory Diseases. Translated by C.G. Putnam. Boston: Hilliard, Gray, 1836. Historical analysis in: Matthews JR. Quantification and the Quest for Medical Certainty. Princeton: Princeton University Press, 1995. Pierre Louis’s original English-language publication of his statistical findings, and Matthews’s scholarly analysis of the reception those findings received. Documents the specific arguments used to dismiss statistical evidence in favor of clinical authority, arguments that mirror contemporary resistance to algorithmic approaches in medicine.

Chapter 2

1. Papanicolaou GN. New Cancer Diagnosis. Proceedings of the Third Race Betterment Conference. Battle Creek, Michigan: Race Betterment Foundation, 1928:528-534. Historical analysis in: Casper MJ, Clarke AE. Making the Pap Smear into the Right Tool for the Job: Cervical Cancer Screening in the USA, circa 1940-95. Social Studies of Science. 1998;28(2):255-290. https://doi.org/10.1177/030631298028002003 The original 1928 presentation of Papanicolaou’s cervical cancer detection technique and the foundational sociological analysis of its hostile reception. Documents Max Borst’s dismissal and the typographical errors in the published report that contributed to the absence of follow-up, establishing the pattern of resistance to early detection innovation examined throughout this chapter.

2. Papanicolaou GN, Traut HF. Diagnosis of Uterine Cancer by the Vaginal Smear. New York: Commonwealth Fund, 1943. Historical context in: Carmichael DE. The Pap Smear: Life of George N. Papanicolaou. Springfield, IL: Charles C. Thomas, 1973. The definitive 1943 monograph validating the Pap smear as a cancer detection tool in large patient cohorts, and the biography documenting Joseph Hinsey’s role in reviving Papanicolaou’s abandoned research in 1939. Together these establish the eleven-year gap between discovery and institutional revival as a consequence of the absence of a champion, not a failure of evidence.

3. Casper MJ, Clarke AE. Making the Pap Smear into the Right Tool for the Job: Cervical Cancer Screening in the USA, circa 1940-95. Social Studies of Science. 1998;28(2):255-290. https://doi.org/10.1177/030631298028002003. Raab SS, Grzybicki DM. Quality in Cervical Cytology. Archives of Pathology and Laboratory Medicine. 2008;132(10):1529-1534. The primary sociological and quality analysis of Pap smear laboratory conditions, documenting the fifty-cents-per-slide payment structure, the pass-through billing practices driving volume over accuracy, and the published error rates of ten to twenty percent even under good conditions. Establishes the structural argument that perverse economic incentives undermine validated early detection tools independently of the underlying science.

4. Bogdanich W. Lax Laboratories: The Pap Test Misses Much Cervical Cancer Through Labs’ Errors. Wall Street Journal. November 2, 1987. Clinical Laboratory Improvement Amendments of 1988, Pub. L. No. 100-578, 102 Stat. 2903 (1988). Bogdanich’s Pulitzer Prize-winning investigative series exposing Pap smear laboratory quality failures and the subsequent federal legislation establishing quality control standards and the hundred-slides-per-day reading limit. The sixty-year span between Papanicolaou’s 1928 presentation and effective federal regulation is the central quantitative evidence for the First and Second Laws of early detection resistance developed in this chapter.

5. Henschke CI, McCauley DI, Yankelevitz DF, et al. Early Lung Cancer Action Project: Overall Design and Findings from Baseline Screening. Lancet. 1999;354(9173):99-105. https://doi.org/10.1016/S0140-6736(99)06093-6 The landmark 1999 Lancet publication establishing LDCT as a highly sensitive lung cancer detection tool. Documents that among the malignancies detected in the Early Lung Cancer Action Project, the great majority were Stage I, at which surgical cure rates approach eighty to ninety percent, establishing the clinical rationale for the National Lung Screening Trial.

6. Aberle DR, Adams AM, Berg CD, et al. (National Lung Screening Trial Research Team). Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening. New England Journal of Medicine. 2011;365(5):395-409. https://doi.org/10.1056/NEJMoa1102873 The definitive NLST publication demonstrating at least twenty percent mortality reduction from LDCT screening in high-risk patients. The authors note that the stop-screen design and the six-and-a-half-year post-screening follow-up period likely understated the true mortality benefit of continuous annual screening, a point that supports the chapter’s argument about the systematic understatement of early detection benefits in trial design.

7. American Lung Association. State of Lung Cancer 2023. https://www.lung.org/research/state-of-lung-cancer. Sineshaw HM, Jemal A, Ng K, et al. Receipt of Cancer Screening Among US Adults. Cancer Epidemiology, Biomarkers and Prevention. 2023;32(6):745-754. https://doi.org/10.1158/1055-9965.EPI-22-1176 Current utilization data establishing that five to six percent of eligible Americans are receiving LDCT lung cancer screening more than a decade after NLST validation and guideline endorsement. Contextualizes the implementation failure relative to the 127,000 annual lung cancer deaths and establishes the quantitative foundation for the chapter’s argument about the Second Law of early detection resistance.

8. Drolet M, Bénard E, Boily MC, et al. Population-Level Impact and Herd Effects Following Human Papillomavirus Vaccination Programmes: A Systematic Review and Meta-Analysis. Lancet Infectious Diseases. 2015;15(5):565-580. https://doi.org/10.1016/S1473-3099(14)71073-4 Meta-analysis documenting the eighty percent decline in HPV infection rates among young women following vaccine introduction in countries with high coverage. Establishes the achievable endpoint of a fully committed early detection and prevention program and demonstrates that the pattern of institutional delay documented in this chapter is not biologically or organizationally inevitable.

9. Sabatino SA, Thompson TD, White MC, et al. Cancer Screening Test Receipt: United States, 2018. MMWR Morbidity and Mortality Weekly Report. 2021;70(2):29-35. https://doi.org/10.15585/mmwr.mm7002a1 CDC county-level analysis of colorectal cancer screening rates documenting the range from forty to eighty percent and the sixty-seven percent national average. Establishes the geographic and demographic pattern of screening inequity that characterizes every early detection technology examined in this chapter and motivates the equity argument developed more fully in Chapter 12.

10. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts: A Binational Retrospective Study. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8 ColonFlag derivation across 606,403 Israeli patients and validation in 30,674 UK patients, achieving AUC 0.82 with detection maintained for blood tests drawn two years before diagnosis. The Geisinger deployment study documents the eightfold improvement in cancer detection rate, establishing that the algorithm performs substantially better in clinical practice than standard screening alone.

11. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC LungFlag validation across 6,505 non-small cell lung cancer cases and 189,597 controls at Kaiser Permanente, achieving AUC 0.856 with forty percent sensitivity at ninety-five percent specificity in the nine to twelve month pre-diagnosis window. The algorithm outperforms both USPSTF categorical eligibility criteria and the PLCOm2012 quantitative model, establishing the superiority of machine learning pattern recognition over threshold-based risk assessment.

12. Singh V, Chaganti S, Siebert M, et al. Deep Learning-Based Identification of Patients at Increased Risk of Cancer Using Routine Laboratory Markers. Scientific Reports. 2025;15:12661. https://doi.org/10.1038/s41598-025-97331-6. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 The Siemens Healthineers Deep Profiler study demonstrating simultaneous detection of colorectal, liver, and lung cancer from thirty-three standard CBC and CMP parameters in a single model, establishing the technical feasibility of multi-cancer detection from routine blood draws without new tests or equipment. Cancer Statistics 2024 provides the mortality scale against which the consortium deployment argument is calibrated.

Chapter 3

1. Hanahan D, Weinberg RA. Hallmarks of Cancer: The Next Generation. Cell. 2011;144(5):646-674. https://doi.org/10.1016/j.cell.2011.02.013 The landmark framework paper defining the biological hallmarks of cancer, including angiogenesis, immune evasion, metabolic reprogramming, and sustained proliferative signaling. Establishes the biological basis for why each hallmark produces measurable systemic changes in routine blood parameters, providing the mechanistic foundation for the algorithmic approach described in this chapter.

2. Stone RL, Nick AM, McNeish IA, et al. Paraneoplastic Thrombocytosis in Ovarian Cancer. New England Journal of Medicine. 2012;366(7):610-618. https://doi.org/10.1056/NEJMoa1110352. Virdee PS, Moschandreas J, Sheppard JJ, et al. The Full Blood Count Blood Test for Colorectal Cancer Detection: A Systematic Review, Meta-Analysis, and Critical Appraisal. Cancers. 2020;12(9):2348. https://doi.org/10.3390/cancers12092348 Stone et al. documents the tumor-driven IL-6 signaling mechanism causing paraneoplastic thrombocytosis and its clinical significance as a pre-diagnostic marker. Virdee et al. provides a meta-analysis of fifty-three studies confirming that elevated platelet count is among the six CBC components consistently associated with pre-diagnostic colorectal cancer, with the elevation appearing two to four years before diagnosis.

3. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC. Sanchez-Salcedo P, et al. Exploring the Neutrophil to Lymphocyte and Platelet to Lymphocyte Ratios as Biomarkers for Lung Cancer Development. European Respiratory Journal. 2015;46(Suppl 59):PA4241. https://doi.org/10.1183/13993003.congress-2015.PA4241 Gould et al. documents the NLR trajectory in the LungFlag validation cohort, establishing the consistent annual progression rate of 2.56 percent in lung cancer patients versus 0.27 percent in controls. Sanchez-Salcedo confirms that the longitudinal CBC inflammatory signature predates lung cancer diagnosis by a year or more, establishing the biological basis for the detection window.

4. Edgren G, Bagnardi V, Bellocco R, et al. Pattern of Declining Hemoglobin Concentration Before Cancer Diagnosis. International Journal of Cancer. 2010;127(6):1429-1436. https://doi.org/10.1002/ijc.25122. Goldshtein I, Nguyen AM, Chodick G. Variations in Hemoglobin Before Colorectal Cancer Diagnosis. European Journal of Cancer Prevention. 2010;19(5):342-344. https://pubmed.ncbi.nlm.nih.gov/20543703 Edgren et al. documents the pattern of hemoglobin decline preceding cancer diagnosis in more than one million Scandinavian blood donors, finding that the decline for colorectal cancer begins three to four years before diagnosis. Goldshtein et al. quantifies the rate at 0.28 grams per deciliter per six-month period while values remain within normal reference range, the primary pre-diagnostic signal exploited by the ColonFlag algorithm.
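
The within-range decline described in this note can be made concrete with a minimal sketch. It fits an ordinary least-squares line to serial hemoglobin readings and expresses the slope per six months, the unit Goshtein et al.'s figure is quoted in above. The readings, the reference range, and the function name are hypothetical and chosen only for illustration; this is not the ColonFlag feature set.

```python
# Least-squares slope of serial hemoglobin values, expressed per six months.
# All readings and the reference range below are hypothetical.

def slope_per_month(months, values):
    """Ordinary least-squares slope of values against time in months."""
    n = len(months)
    mean_t = sum(months) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(months, values))
    return sxy / sxx

# Five hypothetical readings (g/dL), taken six months apart.
months = [0, 6, 12, 18, 24]
hemoglobin = [14.6, 14.3, 14.0, 13.8, 13.5]

# Hypothetical reference range; real ranges vary by laboratory, sex, and age.
REFERENCE_RANGE = (12.0, 17.5)

six_month_change = 6 * slope_per_month(months, hemoglobin)
all_in_range = all(REFERENCE_RANGE[0] <= v <= REFERENCE_RANGE[1] for v in hemoglobin)

print(f"change per six months: {six_month_change:.2f} g/dL")
print(f"every reading within range: {all_in_range}")
```

In this made-up series every reading would pass as normal on its own, yet the fitted trend falls by about 0.27 g/dL per six months. That is the kind of signal a single-visit review cannot see.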

5. Sharma A, Smyrk TC, Levy MJ, et al. Fasting Blood Glucose Levels Provide Estimate of Duration and Progression of Pancreatic Cancer Before Diagnosis. Gastroenterology. 2018;155(2):490-500. https://doi.org/10.1053/j.gastro.2018.04.025. Kyle RA, Gertz MA, Witzig TE, et al. Review of 1027 Patients with Newly Diagnosed Multiple Myeloma. Mayo Clinic Proceedings. 2003;78(1):21-33. https://doi.org/10.4065/78.1.21 Sharma et al. documents the fasting glucose divergence in pancreatic cancer patients beginning thirty to thirty-six months before clinical diagnosis, confirming the mechanistic role of insulin-producing cell destruction by the tumor. Kyle et al. establishes the characteristic cross-panel myeloma signature, documenting the presence of elevated calcium, creatinine, and total protein alongside declining hemoglobin and albumin in the pre-diagnostic period.

6. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts: A Binational Retrospective Study. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195 The foundational ColonFlag development and validation study, establishing gradient boosting on twenty CBC parameters as capable of detecting colorectal cancer from blood tests drawn up to two years before diagnosis with AUC 0.82 in 606,403 Israeli patients and comparable performance in 30,674 UK patients. The study’s description of specific feature combinations and their predictive weight provides the basis for the detective analogy developed in this chapter.

7. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785-794. https://doi.org/10.1145/2939672.2939785 The primary methodological paper for XGBoost, the gradient boosting framework used in LungFlag and many of the other cancer detection algorithms described in this book. Documents the ensemble learning approach in which hundreds of simple decision rules are combined into a single powerful prediction, providing the technical foundation for the plain-language explanation of gradient boosting in this chapter.
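
The idea of combining many simple decision rules can be sketched in a few lines. The toy below is plain gradient boosting for squared error, with one-threshold "stumps" as the simple rules. It is not XGBoost, which adds regularization, second-order gradients, and much more, and it is not the LungFlag model. The data, function names, and settings are all hypothetical, and the sketch assumes the input has at least two distinct values.

```python
# Toy gradient boosting for squared error with one-split stumps.
# Illustrative only; data and parameters are hypothetical.

def fit_stump(xs, residuals):
    """Find the single threshold on xs that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, n_rounds=200, learning_rate=0.1):
    """Each round fits a stump to what the current ensemble still gets wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical one-feature data: the outcome switches on above a threshold.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = fit_boosted(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict_boosted(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.6f}")
```

No single stump is a good model, but each one corrects a fraction of the remaining error, so the combined prediction improves round after round.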

8. Topol EJ. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nature Medicine. 2019;25(1):44-56. https://doi.org/10.1038/s41591-018-0300-7 Topol’s comprehensive review of artificial intelligence applications in medicine, establishing the specific cognitive tasks at which machine learning algorithms outperform human clinicians, particularly the detection of subtle patterns across large numbers of correlated variables. Provides the conceptual framework for the chapter’s argument about why algorithmic blood reading is not a replacement for physician judgment but an extension of analytical capacity that humans alone cannot match.

9. Edgren G, Bagnardi V, Bellocco R, et al. Pattern of Declining Hemoglobin Concentration Before Cancer Diagnosis. International Journal of Cancer. 2010;127(6):1429-1436. https://doi.org/10.1002/ijc.25122. Aoki J, Numata A, Miyamoto T, et al. Machine Learning Model Predicts Abnormal Lymphocytosis Associated with CLL. JCO Clinical Cancer Informatics. 2025;9:e2400197. https://doi.org/10.1200/CCI-24-00197. Kyle RA, et al. A Long-Term Study of Prognosis in Monoclonal Gammopathy of Undetermined Significance. New England Journal of Medicine. 2002;346(8):564-569. https://doi.org/10.1056/NEJMoa01133202 Three studies establishing the variable detection windows for different cancer types: Edgren et al. for colorectal cancer (three to four years), Aoki et al. for chronic lymphocytic leukemia (several years of longitudinal CBC trajectory divergence), and Kyle et al. for the MGUS-to-myeloma continuum, where the protein gap on the CMP is detectable two to five years before malignant transformation. Together these establish that the detection window is cancer-specific and often substantially longer than conventional screening intervals.

10. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Chen Q, Rui X, Bian N, et al. Clinical Data Prediction Model to Identify Patients with Early-Stage Pancreatic Cancer. JCO Clinical Cancer Informatics. 2021;5:279-287. https://doi.org/10.1200/CCI.20.00137 Cancer Statistics 2024 provides the stage-specific survival rates used throughout this section. Chen et al. establishes that an XGBoost model on Optum EHR data achieves AUC 0.84 for pancreatic cancer detection, identifying fifty-eight percent of late-stage patients a median of twenty-four months before their actual diagnosis at a stage where survival outcomes would have been substantially better.

11. Cai G, Yu K, Ding PR, et al. AI-Based Models Enabling Accurate Diagnosis of Ovarian Cancer Using Laboratory Tests in China. Lancet Digital Health. 2024;6(3):e176-e186. https://doi.org/10.1016/S2589-7500(23)00245-5. Araujo DC, Bergamasco MD, Okano LT, et al. Unlocking the Complete Blood Count as a Risk Stratification Tool for Breast Cancer. Scientific Reports. 2024;14:10841. https://doi.org/10.1038/s41598-024-61215-y Cai et al. documents the AUC 0.949 internal and 0.882 to 0.884 external validation performance of multi-classifier AI models for ovarian cancer detection from fifty-two laboratory variables across eleven thousand patients, establishing the high end of the algorithmic performance range. Araujo et al. documents the more modest AUC 0.64 for breast cancer detection from CBC, establishing the lower performance boundary and the biological rationale for the variation across cancer types.

12. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8 Kaiser Permanente validation of ColonFlag in an American population after original development in an Israeli population, achieving AUC 0.80 and odds ratio of 34.7 at ninety-nine percent specificity. Documents the cross-population generalizability of the algorithm and establishes prospective validation at multiple independent health systems as the appropriate next step before broad deployment, which is the central policy argument of this book.

13. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 Prospective deployment of ColonFlag at Geisinger Health System documenting the application to 25,610 patients overdue for screening, the flagging of 706 high-risk patients, and the eight percent cancer detection rate among the 104 who completed colonoscopy, compared to the approximately one percent rate in standard screening. The eightfold improvement in detection rate is the primary clinical evidence for the real-world efficacy of blood-based algorithmic cancer detection.

Chapter 4

1. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 The definitive annual compilation of U.S. cancer incidence, mortality, and five-year survival rates by stage at diagnosis, drawn from the NCI SEER database. Primary source for all survival figures cited in this chapter.

2. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for colorectal cancer mortality of approximately 53,000 annually and stage-specific five-year survival of ninety-one percent at Stage I versus fourteen percent at Stage IV. The stability of the fourteen percent figure across four decades of treatment advances is the quantitative argument for shifting focus from treatment to earlier detection.

3. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195. Evenden L, El-Mahdi N, Fanshawe T, et al. Use of ColonFlag Score for Prioritisation of Endoscopy in Colorectal Cancer. BMJ Open Gastroenterology. 2021;8(1):e000541. https://doi.org/10.1136/bmjgast-2020-000541. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 Kinar et al. documents ColonFlag development across 606,403 patients achieving AUC 0.82. Evenden et al. provides the clinical deployment figures used in this chapter: sensitivity 88 percent, specificity 71 percent, PPV 9.15 percent, NPV 99.45 percent, drawn from the UK deployment study. Goshen et al. documents the eightfold improvement in cancer detection at Geisinger Health System.
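
The four figures quoted from Evenden et al. are linked by Bayes' rule, and a short sketch shows how. Given sensitivity, specificity, and the share of tested patients who have cancer, PPV and NPV follow directly. The prevalence is not stated in this note, so the code back-calculates it from the quoted PPV. That implied value is a derived assumption from rounded inputs, not a number reported by the study, and the function names are illustrative.

```python
# Relating sensitivity, specificity, prevalence, PPV, and NPV.
# The implied prevalence is derived here from rounded figures, not reported data.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def implied_prevalence(sensitivity, specificity, ppv):
    """Solve the PPV formula for prevalence."""
    false_pos_rate = 1 - specificity
    return (ppv * false_pos_rate) / (sensitivity * (1 - ppv) + ppv * false_pos_rate)

SENSITIVITY = 0.88   # quoted in the note above
SPECIFICITY = 0.71   # quoted in the note above
QUOTED_PPV = 0.0915  # quoted in the note above

prevalence = implied_prevalence(SENSITIVITY, SPECIFICITY, QUOTED_PPV)
ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)

print(f"implied prevalence: {prevalence:.2%}")
print(f"PPV: {ppv:.2%}, NPV: {npv:.2%}")
```

With these rounded inputs the implied prevalence is a little over three percent, and the recomputed NPV lands within rounding distance of the quoted 99.45 percent. The low PPV reflects the low prevalence as much as the test itself: the same sensitivity and specificity give a higher PPV in a higher-risk group.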

4. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for lung cancer mortality of approximately 127,000 annually, the eighty-five percent late-stage diagnosis rate, and the survival gap of sixty percent at localized versus eight percent at distant stage.

5. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC LungFlag validation across 6,505 non-small cell lung cancer cases and 189,597 controls achieving AUC 0.856 with forty percent sensitivity at ninety-five percent specificity. Documents outperformance of USPSTF categorical criteria and capture of patients who would not qualify for low-dose CT screening under current eligibility guidelines.

6. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for hepatocellular carcinoma mortality of approximately 26,000 annually and stage-specific survival of approximately thirty-eight percent at localized versus three percent at distant stage.

7. Kwok KN, Cheung KM, Lam SJL, et al. Development of a Novel Routine Blood-Based AI Model for Hepatocellular Carcinoma Screening: A Territory-Wide Study. ESMO Gastrointestinal Oncology. 2025;10:100241. https://www.esmogastro.org/article/S2949-8198(25)00110-4/fulltext. Clusmann J, et al. Machine Learning Predicts Liver Cancer Risk from Routine Clinical Data. medRxiv. 2024. https://doi.org/10.1101/2024.11.03.24316662 Kwok et al. documents AUC 0.894 in a territory-wide Hong Kong validation of 75,000-plus patients, outperforming AFP. Clusmann et al. documents AUC 0.88 in 900,000-plus individuals from UK Biobank and NIH All of Us, establishing cross-population generalizability.

8. Fang T, Wang Y, Yin X, et al. Diagnostic Sensitivity of NLR and PLR in Early Diagnosis of Gastric Cancer. Journal of Immunology Research. 2020:9146042. https://doi.org/10.1155/2020/9146042 Documents that NLR and PLR from the routine CBC differential outperform conventional tumor markers CEA and CA19-9 for early-stage gastric cancer detection, establishing that routine blood values carry diagnostic signal that existing tumor marker approaches miss and providing the rationale for applying machine learning to those values.

9. Ke X, Shi X, Li Z, et al. Predicting Early Gastric Cancer Risk Using Machine Learning. Digital Health. 2024;10. https://doi.org/10.1177/20552076241240905. Wong TCB, Lam SJL, Cheung KM, et al. AI Blood Signature in Common Blood Tests for Detection of Gastric Cancer in a Cohort of 190,000 Individuals. Journal of Clinical Oncology. 2023;41(16_suppl):Abstract 1500. Ke et al. documents XGBoost achieving validation AUC 0.901 with AUC 0.970 for tumor marker-negative patients. Wong et al. documents seventy-nine to ninety-six percent sensitivity in 193,000 patients. Western population validation remains the primary outstanding requirement.

10. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for pancreatic cancer mortality of approximately 51,000 annually and the stage-specific survival gap of fifty percent at localized versus three percent at distant stage. The more than eighty percent late-stage diagnosis rate establishes pancreatic cancer as the most compelling argument for the clinical urgency of blood-based early detection.

11. Sharma A, Smyrk TC, Levy MJ, et al. Fasting Blood Glucose Levels Provide Estimate of Duration and Progression of Pancreatic Cancer Before Diagnosis. Gastroenterology. 2018;155(2):490-500. https://doi.org/10.1053/j.gastro.2018.04.025. Chen Q, Rui X, Bian N, et al. Clinical Data Prediction Model to Identify Patients with Early-Stage Pancreatic Cancer. JCO Clinical Cancer Informatics. 2021;5:279-287. https://doi.org/10.1200/CCI.20.00137 Sharma et al. documents fasting glucose divergence beginning more than two years before pancreatic cancer diagnosis with reversal after surgical resection confirming causality. Chen et al. documents machine learning identifying fifty-eight percent of future late-stage patients a median of twenty-four months before their actual diagnosis.

12. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for ovarian cancer mortality of approximately 13,000 annually and stage-specific survival of ninety-three percent at Stage I versus thirty-one percent at Stage IV.

13. Stone RL, Nick AM, McNeish IA, et al. Paraneoplastic Thrombocytosis in Ovarian Cancer. New England Journal of Medicine. 2012;366(7):610-618. https://doi.org/10.1056/NEJMoa1110352. Cai G, Yu K, Ding PR, et al. AI-Based Models Enabling Accurate Diagnosis of Ovarian Cancer Using Laboratory Tests in China. Lancet Digital Health. 2024;6(3):e176-e186. https://doi.org/10.1016/S2589-7500(23)00245-5 Stone et al. establishes the IL-6-driven paraneoplastic thrombocytosis mechanism and the more than twentyfold elevated ovarian cancer risk associated with thrombocytosis on routine CBC. Cai et al. documents AUC 0.949 internal and 0.882 to 0.884 external validation in 11,000 patients, the largest ovarian cancer blood algorithm published to date.

14. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for kidney cancer mortality of approximately 15,000 annually and stage-specific survival of ninety-three percent at localized versus nineteen percent at distant stage.

15. Li H, Li J, Jia Y, et al. Machine Learning Screening of Kidney Cancer: An Eight-Indicator Blood Test Panel. Current Oncology. 2022;29(12):9135-9149. https://doi.org/10.3390/curroncol29120715. Zhou Y, Walter FM, Singh H, et al. Identifying Opportunities for Timely Diagnosis of Bladder and Renal Cancer via Abnormal Blood Tests. British Journal of General Practice. 2022;72(714):e19-e25. https://doi.org/10.3399/BJGP.2021.0282 Li et al. documents the random forest classifier achieving AUC 0.932 with sensitivity and specificity both above eighty-six percent in 743 renal cell carcinoma patients. Zhou et al. confirms that eight routine blood test abnormalities increase six to eight months before kidney cancer diagnosis in 4,533 patients.

16. Koshiaris C, Van den Bruel A, Oke JL, et al. Early Detection of Multiple Myeloma in Primary Care Using Blood Tests. British Journal of General Practice. 2018;68(674):e586-e593. https://doi.org/10.3399/bjgp18X698357 Documents the pre-diagnostic myeloma signature in 2,703 UK primary care cases: hemoglobin declining three years before diagnosis, ESR elevated two years before with odds ratio 5.7, and hypercalcemia carrying odds ratio 11.4 for subsequent myeloma diagnosis.

17. Fan G, Zhang L, Zhao W, et al. Routine Blood Biomarkers for the Detection of Multiple Myeloma Using Machine Learning. International Journal of Laboratory Hematology. 2022;44(3):558-566. https://doi.org/10.1111/ijlh.13806. Cai J, Li H, Zeng X, et al. Construction of the Prediction Model for Multiple Myeloma Based on Machine Learning. International Journal of Laboratory Hematology. 2024;46(5):918-926. https://doi.org/10.1111/ijlh.14324 Fan et al. documents AdaBoost achieving AUC 0.968 and 92.6 percent accuracy. Cai et al. documents random forest achieving training AUC 0.956 and test AUC 0.875. Together these establish the technical feasibility of highly accurate myeloma detection from routine blood biomarkers.

18. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for combined leukemia mortality of approximately 24,000 annually across all forms.

19. Hauser RG, Bhatt DL, He J, et al. A Machine Learning Model to Predict Future Diagnosis of CML With Retrospective EHR Data. American Journal of Clinical Pathology. 2021;156(6):1142-1148. https://doi.org/10.1093/ajcp/aqab086. Aoki J, Numata A, Miyamoto T, et al. Machine Learning Model Predicts Abnormal Lymphocytosis Associated with CLL. JCO Clinical Cancer Informatics. 2025;9:e2400197. https://doi.org/10.1200/CCI-24-00197 Hauser et al. documents XGBoost achieving AUC 0.87 to 0.96 for CML with basophil percentage as the most informative pre-diagnostic feature. Aoki et al. documents random forest achieving AUC 0.92 for CLL on 1,090,707 patients over seven years of longitudinal blood count data.

20. Ahmed R, Hameed Ullah A, Qureshi TZ, et al. The Outcome of Hodgkin Lymphoma With Reference to Prognostic Markers. Cureus. 2022;14(8):e28509. https://doi.org/10.7759/cureus.28421 Documents ESR elevated in 56.7 percent and LDH deranged in 51.6 percent of Stage III Hodgkin lymphoma patients at diagnosis, supporting the prevalence of an inflammatory blood signature in Hodgkin lymphoma by the time of diagnosis.

21. Christensen M, Vistisen D, Hilden J, et al. Predicting Hematological Malignancies Using Complete Blood Cell Counts From Primary Care. Hemasphere. 2023;7(Suppl):e72792be. https://doi.org/10.1097/01.HS9.0000977488.72792.be. Christensen ME, Elnegaard S, Lund PE, et al. Blood Sampling Patterns in Primary Care Change Several Years Before a Cancer Diagnosis. Acta Oncologica. 2024;63:28559. https://doi.org/10.2340/1651-226X.2024.28559 Christensen et al. 2023 documents AUC 0.84 at six months and 0.85 with five years of blood count history for hematological malignancy prediction in 663,184 patients. Christensen et al. 2024 documents the striking finding that blood testing activity increases five or more years before hematological cancer diagnosis.

22. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for bladder cancer mortality of approximately 17,000 annually and stage-specific survival of approximately seventy percent at localized versus eight percent at distant stage.

23. Tsai IJ, Tsai TH, Wu CF, et al. Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data. Diagnostics. 2022;12(1):203. https://doi.org/10.3390/diagnostics12010203. Zhou Y, Walter FM, Singh H, et al. Identifying Opportunities for Timely Diagnosis of Bladder and Renal Cancer via Abnormal Blood Tests. British Journal of General Practice. 2022;72(714):e19-e25. https://doi.org/10.3399/BJGP.2021.0282 Tsai et al. documents LightGBM achieving AUC 0.88 to 0.92 for bladder cancer detection. Zhou et al. confirms the blood test inflection pattern six to eight months before diagnosis in thousands of UK patients.

24. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for esophageal cancer mortality of approximately 16,000 annually and five percent distant-stage survival, among the lowest of any cancer in this group.

25. Singh V, Chaganti S, Siebert M, et al. Deep Learning-Based Identification of Patients at Increased Risk of Cancer Using Routine Laboratory Markers. Scientific Reports. 2025;15:12661. https://doi.org/10.1038/s41598-025-97331-6 The Siemens Deep Profiler study confirming that pan-cancer inflammatory signatures in standard blood panels are detectable months before clinical cancer diagnosis across multiple tumor types including gastrointestinal cancers, establishing the biological basis for the esophageal cancer detection approach while acknowledging that dedicated large-population model development remains outstanding.

26. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820 Source for thyroid cancer incidence of approximately 44,000 new cases annually and mortality of approximately 2,000, establishing the favorable overall prognosis that reflects the predominance of well-differentiated forms alongside the minority of aggressive subtypes driving the mortality burden.

27. Chien MN, Yang PS, Hsu YC, et al. Predicting Thyroid Cancer and Hypothyroidism Using Machine Learning on Routine Blood Tests. Frontiers in Endocrinology. 2023;14:1086024. https://doi.org/10.3389/fendo.2023.1086024 Documents machine learning models achieving AUC approximately 0.91 for thyroid cancer and dysfunction detection using TSH trajectory combined with CBC and metabolic panel values in multi-institutional datasets, establishing technical feasibility and natural fit with existing clinical TSH monitoring infrastructure.

28. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Kinar Y, et al. Journal of the American Medical Informatics Association. 2016;23(5):879-890. Gould MK, et al. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. The estimate of 100,000 to 175,000 lives saveable annually is calculated by multiplying annual deaths for each of the thirteen cancers by the stage-shift survival benefit documented in Cancer Statistics 2024, applying fifty percent population penetration as a conservative assumption, and using algorithm performance comparable to the deployed ColonFlag and LungFlag systems. Full methodology at conqueringcancer.xyz.
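
The structure of that multiplication can be sketched as follows. This is one plausible reading of the description in this note, not the book's actual calculation, which is documented at the site named above and may differ. The sketch assumes the stage-shift benefit is the gap between early-stage and late-stage five-year survival and assumes a single detection fraction for every cancer. Only two of the thirteen cancers are included, using the death counts and survival figures from the Chapter 4 notes, so the output does not reproduce the 100,000 to 175,000 range.

```python
# One possible reading of the lives-saveable multiplication.
# Names, the survival-gap definition, and the detection fraction are assumptions.
from dataclasses import dataclass

@dataclass
class CancerInputs:
    name: str
    annual_deaths: int
    early_stage_survival: float  # five-year survival, early or localized stage
    late_stage_survival: float   # five-year survival, distant stage

def lives_saveable(cancers, penetration, detection_fraction):
    """Sum deaths x survival gap x penetration x detection fraction."""
    total = 0.0
    for cancer in cancers:
        stage_shift_benefit = cancer.early_stage_survival - cancer.late_stage_survival
        total += (cancer.annual_deaths * stage_shift_benefit
                  * penetration * detection_fraction)
    return total

# Inputs taken from the Chapter 4 notes; only two cancers shown.
cancers = [
    CancerInputs("colorectal", 53_000, 0.91, 0.14),
    CancerInputs("lung", 127_000, 0.60, 0.08),
]

estimate = lives_saveable(
    cancers,
    penetration=0.50,         # the fifty percent assumption stated in the note
    detection_fraction=0.40,  # hypothetical share of cases caught early
)
print(f"illustrative two-cancer subtotal: {estimate:,.0f}")
```

Every factor scales the result linearly, so halving the penetration or the detection fraction halves the estimate. That makes those two assumptions the ones most worth scrutinizing.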

Chapter 5

1. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts: A Binational Retrospective Study. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195 The foundational ColonFlag development and validation study, training gradient boosting and random forest models on 606,403 Israeli patients and validating on 30,674 UK patients. Achieves AUC 0.82 with detection maintained for blood tests drawn up to two years before diagnosis. Establishes the training methodology and cross-population validation approach that all subsequent deployments built upon.

2. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 The Maccabi Healthcare Services prospective deployment study. Applied ColonFlag to 79,671 patients overdue for colorectal cancer screening, flagged 688 in the highest 0.87 percentile, and found 19 colorectal cancers among 254 who underwent colonoscopy, plus 15 additional cancers identified through code matching. Establishes the first large-scale real-world deployment results and the 7.5 percent detection rate in the colonoscopy-completing flagged population.
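
The detection-rate arithmetic in this note is easy to check, and a short sketch also shows how much uncertainty a sample of 254 colonoscopies carries. The Wilson score interval below is a standard method added here for illustration; the study's own statistical analysis may differ. The one percent comparison rate is the approximate standard-screening figure cited elsewhere in these notes.

```python
# Detection rate among flagged patients who completed colonoscopy,
# with a Wilson score interval added for illustration.
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95 percent)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    )
    return center - half_width, center + half_width

CANCERS_FOUND = 19             # from the note above
COLONOSCOPIES_COMPLETED = 254  # from the note above
STANDARD_SCREENING_RATE = 0.01 # approximate figure cited in these notes

detection_rate = CANCERS_FOUND / COLONOSCOPIES_COMPLETED
fold_improvement = detection_rate / STANDARD_SCREENING_RATE
low, high = wilson_interval(CANCERS_FOUND, COLONOSCOPIES_COMPLETED)

print(f"detection rate: {detection_rate:.1%}")
print(f"fold improvement over standard screening: {fold_improvement:.1f}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The point estimate rounds to the 7.5 percent quoted above. The interval is wide because relatively few flagged patients completed colonoscopy, but even its lower end sits well above the one percent comparison rate.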

3. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 The 15 additional cancers identified through code matching in the Maccabi deployment, representing cancers diagnosed outside the Maccabi system and recorded retrospectively in the electronic medical record. These cases confirm that the flagged population carried genuinely elevated cancer risk beyond what colonoscopy completion alone captured, supporting the algorithm’s clinical validity.

4. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8. Geisinger Health System demographic and service area data from: Geisinger Health System. About Geisinger. https://www.geisinger.org/about-geisinger Hornbrook et al. documents the ColonFlag validation and deployment at Geisinger Health System. Geisinger serves over three million patients across 45 counties in rural and semi-rural Pennsylvania, with 32 of those counties designated rural and a service area household income 15 percent below the national average, establishing the demographic context that makes the eightfold improvement in cancer detection particularly significant for underserved populations.

5. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8 Documents the Geisinger deployment results: ColonFlag applied to 25,610 patients overdue for screening, 706 flagged as high-risk, 104 completing colonoscopy with eight percent cancer detection rate versus approximately one percent in standard population screening. The eightfold improvement in detection rate is the primary clinical evidence for the real-world efficacy of blood-based algorithmic cancer detection in a US health system context.

6. Evenden L, El-Mahdi N, Fanshawe T, et al. Use of ColonFlag Score for Prioritisation of Endoscopy in Colorectal Cancer. BMJ Open Gastroenterology. 2021;8(1):e000541. https://doi.org/10.1136/bmjgast-2020-000541 The UK COVID-19 pandemic deployment study, applying ColonFlag to triage patients on disrupted colonoscopy waiting lists. Establishes the triage use case as distinct from but complementary to the population screening use case demonstrated at Maccabi and Geisinger, demonstrating the algorithm’s clinical utility across different deployment contexts and healthcare system structures.

7. Evenden L, El-Mahdi N, Fanshawe T, et al. Use of ColonFlag Score for Prioritisation of Endoscopy in Colorectal Cancer. BMJ Open Gastroenterology. 2021;8(1):e000541. https://doi.org/10.1136/bmjgast-2020-000541 Source for the UK deployment clinical performance figures: sensitivity 88 percent, specificity 71 percent, PPV 9.15 percent, NPV 99.45 percent. These are the most complete clinical diagnostic accuracy figures published for any ColonFlag deployment and provide the basis for the sensitivity, specificity, PPV, and NPV discussion in Chapter 4.
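
The four figures in this note are linked through the prevalence of cancer in the tested group. The sketch below uses the standard Bayes relations to back out the prevalence implied by the reported sensitivity, specificity, and PPV, then recomputes NPV as a consistency check. The function names are illustrative, and the implied prevalence is derived here from rounded published figures rather than taken from the paper.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from test accuracy and prevalence."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


def implied_prevalence(sensitivity, specificity, ppv):
    """Prevalence that reproduces a given PPV (the PPV formula solved for prevalence)."""
    false_pos_rate = 1 - specificity
    return ppv * false_pos_rate / (sensitivity * (1 - ppv) + ppv * false_pos_rate)


if __name__ == "__main__":
    prev = implied_prevalence(0.88, 0.71, 0.0915)
    ppv, npv = ppv_npv(0.88, 0.71, prev)
    print(f"implied prevalence {prev:.4f}, PPV {ppv:.4f}, NPV {npv:.4f}")
```

With the note's figures, the implied prevalence comes out near 3 percent and the recomputed NPV lands close to the reported value. The same sensitivity and specificity would give a much lower PPV in an unselected population where cancer is rarer.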

8. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC The foundational LungFlag validation study across 6,505 non-small cell lung cancer cases and 189,597 controls at Kaiser Permanente Southern California. Describes the study design, patient selection, blood test feature set, and XGBoost methodology, establishing the scientific basis for the clinical performance figures cited throughout this chapter and Chapter 4.

9. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC. Yang HC, Duan F, Bitterman DS, et al. Age-Based Screening for Lung Cancer Surveillance in the US. JAMA Network Open. 2025;8(11):e2546222. https://doi.org/10.1001/jamanetworkopen.2024.56088 Gould et al. documents AUC 0.856 with forty percent sensitivity at ninety-five percent specificity in the nine to twelve month pre-diagnosis window, establishing the primary performance figures for LungFlag. Yang et al. documents that only 35.1 percent of 997 lung cancer patients met USPSTF LDCT screening criteria, with the excluded 65 percent disproportionately female and Asian, establishing the magnitude of the population gap that blood-based detection addresses.

10. Kinar Y, et al. Journal of the American Medical Informatics Association. 2016;23(5):879-890. Goshen R, et al. JCO Clinical Cancer Informatics. 2018;2:1-8. Hornbrook MC, et al. Digestive Diseases and Sciences. 2017;62(10):2719-2727. Evenden L, et al. BMJ Open Gastroenterology. 2021;8(1):e000541. Gould MK, et al. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. The five-step deployment template described in this chapter is synthesized from the published deployment literature for ColonFlag across three countries and the LungFlag validation study. Together these five studies represent the complete evidentiary record for what prospective algorithm deployment requires and what it produces, establishing the template that the consortium model in Chapter 10 proposes to apply to the remaining validated cancer detection algorithms.

Chapter 7

1. Kinar Y, Kalkstein N, Akiva P, et al. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study. J Am Med Inform Assoc. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195 The original ColonFlag development paper. The Maccabi Healthcare Services research team trained an ensemble of decision trees on the complete blood count history of more than 600,000 Israeli patients aged 40 and older. Feature engineering transformed each patient’s raw CBC values into trajectory features measuring 18-month and 36-month rates of change, a design choice that proved central to the algorithm’s power. The finished model was then tested on a completely separate UK primary care database (The Health Improvement Network), where it maintained AUC of 0.82 and 94% specificity. This cross-population validation, testing an Israeli-trained model on British patients, established that the blood signatures ColonFlag reads are a feature of colorectal cancer biology rather than of any one health system’s patient population. Adding ColonFlag on top of fecal occult blood testing more than doubled cancer detection. The paper’s longitudinal feature engineering approach has become the de facto standard for all subsequent blood-based cancer detection algorithms.

2. Chabon JJ, Hamilton EG, Kurtz DM, et al. Integrating genomic features for non-invasive early lung cancer detection. Nature. 2020;580(7802):245-251. https://doi.org/10.1038/s41586-020-2140-0 Although the LungFlag development and validation at Kaiser Permanente Southern California on 200,000 patients remains the most clinically advanced lung cancer blood algorithm, the Chabon paper represents the complementary line of research using cell-free DNA methods alongside routine bloodwork. The authors developed a machine learning classifier (Lung-CLiP) that integrates cell-free DNA fragmentation patterns, somatic mutation signatures, and clinical features. While more expensive than routine CBC-based approaches, the study’s methodology for handling the biological noise inherent to cell-free DNA provides important context for any lung cancer algorithm. The design demonstrates that for lung cancer specifically, the inflammatory CBC signature that LungFlag exploits and the genomic signature that cell-free DNA methods exploit are complementary rather than competing, and a consortium-scale lung cancer algorithm should eventually integrate both.

3. Kwok KN, Clusmann J, Unger K, et al. Development of a novel routine blood-based AI model for hepatocellular carcinoma risk stratification. medRxiv. 2024. Preprint. https://doi.org/10.1101/2024.11.03.24316662 The Kwok team integrated six data modalities (demographics, electronic health records, lifestyle, routine blood and urine biomarkers, common genomics, and targeted metabolomics) from the UK Biobank and US All of Us Research Program, covering over 900,000 individuals and 983 hepatocellular carcinoma cases. The integrated random forest model achieved AUROC 0.88 on both internal and external test sets, significantly outperforming published state-of-the-art risk scores (HCCdetect, aMAP, ADRESS-HCC). The study is methodologically important for several reasons: it demonstrates that combining data modalities improves performance over any single source, it held up across ethnic subgroups (addressing a persistent fairness concern in cancer algorithms), and it used two geographically separated cohorts for development and validation rather than relying on an internal split. Outlier truncation at the 99.9th percentile on continuous variables and explicit pre-registered handling of missingness illustrate the careful methodology that rare cancer algorithms require.
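
As a rough illustration of the outlier handling mentioned in this note, the sketch below caps a continuous variable at its 99.9th percentile. Whether the Kwok team truncated only the upper tail, and which percentile definition they used, is not stated here; the upper-tail cap, the linear-interpolation percentile, and the function names are assumptions for the example.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def truncate_upper(values, q=99.9):
    """Replace every value above the q-th percentile with that percentile."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]


if __name__ == "__main__":
    # Hypothetical lab values: 1,000 ordinary results plus one extreme entry.
    data = [float(i) for i in range(1, 1001)] + [1e6]
    capped = truncate_upper(data)
    print(max(data), max(capped))
```

Capping rather than deleting keeps the patient in the cohort while stopping a single data-entry error or extreme value from dominating the model.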

4. Ke X, Chen J, Lin Y, et al. Predicting early gastric cancer risk using machine learning: a multi-center retrospective study. Digit Health. 2024;10:20552076241240905. https://doi.org/10.1177/20552076241240905 The XHGC20 XGBoost model distinguishes gastric cancer from precancerous lesions using 20 routine clinical laboratory tests performed at general surgery admission, all covered by Chinese national medical insurance. The study’s exceptional methodological feature is its use of precancerous lesion controls (atrophic gastritis, intestinal metaplasia) rather than healthy volunteers, producing a clinically realistic comparator that mirrors the diagnostic question physicians actually face. Overall AUC is 0.901, with 0.888 for early-stage gastric cancer specifically and 0.970 in the critical subgroup of patients with negative traditional tumor markers (CEA and CA19-9), the patients currently missed by standard screening. The model was validated on an independent test set with 5-fold cross-validation on hyperparameters, and SHAP values identified which laboratory tests contributed most to predictions. Limitations include single-center recruitment and absence of prospective external validation, but the design choices represent the current state of the art for building clinically useful gastric cancer detection algorithms.

5. Chen Q, Cherry DR, Nalawade V, et al. Clinical data prediction model to identify patients with early-stage pancreatic cancer. JCO Clin Cancer Inform. 2021;5:279-287. https://doi.org/10.1200/CCI.20.00137 The Chen team built an XGBoost model on Optum electronic health record data from 2008 to 2017 to detect signatures of early-stage pancreatic cancer in the year before clinical diagnosis. From 50,707 patients with pancreatic cancer, they classified cases as early-stage if they received major pancreatic surgery. Each of 3,322 early-stage cases was matched to 16 controls, with a 13- to 1-month pre-diagnosis feature window to prevent leakage from the diagnostic workup itself. The final model used 582 predictive features spanning physician notes (via natural language processing), procedures, diagnoses, medications, and demographics. The study’s methodological importance is twofold: it demonstrates the feasibility of identifying pancreatic cancer signatures up to a year before diagnosis, and it explicitly acknowledges that while discrimination was favorable, low pancreatic cancer incidence in unselected populations produces a high false-positive burden, motivating tiered screening strategies rather than direct population deployment. The 13-to-1 window design is now standard.

6. Bast RC Jr, Lu Z, Han CY, et al. Biomarkers and strategies for early detection of ovarian cancer. Cancer Epidemiol Biomarkers Prev. 2020;29(12):2504-2512. https://doi.org/10.1158/1055-9965.EPI-20-1057 This comprehensive review by a leading ovarian cancer research team synthesizes the evidence for blood-based ovarian cancer detection, including the strong platelet count signal driven by tumor-stimulated thrombopoietin production. The review documents that elevated platelet count precedes ovarian cancer diagnosis by more than eighteen months and carries an odds ratio above twenty for subsequent diagnosis, making it among the strongest single-marker signals in the cancer screening literature. The paper also critically appraises the CA-125 tumor marker test, currently the only blood-based ovarian cancer surveillance tool in clinical use, and documents its inadequacy for early-stage detection. Machine learning models combining routine blood values with CA-125 have reached AUC 0.95 to 0.97 in published studies, performance levels that would be extraordinary for any cancer screening test. The review identifies ovarian cancer as among the highest-priority candidates for algorithmic deployment because the unmet clinical need is so large and the published evidence so strong.

7. Li H, Lin J, Xiao Y, et al. Machine learning screening of kidney cancer: an eight-indicator blood test using a naive Bayes classifier. Curr Oncol. 2022;29(12):9135-9149. https://doi.org/10.3390/curroncol29120722 This single-center proof of concept at Dazhou Central Hospital in Sichuan Province used eight routine blood indicators (albumin, total protein, hemoglobin, blood urea nitrogen, creatinine, uric acid, hs-CRP, and alkaline phosphatase) to distinguish 743 patients with clear cell renal cell carcinoma from 500 age-matched controls. The naive Bayes classifier was chosen for interpretability. Reported AUC of 0.93 with sensitivity and specificity both above 86% is striking given the low cost of the input panel, all of which appears on standard CMP and CBC panels available in any hospital. The study’s limitations are also instructive: target group is small for a moderate-signal cancer, controls are demographically narrow and drawn from a single Chinese hospital, no external validation at other institutions has been attempted, and the study did not evaluate whether the algorithm distinguishes kidney cancer from its clinical mimics (chronic kidney disease, kidney stones, benign cysts). The study is a valuable starting point and an incomplete foundation for deployment.

8. Koshiaris C, Van den Bruel A, Oke JL, et al. Early detection of multiple myeloma in primary care using blood tests: a case-control study in primary care. Br J Gen Pract. 2018;68(674):e586-e593. https://doi.org/10.3399/bjgp18X698357 This UK primary care matched case-control study of 2,703 myeloma cases and 12,157 controls from the Clinical Practice Research Datalink identified the pre-diagnostic blood test abnormalities and symptoms most strongly associated with subsequent multiple myeloma diagnosis. Conditional logistic regression on the matched design examined symptom prevalence and abnormal blood tests up to five years before diagnosis. Hemoglobin began declining measurably three years before diagnosis in many patients, and elevated calcium carried an odds ratio of more than 11 for subsequent myeloma diagnosis. The methodology’s strengths include use of a large validated primary care database, matched design controlling for practice-level confounding, and transformation of relative risk estimates into positive predictive values clinicians can actually use. While the study did not build a deployable machine learning model, it established the feature set and pre-diagnostic window (up to five years) that subsequent myeloma algorithms have built on, including models that achieved AUC of 0.957 to 0.968.

9. Aoki J, Kaya C, Khalid O, et al. Machine-learning model predicts lymphocytosis associated with chronic lymphocytic leukemia from routine complete blood count data. JCO Clin Cancer Inform. 2025;9:e2400197. https://doi.org/10.1200/CCI-24-00197 The Aoki team trained a random survival forest classifier on over one million adults (filtered from 5.7 million) tested at Sonic Healthcare USA laboratories between 2017 and 2023, predicting CLL-associated lymphocytosis within five years. The ground-truth outcome was defined as a new onset of absolute lymphocyte count at least 5 × 10⁹/L with at least 40% relative lymphocytosis, chosen because ICD-10 coding consistently under-captures indolent hematologic malignancies. Twelve engineered features (initial, maximum, last, and slope of absolute lymphocyte count, white blood cell count, and platelet count, plus age and sex) achieved AUC of 0.92. Critically, feature importance analysis showed that slope features (rates of change over time) contributed more to predictions than any single cross-sectional value, confirming the design principle that blood-test cancer algorithms should treat longitudinal trajectory as a first-class input. The cohort was split 80:20 for training and held-out testing, with bootstrap confidence intervals on performance.
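
The four per-analyte summaries named in this note (initial, maximum, last, and slope) can be computed from a patient's dated results roughly as follows. This is a minimal sketch, not the Aoki team's code: the function name, the use of an ordinary least-squares slope, and the sample values are assumptions for illustration.

```python
def trajectory_features(times, values):
    """Summarize one analyte's history for one patient.

    times  -- measurement times in any consistent unit (e.g. years), ascending
    values -- the analyte value at each time
    Returns initial, maximum, last, and the least-squares slope per unit time.
    """
    if len(times) != len(values) or not values:
        raise ValueError("times and values must be non-empty and the same length")
    n = len(values)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0  # a single draw has no trend
    return {
        "initial": values[0],
        "maximum": max(values),
        "last": values[-1],
        "slope": slope,
    }


if __name__ == "__main__":
    # Hypothetical absolute lymphocyte counts (10^9/L) at yearly draws.
    print(trajectory_features([0, 1, 2, 3], [1.0, 2.0, 3.0, 5.0]))
```

Applying the same function to each analyte and concatenating the results yields one fixed-length feature row per patient, whatever the number of blood draws on file.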

10. Christensen M, Larsen MH, Hansen PJ, et al. Predicting hematological malignancies using complete blood count and five-year history. HemaSphere. 2023;7(S3):5114. EHA2023 abstract. https://doi.org/10.1097/01.HS9.0000976968.49938.10 This Danish population-based study drew from the Copenhagen Primary Care Laboratory database (112 million test results from 2000 to 2016, covering roughly 20% of the Danish population) linked to the Danish Cancer Registry. The analytic cohort of 663,184 adults supported a SuperLearner ensemble classifier predicting any hematological malignancy within 6, 12, and 24 months. Matching was done at the event level rather than the patient level, with up to five non-cancer CBCs per cancer-preceding CBC matched on sex and age within four years. AUC reached 0.85 at six months, 0.81 at twelve months, and 0.75 at twenty-four months using only the index CBC. Adding five-year CBC history left the rounded AUCs unchanged (0.85, 0.81, 0.75), suggesting that the index CBC carries most of the predictive signal. Decision curve analysis confirmed positive net benefit across a wide range of clinically relevant threshold probabilities and clear superiority over current WHO referral criteria.

11. Tsai IJ, Shen WC, Lee CL, Wang HD, Lin CY. Machine learning in prediction of bladder cancer on clinical laboratory data. Diagnostics (Basel). 2022;12(1):203. https://doi.org/10.3390/diagnostics12010203 The Tsai team compared five machine learning algorithms (decision tree, random forest, support vector machine, XGBoost, LightGBM) on 1,336 patients with diagnoses of cystitis, bladder cancer, kidney cancer, uterine cancer, or prostate cancer. The design’s key strength is its control group: rather than healthy volunteers, the comparison was against the clinically confusable conditions a urologist actually sees. Feature selection combined Waikato Environment for Knowledge Analysis correlation-based and wrapper-based filters with forward selection, yielding a final panel of eight values (calcium, alkaline phosphatase, albumin, urine ketones, urine occult blood, creatinine, ALT, diabetes status). LightGBM achieved accuracy of 84.8% to 86.9%, sensitivity of 84% to 87.8%, and AUC of 0.88 to 0.92 across comparisons. The authors appropriately position the tool as a pre-screening aid rather than a replacement for cystoscopy, whose sensitivity remains 88% to 100%. The study is the only routine-chemistry-only machine learning approach for bladder cancer with competitive performance.

12. Virdee PS, Marian IR, Mansouri A, et al. The full blood count blood test for colorectal cancer detection: a systematic review, meta-analysis, and critical appraisal. Cancers (Basel). 2020;12(9):2348. https://doi.org/10.3390/cancers12092348 Although focused on colorectal cancer, the Virdee systematic review established the methodological framework for evaluating any gastrointestinal cancer algorithm built on routine blood work, including esophageal cancer. The PRISMA-compliant review of 53 eligible articles synthesized evidence on full blood count components associated with colorectal cancer and critically appraised 13 existing prediction models. Risk of bias assessment used QUADAS-2 and CHARMS frameworks. Meta-analysis found that red blood cell indices, white cell count, and platelet counts show consistent associations with gastrointestinal malignancy. Prediction model c-statistics ranged from 0.72 to 0.91, with the authors emphasizing the common gap between training-set performance and independently validated performance. For esophageal cancer specifically, no large-population study of the ColonFlag scale exists, which is why the recipe in this chapter specifies initial algorithm development rather than deployment-scale validation. The Virdee methodology should guide the consortium’s esophageal cancer development study design.

13. Lugner M, Rawshani A, Helleryd E, Eliasson B. Identifying top ten predictors of type 2 diabetes through machine learning analysis of UK Biobank data. Sci Rep. 2024;14:2102. https://doi.org/10.1038/s41598-024-52023-5 Although this study predicts type 2 diabetes rather than thyroid cancer, its methodology is directly relevant to the thyroid cancer recipe because it demonstrates how to handle a chronic endocrine condition with overlapping blood signatures. The UK Biobank random forest analysis of 448,277 adults identified the top 10 baseline predictors of 10-year incident type 2 diabetes from hundreds of candidate variables. Variable selection followed a rigorous two-stage process: first excluding all post-baseline variables to preserve prediction validity, then applying expert-judgment manual review to eliminate administrative variables without health relevance. Variables with greater than 70% missingness were removed. Random forest permutation-based variable importance identified HbA1c as the strongest predictor, with TSH-like signals (gamma-glutamyl transferase in the diabetes case) contributing meaningfully. The methodological framework, particularly the handling of trajectory features and missing data in endocrine screening, transfers directly to thyroid cancer algorithm development, where TSH trajectory is the analogous primary feature.

Chapter 8

1. Heron M. Deaths: Leading Causes for 2020. National Vital Statistics Reports. 2021;70(9):1-113. https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-09-508.pdf. Centers for Disease Control and Prevention. Diabetes Report Card 2022. Atlanta: Centers for Disease Control and Prevention, US Department of Health and Human Services, 2022. https://www.cdc.gov/diabetes/library/reports/reportcard.html National mortality data establishing the scale of non-cancer chronic disease deaths in the United States: cardiovascular disease 700,000, sepsis 270,000, chronic kidney disease 57,000 with 130,000 on dialysis annually, heart failure 68,000, Type 2 diabetes 89,000 directly plus hundreds of thousands through complications. Together these establish that non-cancer chronic diseases kill more than three times as many Americans annually as the thirteen cancers in Chapter 4.

2. Weng SF, Reps J, Kai J, et al. Can Machine-Learning Improve Cardiovascular Risk Prediction Using Routine Clinical Data? PLOS ONE. 2017;12(4):e0174944. https://doi.org/10.1371/journal.pone.0174944 Neural network trained on 378,256 UK primary care patients achieving AUC 0.764 versus 0.728 for the Pooled Cohort Equations, correctly identifying 355 additional cardiovascular events per 100,000 patients. Documents that machine learning on routine clinical data substantially outperforms traditional cardiovascular risk scoring, establishing the case for replacing threshold-based evaluation with algorithmic pattern recognition.

3. Tonelli M, Wiebe N, Culleton B, et al. Chronic Kidney Disease and Mortality Risk: A Systematic Review. Journal of the American Society of Nephrology. 2006;17(7):2034-2047. https://doi.org/10.1681/ASN.2005101085. Madjid M, Fatemi O. Components of the Complete Blood Count as Risk Predictors for Coronary Heart Disease. Texas Heart Institute Journal. 2013;40(1):17-29. https://pubmed.ncbi.nlm.nih.gov/23467296 Tonelli et al. documents the independent cardiovascular mortality risk associated with declining kidney function markers in routine blood panels. Madjid and Fatemi establish that neutrophil-to-lymphocyte ratio, white blood cell subtypes, and platelet indices from routine CBC independently predict coronary heart disease events, confirming that the blood markers driving cardiovascular prediction are standard panel values already collected universally.

4. Razavian N, Blecker S, Schmidt AM, et al. Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors. Big Data. 2015;3(4):277-287. https://doi.org/10.1089/big.2015.0020. Lugner M, Rawshani A, Helleryd E, Eliasson B. Identifying Top Ten Predictors of Type 2 Diabetes Through Machine Learning Analysis of UK Biobank Data. Scientific Reports. 2024;14:2102. https://doi.org/10.1038/s41598-024-52023-5 Razavian et al. documents machine learning predicting Type 2 diabetes onset three to five years before diagnosis while glucose remained below diagnostic thresholds, achieving AUC 0.80 from routine lab values. Lugner et al. documents AUC 0.90 on 448,277 UK Biobank participants using a ten-feature model including HbA1c, fasting glucose, GGT, HDL, and metabolic markers, establishing the blood signature of developing diabetes.

5. Knowler WC, Barrett-Connor E, Fowler SE, et al. Reduction in the Incidence of Type 2 Diabetes with Lifestyle Intervention or Metformin. New England Journal of Medicine. 2002;346(6):393-403. https://doi.org/10.1056/NEJMoa012512 The landmark Diabetes Prevention Program randomized controlled trial establishing that modest lifestyle intervention reduces diabetes incidence by 58 percent and metformin reduces it by 31 percent in prediabetic patients. The 58 percent reduction is the clinical evidence base for the chapter’s argument that earlier algorithmic identification of prediabetic patients, when the window for prevention is still open, represents one of the highest-value interventions in preventive medicine.

6. Tangri N, Grams ME, Levey AS, et al. Multinational Assessment of Accuracy of Equations for Predicting Risk of Kidney Failure: A Meta-Analysis. Journal of the American Medical Association. 2016;315(2):164-174. https://doi.org/10.1001/jama.2015.18202 Meta-analysis documenting that machine learning predicting CKD progression from routine comprehensive metabolic panel values achieves C-statistics of 0.83 to 0.88, enabling Stage 1 to 2 detection years before conventional diagnostic thresholds are crossed. Establishes the two to five year detection window for kidney disease from routine blood chemistry and confirms the clinical utility of trajectory-based rather than threshold-based evaluation.

7. Tangri N, Kucirka LM, Appel LJ, Grams ME, et al. Validation of the Klinrisk Machine Learning Model in the CANVAS Program and CREDENCE Trial. Kidney International. 2022;102(6):1284-1292. https://doi.org/10.1016/j.kint.2022.08.026 Documents Klinrisk validation in the CANVAS and CREDENCE clinical trial populations, achieving AUC 0.88 and outperforming the KDIGO heatmap at every time interval. References the subsequent validation on 4.8 million US adults across commercial, Medicare, and Medicaid populations and the October 2025 CE-mark from Roche for the navify platform, establishing Klinrisk as the most commercially advanced non-cancer blood algorithm currently deployed.

8. Segar MW, Jaeger BC, Patel KV, et al. Development and Validation of Machine Learning-Based Race-Specific Models to Predict 10-Year Risk of Heart Failure: A Multicohort Analysis. Circulation. 2021;143(24):2370-2383. https://doi.org/10.1161/CIRCULATIONAHA.120.053134 Oblique random survival forest trained on 19,080 participants from ARIC, Dallas Heart Study, Jackson Heart Study, and MESA cohorts, all free of heart failure at baseline. Achieves C-index 0.88 to 0.89 for 10-year incident heart failure prediction using blood values collected at a single routine visit, substantially outperforming established clinical risk scores including ARIC-HF, MESA-HF, and PCP-HF.

9. McDonagh TA, Metra M, Adamo M, et al. 2021 ESC Guidelines for the Diagnosis and Treatment of Acute and Chronic Heart Failure. European Heart Journal. 2021;42(36):3599-3726. https://doi.org/10.1093/eurheartj/ehab368 ESC heart failure guidelines establishing SGLT2 inhibitor benefit of 30 to 40 percent reduction in incident heart failure in high-risk metabolic patients, blood pressure control benefit of 30 to 50 percent reduction over a decade, and statin therapy benefit of 20 to 30 percent reduction in high-risk populations. Establishes that the interventions most capable of preventing heart failure are most effective at the five to ten year horizon, precisely the horizon at which the Segar et al. model achieves C-index 0.88.

10. Rhee C, Dantes R, Epstein L, et al. Incidence and Trends of Sepsis in US Hospitals Using Clinical vs Claims Data, 2009-2014. Journal of the American Medical Association. 2017;318(13):1241-1249. https://doi.org/10.1001/jama.2017.13836 Establishes sepsis incidence and mortality in the United States at approximately 270,000 deaths annually, confirming sepsis as the leading cause of in-hospital death and one of the highest-mortality non-cancer conditions amenable to algorithmic early warning. Documents the rapid progression from early sepsis to septic shock that makes hours-earlier detection through blood value algorithms clinically meaningful.

11. Adams R, Henry KE, Sridharan A, et al. Prospective, Multi-Site Study of Patient Outcomes After Implementation of the TREWS Machine Learning-Based Early Warning System for Sepsis. Nature Medicine. 2022;28(7):1455-1460. https://doi.org/10.1038/s41591-022-01894-0 Prospective study of TREWS deployment across five Johns Hopkins hospitals demonstrating 18.7 percent relative mortality reduction compared to standard clinical alert systems among more than 6,000 patients. Documents the real-world clinical outcomes of a deployed non-cancer blood algorithm and establishes the template for prospective deployment study design applicable to the cardiovascular, diabetes, and kidney disease algorithms described in this chapter.

12. Powell LW, Seckington RC, Deugnier Y. Haemochromatosis. Lancet. 2016;388(10045):706-716. https://doi.org/10.1016/S0140-6736(15)01315-X Comprehensive review of hemochromatosis epidemiology, blood signature, and treatment outcomes. Documents the five to ten year pre-diagnostic window during which transferrin saturation, ferritin, liver enzymes, and glucose rise through routine metabolic panel values, and confirms that simple phlebotomy therapy initiated early prevents every organ complication, establishing hemochromatosis as the highest-value lowest-cost algorithmic detection opportunity in the beyond-cancer program.

13. Khera AV, Won HH, Peloso GM, et al. Diagnostic Yield and Clinical Utility of Sequencing Familial Hypercholesterolemia Genes in Patients With Severe Hypercholesterolemia. Journal of the American College of Cardiology. 2016;67(22):2578-2589. https://doi.org/10.1016/j.jacc.2016.03.520 Documents the distinctive lipid signature of familial hypercholesterolemia, the dramatic underdiagnosis of the condition despite its LDL elevation being visible on every lipid panel, and the near-complete cardiovascular disease prevention achievable with aggressive statin therapy started in the twenties or thirties. Establishes the case for population-level algorithmic flagging of persistent LDL elevation patterns as a high-priority public health intervention.

14. Tangri N, et al. Kidney International. 2022;102(6):1284-1292. https://doi.org/10.1016/j.kint.2022.08.026. Roche press release: Roche receives CE mark for Klinrisk on the navify platform, October 2025. Documents the full Klinrisk pathway from development through validation on 4.8 million US adults to CE-mark regulatory approval, establishing the commercial and regulatory template that the remaining non-cancer algorithms need to follow. The navify platform integration demonstrates that deployment into existing laboratory reporting workflows is technically achievable at scale without requiring new clinical infrastructure.

15. Adams R, Henry KE, Sridharan A, et al. Prospective, Multi-Site Study of Patient Outcomes After Implementation of the TREWS Machine Learning-Based Early Warning System for Sepsis. Nature Medicine. 2022;28(7):1455-1460. https://doi.org/10.1038/s41591-022-01894-0 The prospective TREWS deployment study, establishing that a non-cancer blood algorithm deployed in a real clinical setting produces a measurable, statistically significant reduction in patient mortality. The 18.7 percent relative mortality reduction in more than 6,000 patients represents the strongest real-world clinical outcome evidence for any non-cancer blood algorithm and validates the broader methodology of the beyond-cancer detection program.

16. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Heron M. Deaths: Leading Causes for 2020. National Vital Statistics Reports. 2021;70(9):1-113. Combined mortality data establishing the 393,000 annual cancer deaths from the thirteen target cancers and the 1.2 million-plus annual deaths from the major non-cancer conditions described in this chapter. The 400,000 to 675,000 combined preventable deaths estimate applies the same methodology used for the cancer estimates, multiplying annual deaths by the published efficacy of early intervention and assuming fifty percent population penetration as a conservative baseline.

Chapter 9

1. Adalsteinsson VA, Ha G, Freeman SS, et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nature Communications. 2017;8(1):1324. https://doi.org/10.1038/s41467-017-00965-y Establishes the scientific basis for circulating tumor DNA as a blood-based confirmation tool, demonstrating high concordance between ctDNA genetic signatures and mutations in matched metastatic tumors. Foundational for the chapter’s plain-language explanation of liquid biopsy as a supplement to imaging confirmation across all thirteen cancer pathways.

2. Kinar Y, Kalkstein N, Akiva P, et al. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195. Goshen R, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 ColonFlag development and Geisinger deployment studies establishing colonoscopy as the primary confirmation and curative tool for algorithm-flagged colorectal cancer patients, with the eightfold improvement in cancer detection rate documented in the prospective clinical deployment.

4. Chernyak V, Fowler KJ, Kamaya A, et al. Liver Imaging Reporting and Data System (LI-RADS) Version 2018: Imaging of Hepatocellular Carcinoma in At-Risk Patients. Radiology. 2018;289(3):816-830. https://doi.org/10.1148/radiol.2018181494 Establishes LI-RADS classification for MRI-detected liver lesions, documenting that certain MRI findings are diagnostic for hepatocellular carcinoma without biopsy in patients with known liver disease. Provides the imaging framework for algorithm-flagged liver cancer patients at the five to eight millimeter detection threshold.

5. Cristescu R, Lee J, Nebozhyn M, et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nature Medicine. 2015;21(5):449-456. https://doi.org/10.1038/nm.3850 Documents the TP53 and CDH1 mutation profiles in gastric cancer that underpin liquid biopsy confirmation panels used for algorithm-flagged patients where endoscopy is medically contraindicated, establishing the molecular basis for blood-based supplementary confirmation.

6. Hewitt MJ, McPhail MJ, Possamai L, et al. EUS-guided FNA for diagnosis of solid pancreatic neoplasms: a meta-analysis. Gastrointestinal Endoscopy. 2012;75(2):319-331. https://doi.org/10.1016/j.gie.2011.08.049. Sah RP, Nagpal SJS, Mukhopadhyay D, Chari ST. New insights into pancreatic cancer-induced paraneoplastic diabetes. Nature Reviews Gastroenterology and Hepatology. 2013;10(7):423-433. https://doi.org/10.1038/nrgastro.2013.49 Hewitt et al. documents endoscopic ultrasound-guided fine needle aspiration sensitivity of 86.8 percent and specificity of 95.8 percent across 8,246 patients. Sah et al. documents the glucose disruption mechanism beginning two to three years before diagnosis, establishing the long detection window that makes the pancreatic cancer confirmation pathway actionable despite its technical challenges.

7. Kyle RA, Rajkumar SV. Monoclonal gammopathy of undetermined significance. New England Journal of Medicine. 2006;354(13):1362-1369. https://doi.org/10.1056/NEJMra062583 Landmark NEJM study of MGUS as myeloma precursor in 1,384 patients over 11,000 person-years, establishing serum protein electrophoresis as the appropriate first confirmation test for algorithm-flagged patients with the myeloma-consistent blood pattern and documenting the pre-malignant continuum that makes early detection clinically actionable.

8. Hallek M, Cheson BD, Catovsky D, et al. iwCLL guidelines for diagnosis, indications for treatment, response assessment, and supportive management of CLL. Blood. 2018;131(25):2745-2760. https://doi.org/10.1182/blood-2017-09-806398 International Working Group CLL guidelines establishing peripheral blood flow cytometry as the primary diagnostic instrument, defining the non-invasive pathway from algorithm flag to blood test to diagnosis that makes leukemia the simplest confirmation pathway among the thirteen cancers.

9. Roschewski M, Dunleavy K, Pittaluga S, et al. Circulating tumour DNA and CT monitoring in patients with untreated diffuse large B-cell lymphoma. Lancet Oncology. 2015;16(5):541-549. https://doi.org/10.1016/S1470-2045(15)70106-3 Demonstrates that ctDNA concentration correlates with lymphoma tumor burden and predicts relapse before imaging changes appear, establishing liquid biopsy as a supplementary blood confirmation layer for algorithm-flagged lymphoma patients awaiting definitive nodal biopsy.

10. Thomassin-Naggara I, Poncelet E, Jalaguier-Coudray A, et al. Ovarian-Adnexal Reporting Data System MRI (O-RADS MRI) Score for Risk Stratification of Sonographically Indeterminate Adnexal Masses. JAMA Network Open. 2020;3(1):e1919896. https://doi.org/10.1001/jamanetworkopen.2019.19896 Establishes O-RADS MRI scoring for ovarian mass risk stratification, documenting detection of lesions as small as five to eight millimeters and outperformance of transvaginal ultrasound for complex masses. Provides the imaging framework while confirming why laparoscopic tissue access remains the necessary but procedurally significant confirmation step.

11. Silverman SG, Pedrosa I, Ellis JH, et al. Bosniak Classification of Cystic Renal Masses, Version 2019. Radiology. 2019;292(2):475-488. https://doi.org/10.1148/radiol.2019182646 2019 revision of the Bosniak Classification for cystic renal masses on MRI, providing risk stratification guidance for algorithm-flagged patients where targeted renal MRI identifies an indeterminate lesion. Documents that renal MRI outperforms CT for lesions below one centimeter, the detection threshold relevant to the early detection window.

12. Panebianco V, Narumi Y, Altun E, et al. Multiparametric MRI for Bladder Cancer: Development of VI-RADS. European Urology. 2018;74(3):294-306. https://doi.org/10.1016/j.eururo.2018.04.029 Establishes VI-RADS scoring for bladder cancer on MRI, providing the framework for staging and muscle invasion assessment in algorithm-flagged patients and determining whether transurethral resection or cystoscopy-only is the appropriate approach.

13. Enzinger PC, Mayer RJ. Esophageal cancer. New England Journal of Medicine. 2003;349(23):2241-2252. https://doi.org/10.1056/NEJMra035010 Establishes the molecular profile of esophageal cancer including the TP53 mutation prevalence exceeding seventy percent, providing the basis for liquid biopsy confirmation in algorithm-flagged esophageal cancer patients, and documents the favorable outcomes of early endoscopic resection compared to late-stage surgical intervention.

14. Tessler FN, Middleton WD, Grant EG, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS). Journal of the American College of Radiology. 2017;14(5):587-595. https://doi.org/10.1016/j.jacr.2017.01.046 Establishes ACR TI-RADS scoring for thyroid nodule risk stratification on neck ultrasound, providing the framework used to triage algorithm-flagged thyroid cancer patients from ultrasound finding to fine needle aspiration biopsy decision, substantially reducing unnecessary thyroidectomy through standardized risk classification.

15. Prenuvo. Polaris Study Results. Presented at: American Association for Cancer Research Annual Meeting 2025; Chicago, IL. May 2025. Reported in: Prenuvo press release, May 30, 2025. https://hitconsultant.net/2025/05/30/prenuvo-full-body-mri-detects-cancers-missed-by-standard-screenings The Prenuvo Polaris study of 1,011 asymptomatic individuals, finding biopsy-confirmed cancers in 2.2 percent of participants with a 99.8 percent negative predictive value over one year. Provides the quantitative basis for the chapter’s claim that a negative scan in an algorithm-flagged patient carries genuine reassurance while structured surveillance continues.

Chapter 10

1. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8 The Maccabi and Geisinger deployment studies that together constitute the five-step template described in this chapter. Goshen et al. documents the prospective Maccabi deployment finding 19 colorectal cancers among 254 colonoscopies. Hornbrook et al. documents the eightfold improvement in cancer detection at Geisinger. Together they establish the replicable template the consortium model is designed to apply to the remaining algorithms.

2. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Kaiser Permanente. About Kaiser Permanente. https://about.kaiserpermanente.org Cancer Statistics 2024 establishes the mortality scale against which consortium deployment must be measured. Kaiser Permanente membership and data infrastructure figures from institutional reporting, establishing the data scale available to consortium health systems for algorithm training and validation.

3. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC ColonFlag and LungFlag as the reference implementations for the prospective validation study design the consortium model proposes to apply to the remaining algorithms. The two to three year timeline from consortium formation to broad deployment is derived from the actual development timelines of these two algorithms, compressed by the parallel multi-institution structure.

4. U.S. Food and Drug Administration. Software as a Medical Device (SaMD). https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd. Tangri N, et al. Validation of the Klinrisk Machine Learning Model in the CANVAS Program and CREDENCE Trial. Kidney International. 2022;102(6):1284-1292. https://doi.org/10.1016/j.kint.2022.08.026 FDA regulatory framework for software as a medical device and the 510(k) clearance pathway applicable to cancer detection algorithms. Tangri et al. documents the Klinrisk kidney disease algorithm as the most advanced non-cancer algorithm in the commercial deployment pipeline; Roche received the CE mark for Klinrisk in October 2025, establishing the regulatory precedent for algorithms in this family.

5. Singh V, Chaganti S, Siebert M, et al. Deep Learning-Based Identification of Patients at Increased Risk of Cancer Using Routine Laboratory Markers. Scientific Reports. 2025;15:12661. https://doi.org/10.1038/s41598-025-97331-6 The Siemens Healthineers Deep Profiler study, which simultaneously detects colorectal, liver, and lung cancer from 33 standard CBC and CMP parameters in a single model, provides the basis for the chapter’s claim about the near-zero marginal computational cost of running trained algorithms on existing blood panel data.

6. Tangri N, et al. Kidney International. 2022;102(6):1284-1292. Roche. Roche receives CE mark for Klinrisk on the navify platform. Press release, October 2025. The Roche-Klinrisk commercial partnership, resulting in CE-mark approval for Klinrisk on the navify platform in October 2025, documents the industry partnership model referenced in this chapter. Establishes that diagnostic companies with existing commercial infrastructure, regulatory expertise, and clinical laboratory relationships are natural consortium partners for the algorithmic early detection program.

7. Moyer VA, U.S. Preventive Services Task Force. Screening for Lung Cancer: U.S. Preventive Services Task Force Recommendation Statement. Annals of Internal Medicine. 2014;160(5):330-338. https://doi.org/10.7326/M13-2771 Documents the USPSTF recommendation process and its downstream effect on insurer coverage decisions, establishing the evidence standard required for professional society endorsement and the coverage policy changes that follow. Provides the regulatory and policy context for the chapter’s claim that consortium validation evidence is the prerequisite for systematic coverage of blood-based cancer detection algorithms.

8. Kinar Y, et al. Journal of the American Medical Informatics Association. 2016;23(5):879-890. Goshen R, et al. JCO Clinical Cancer Informatics. 2018;2:1-8. Gould MK, et al. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. The ColonFlag timeline from initial publication in 2016 to multi-country clinical deployment by 2018, and the LungFlag timeline from validation publication in 2021 to deployment readiness, provide the empirical basis for the three-year consortium timeline projected in this chapter. The parallel multi-institution structure of the consortium model is the mechanism by which sequential single-institution timelines are compressed.

Chapter 11

1. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Reck M, Rodríguez-Abreu D, Robinson AG, et al. Pembrolizumab versus Chemotherapy for PD-L1-Positive Non-Small-Cell Lung Cancer. New England Journal of Medicine. 2016;375(19):1823-1833. https://doi.org/10.1056/NEJMoa1606774. Gould MK, et al. Budget Impact Model for LungFlag. Journal of Clinical Oncology. 2024;42(16_suppl):10534. https://doi.org/10.1200/JCO.2024.42.16_suppl.10534 Cancer Statistics 2024 provides stage-specific lung cancer survival rates. KEYNOTE-024 establishes the survival benefit of pembrolizumab (Keytruda), a drug from which Merck’s 2023 revenues exceeded $25 billion. Gould et al. budget impact model projects net savings of $2.87 million over five years per health system and 22 fewer deaths annually from LungFlag deployment.

2. Yabroff KR, Lund J, Kepka D, Mariotto A. Economic Burden of Cancer in the United States. Cancer Epidemiology, Biomarkers and Prevention. 2011;20(10):2006-2014. https://doi.org/10.1158/1055-9965.EPI-11-0650. Siegel RL, et al. Cancer Statistics, 2024. Yabroff et al. provides stage-specific cancer treatment cost estimates. Cancer Statistics 2024 provides stage-specific survival rates, establishing the cost and outcome differentials between Stage I and Stage IV colorectal cancer treatment cited in the table.

3. Mariotto AB, Yabroff KR, Shao Y, Feuer EJ, Brown ML. Projections of the Cost of Cancer Care in the United States: 2010-2020. Journal of the National Cancer Institute. 2011;103(2):117-128. https://doi.org/10.1093/jnci/djq495. Siegel RL, et al. Cancer Statistics, 2024. Mariotto et al. provides the cost framework for cancer treatment by stage. Cancer Statistics 2024 establishes pancreatic cancer stage-specific survival rates of fifty percent at localized versus three percent at distant stage.

4. Siegel RL, et al. Cancer Statistics, 2024. Tew WP. Ovarian Cancer in the Older Woman. Journal of Geriatric Oncology. 2016;7(5):354-361. https://doi.org/10.1016/j.jgo.2016.07.009 Sources for ovarian cancer stage-specific survival rates and treatment cost estimates used in the table.

5. United States Renal Data System. 2023 USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States. National Institutes of Health, 2023. https://www.usrds.org/annual-data-report. Tangri N, et al. Kidney International. 2022;102(6):1284-1292. USRDS documents the $90,000 per patient per year dialysis cost and 130,000 annual new dialysis patients. Tangri et al. documents Klinrisk validation on 4.8 million US adults, establishing the algorithmic basis for earlier kidney disease detection and the delay-dialysis savings calculation.

6. American Diabetes Association. Economic Costs of Diabetes in the U.S. in 2022. Diabetes Care. 2023;46(7):1423-1437. https://doi.org/10.2337/dci23-0085. Knowler WC, et al. New England Journal of Medicine. 2002;346(6):393-403. https://doi.org/10.1056/NEJMoa012512 American Diabetes Association establishes $327 billion annual diabetes cost and $16,750 annual per-patient excess expenditure. Knowler et al. documents the $3,500 DPP intervention cost and fifty-eight percent prevention rate, producing the forty-to-one return on investment cited in this chapter.

7. Virani SS, et al. Heart Disease and Stroke Statistics, 2020 Update. Circulation. 2020;141(9):e139-e596. https://doi.org/10.1161/CIR.0000000000000757. Cholesterol Treatment Trialists’ Collaboration. Lancet. 2010;376(9753):1670-1681. https://doi.org/10.1016/S0140-6736(10)61350-5 Heart Disease and Stroke Statistics establishes $240 billion annual cardiovascular disease cost and acute heart attack hospitalization costs. CTT Collaboration documents twenty-five to thirty-five percent risk reduction from statins across 170,000 trial participants.

8. Torio CM, Moore BJ. National Inpatient Hospital Costs. HCUP Statistical Brief No. 204. AHRQ, 2016. Adams R, et al. Nature Medicine. 2022;28(7):1455-1460. https://doi.org/10.1038/s41591-022-01894-0 Torio and Moore establish sepsis as the most expensive condition in US hospitals at $62 billion annually. Adams et al. documents the 18.7 percent relative mortality reduction achieved by TREWS, confirming that earlier sepsis recognition reduces both mortality and treatment costs.

9. Gould MK, et al. Journal of Clinical Oncology. 2024;42(16_suppl):10534. https://doi.org/10.1200/JCO.2024.42.16_suppl.10534 The LungFlag budget impact model, the only published per-health-system financial analysis in the algorithmic blood detection family, projecting net savings of $2.87 million over five years per health system from a single algorithm applied to blood tests already being drawn. Provides the specific quantitative basis for the chapter’s return-on-investment argument.

10. Baicker K, Chandra A. The Labor Market Effects of Rising Health Insurance Premiums. Journal of Labor Economics. 2006;24(3):609-634. https://doi.org/10.1086/505049 Establishes the economic framework for understanding why health insurers face systematic disincentives to invest in prevention programs whose benefits accrue to future insurers or to Medicare. This is the insurance turnover dynamic, described in this chapter, that leads to systematic underinvestment in prevention.

11. Mariotto AB, et al. Journal of the National Cancer Institute. 2011;103(2):117-128. American Diabetes Association. Diabetes Care. 2023;46(7):1423-1437. Virani SS, et al. Circulation. 2020;141(9):e139-e596. Torio CM, Moore BJ. HCUP Statistical Brief No. 204. 2016. Combined disease cost sources establishing the aggregate economic burden of cancer, diabetes, cardiovascular disease, and sepsis approaching $1 trillion per year, the denominator against which the cost of the algorithmic detection program is compared.

12. Knowler WC, et al. New England Journal of Medicine. 2002;346(6):393-403. Herman WH, et al. The 10-Year Cost-Effectiveness of Lifestyle Intervention or Metformin for Diabetes Prevention. Diabetes Care. 2012;35(4):723-730. https://doi.org/10.2337/dc11-1468 Knowler et al. documents the DPP cost and prevention rate. Herman et al. provides the ten-year cost-effectiveness analysis confirming net savings over a decade from lifestyle intervention relative to treating diabetes and its complications, establishing the forty-to-one return on investment cited in this chapter.

Chapter 12

1. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. DeSantis CE, Miller KD, Goding Sauer A, et al. Cancer Statistics for African Americans, 2019. CA: A Cancer Journal for Clinicians. 2019;69(3):211-233. https://doi.org/10.3322/caac.21555 Cancer Statistics 2024 provides overall cancer mortality disparities by race. DeSantis et al. documents the specific cancer mortality rates for Black Americans across major cancer types, establishing the twenty percent higher cancer death rate for Black men and the approximately twofold higher cervical cancer mortality for Black women compared to white women cited in this chapter.

2. Islami F, Guerra CE, Minihan A, et al. American Cancer Society’s Report on the Status of Cancer Disparities in the United States, 2021. CA: A Cancer Journal for Clinicians. 2021;71(4):299-317. https://doi.org/10.3322/caac.21671 Comprehensive analysis of cancer disparities by income, race, and geography in the United States. Documents that cancer mortality rates in the most deprived counties are nearly twice those of the least deprived counties, establishing the income-based cancer mortality gap cited in this chapter and confirming that the gap is explained primarily by differential access to screening and treatment rather than tumor biology.

3. Henley SJ, Anderson RN, Thomas CC, Massetti GM, Peaker B, Richardson LC. Invasive Cancer Incidence, 2004-2013, and Deaths, 2006-2015, in Nonmetropolitan and Metropolitan Counties — United States. MMWR Surveillance Summaries. 2017;66(14):1-13. https://doi.org/10.15585/mmwr.ss6614a1 CDC surveillance data establishing that cancer death rates in the most rural counties were thirty-seven percent higher than in the most urban counties between 2012 and 2015, with the rural-urban gap growing over time. Documents the geographic dimension of cancer mortality inequity that the algorithmic blood test program is particularly positioned to address.

4. DeSantis CE, et al. Cancer Statistics for African Americans, 2019. CA: A Cancer Journal for Clinicians. 2019;69(3):211-233. https://doi.org/10.3322/caac.21555. Siegel RL, et al. Cancer Statistics, 2024. Documents the persistent disparity in cervical cancer mortality between Black and white women, approximately twofold, forty years after cervical cancer became largely preventable through Pap smear screening and HPV vaccination. Establishes the historical pattern of early detection tools that benefit advantaged populations first and reach disadvantaged populations last or least effectively.

5. American Lung Association. State of Lung Cancer 2023. https://www.lung.org/research/state-of-lung-cancer. Jemal A, Fedewa SA. Lung Cancer Screening With Low-Dose Computed Tomography in the United States—2010 to 2015. JAMA Oncology. 2017;3(9):1278-1281. https://doi.org/10.1001/jamaoncology.2016.6416 American Lung Association documents five to six percent LDCT screening utilization among eligible Americans more than a decade after validation. Jemal and Fedewa document that screening utilization is lowest among populations with the highest smoking rates and lowest healthcare access, confirming the systematic inequity in lung cancer screening uptake cited in this chapter.

6. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8. Geisinger Health System. About Geisinger. https://www.geisinger.org/about-geisinger Hornbrook et al. documents the ColonFlag deployment at Geisinger achieving eightfold improvement in cancer detection among flagged patients completing colonoscopy. Geisinger demographic data establishes that the health system serves a predominantly rural, lower-income population, making it the most direct evidence that algorithmic blood test detection can reach underserved populations through existing blood draw infrastructure.

7. Goshen R, Mizrahi L, Akiva P, et al. Computer-Assisted Flagging of Individuals at High Risk of Colorectal Cancer Using the ColonFlag Test. JCO Clinical Cancer Informatics. 2018;2:1-8. https://doi.org/10.1200/CCI.17.00130 The Maccabi deployment study documents 688 patients flagged as high risk, of whom 254 completed colonoscopy, a thirty-seven percent follow-through rate in the Israeli context. The chapter’s fifteen percent estimate for the Geisinger deployment reflects the 104 of 706 flagged patients who completed colonoscopy, establishing the follow-through gap that patient navigation is designed to close.

8. Wells KJ, Battaglia TA, Dudley DJ, et al. Patient Navigation: State of the Art or Is It Science? Cancer. 2008;113(8):1999-2010. https://doi.org/10.1002/cncr.23815. Percac-Lima S, Grant RW, Green AR, et al. A Culturally Tailored Navigator Program for Colorectal Cancer Screening in a Community Health Center. Journal of General Internal Medicine. 2009;24(2):211-217. https://doi.org/10.1007/s11606-008-0864-x Wells et al. provides a systematic review of patient navigation programs establishing their effectiveness in improving screening completion and follow-up rates. Percac-Lima et al. documents thirty to fifty percent improvement in colonoscopy completion rates in low-income and minority populations through navigation programs, providing the evidence base for the chapter’s argument that navigation must be built into the consortium’s clinical infrastructure from day one.

9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations. Science. 2019;366(6464):447-453. https://doi.org/10.1126/science.aax2342 Landmark Science paper documenting racial bias in a widely deployed healthcare algorithm, demonstrating that algorithms trained on biased data can systematically underestimate the needs of Black patients. Establishes the technical and ethical requirement for demographic-stratified performance reporting in the consortium’s validation studies and the necessity of diverse training populations as a prerequisite for equitable algorithm deployment.

10. Senator Hubert Humphrey, testimony before the Senate Committee on Labor and Public Welfare, March 1971. Cited in: Rettig RA. Cancer Crusade: The Story of the National Cancer Act of 1971. Princeton: Princeton University Press, 1977. Humphrey’s 1971 Senate testimony establishing that the cancer mortality gap does not require new scientific breakthroughs to close, only more equitable distribution of existing knowledge and tools. The passage is used in this chapter to connect the historical equity argument of the National Cancer Act debate to the contemporary algorithmic early detection program, establishing that the equity imperative in cancer medicine is not a new insight but a persistent failure to act on a known truth.

11. Islami F, et al. CA: A Cancer Journal for Clinicians. 2021;71(4):299-317. Siegel RL, et al. Cancer Statistics, 2024. Combined sources establishing the scale of the cancer mortality gap across underserved populations and the pattern of differential screening uptake that drives it. Provides the quantitative basis for the chapter’s argument that the patients who will benefit most from the algorithmic program are precisely the patients least likely to be reached by conventional screening programs, and that deliberate equity-centered design is required to change that pattern.

Chapter 13

1. Kinar Y, Kalkstein N, Akiva P, et al. Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts. Journal of the American Medical Informatics Association. 2016;23(5):879-890. https://doi.org/10.1093/jamia/ocv195. Hornbrook MC, Goshen R, Choman E, et al. Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data. Digestive Diseases and Sciences. 2017;62(10):2719-2727. https://doi.org/10.1007/s10620-017-4722-8. Gould MK, Huang BZ, Tammemagi MC, et al. Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data. American Journal of Respiratory and Critical Care Medicine. 2021;204(4):445-453. https://doi.org/10.1164/rccm.202007-2791OC. Adams R, Henry KE, Sridharan A, et al. Prospective, Multi-Site Study of Patient Outcomes After Implementation of the TREWS Machine Learning-Based Early Warning System for Sepsis. Nature Medicine. 2022;28(7):1455-1460. https://doi.org/10.1038/s41591-022-01894-0 The foundational deployment and validation studies establishing the current state of the algorithmic early detection program: ColonFlag development and Geisinger deployment (eightfold improvement in detection), LungFlag validation (AUC 0.856, 40 percent sensitivity at 9 to 12 months pre-diagnosis), and TREWS sepsis deployment (18.7 percent relative mortality reduction across five hospitals). Together these establish that the program is not a research proposal but a clinical reality already producing patient benefit.

2. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Heron M. Deaths: Leading Causes for 2020. National Vital Statistics Reports. 2021;70(9):1-113. https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-09-508.pdf Cancer Statistics 2024 establishes the 393,000 annual deaths from the thirteen target cancers. Heron establishes the combined mortality from the major non-cancer diseases in the program. Together these establish the basis for the 400,000 to 675,000 preventable deaths estimate, representing 13 to 22 percent of all annual US deaths.

3. Association of American Medical Colleges. The Complexities of Physician Supply and Demand: Projections From 2021 to 2036. Washington, DC: AAMC, 2023. https://www.aamc.org/media/75236/download. Bureau of Labor Statistics. Occupational Outlook Handbook: Registered Nurses. US Department of Labor, 2023. https://www.bls.gov/ooh/healthcare/registered-nurses.htm AAMC projects a shortage of up to 86,000 physicians by 2036, driven by aging physicians, increased patient demand, and training pipeline constraints. The Bureau of Labor Statistics projects demand for more than 190,000 additional registered nurses per year through 2031. Together these establish the healthcare workforce capacity crisis that makes algorithmic early detection a workforce intervention as well as a patient outcome intervention.

4. Siegel RL, Giaquinto AN, Jemal A. Cancer Statistics, 2024. CA: A Cancer Journal for Clinicians. 2024;74(1):12-49. https://doi.org/10.3322/caac.21820. Bluethmann SM, Mariotto AB, Rowland JH. Anticipating the Silver Tsunami: Prevalence Trajectories and Comorbidity Burden Among Older Cancer Survivors in the United States. Cancer Epidemiology, Biomarkers and Prevention. 2016;25(7):1029-1036. https://doi.org/10.1158/1055-9965.EPI-16-0133. Cancer Statistics 2024 documents current cancer incidence and mortality. Bluethmann et al. project the thirty percent increase in annual cancer diagnoses by 2040, driven by the aging of the baby boom generation. Together these establish the convergence of rising cancer burden and shrinking healthcare workforce capacity that makes algorithmic early detection an urgent workforce capacity issue.

5. 21st Century Cures Act, Pub. L. No. 114-255, § 3060, 130 Stat. 1033 (2016). US Food and Drug Administration. Clinical Decision Support Software: Guidance for Industry and Food and Drug Administration Staff. Washington, DC: FDA, 2022, updated 2026. https://www.fda.gov/media/133650/download. The 21st Century Cures Act establishes the four-criterion test for clinical decision support software exempt from FDA device regulation. FDA’s 2026 updated guidance documents the key interpretive standard: the central question is whether the clinician can independently evaluate the basis for the recommendation, not whether the tool uses artificial intelligence. Risk stratification tools that generate scores for physician review, with transparent underlying data, are addressed directly in the guidance.

6. US Food and Drug Administration. Clinical Decision Support Software: Guidance for Industry and Food and Drug Administration Staff. Washington, DC: FDA, 2022, updated 2026. https://www.fda.gov/media/133650/download. D’Agostino RB Sr, Vasan RS, Pencina MJ, et al. General Cardiovascular Risk Profile for Use in Primary Care. Circulation. 2008;117(6):743-753. https://doi.org/10.1161/CIRCULATIONAHA.107.699579. FDA guidance documents that software providing a risk probability or risk score for a disease can fall within enforcement discretion policies for software performing calculations routinely used in clinical practice. D’Agostino et al. establish the Framingham Risk Score as the standard model for physician-directed cardiovascular risk stratification and the analogue to blood-based cancer risk scores. The Framingham score is used in clinical practice without FDA device clearance because the physician, not the calculator, makes the treatment decision.

7. Louis PC. Recherches sur les effets de la saignée dans quelques maladies inflammatoires. Paris: de Mignaret, 1835. Warner JH. Therapeutic Explanation and the Edinburgh Bloodletting Controversy. Medical History. 1980;24(3):241-258. https://doi.org/10.1017/S0025727300040539. Aberle DR, Adams AM, Berg CD, et al. Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening. New England Journal of Medicine. 2011;365(5):395-409. https://doi.org/10.1056/NEJMoa1102873. American Lung Association. State of Lung Cancer 2023. https://www.lung.org/research/state-of-lung-cancer. Louis’s 1828 statistical findings against bloodletting and the fifty-year abandonment lag documented by Warner establish the historical pattern of knowing and not acting. The NLST lung cancer screening validation (2011) and the American Lung Association’s documentation of screening utilization below six percent more than a decade later establish that the same pattern is repeating with current early detection tools, and that deliberate institutional action is the only mechanism that compresses the timeline.