Real-World Evidence Sources for Drug Safety: Registries and Claims Data Explained

When a new drug hits the market, the real test doesn’t start until after approval. Clinical trials involve thousands of patients over a few years. But once millions of people start taking it, rare side effects, long-term risks, and interactions with other conditions can show up: things trials simply can’t catch. That’s where real-world evidence comes in. Two of the most powerful tools for tracking drug safety outside of trials are patient registries and claims data. Together, they give regulators, doctors, and drugmakers a clearer picture of how medications behave in the real world, not just in controlled settings.

What Exactly Is Real-World Evidence?

Real-world evidence (RWE) isn’t lab data or controlled trial results. It’s what happens when drugs are used by real people in everyday life. This includes everything from how often someone takes their pill, to whether they end up in the hospital, to how their blood pressure changes over five years. The U.S. Food and Drug Administration (FDA) officially defined RWE in 2018 as clinical evidence derived from real-world data (RWD), meaning information collected during routine healthcare.

This isn’t new. The FDA has used hospital records and pharmacy logs to spot safety issues since the 1980s. But it wasn’t until the 21st Century Cures Act in 2016 that RWE became a formal part of drug regulation. Between 2017 and 2021, the FDA approved 12 drugs or new uses where RWE played a key role. Five of those approvals relied directly on claims data or registry data.

In Europe, the European Medicines Agency (EMA) launched Darwin EU in 2021, a network connecting health databases across 15 countries to monitor medicine safety in real time. Today, it covers over 120 million people. This shift means drug safety isn’t just about waiting for doctors to report bad reactions. It’s about proactively mining massive datasets to find hidden risks before they become public health crises.

How Patient Registries Work

Patient registries are structured databases that collect detailed, standardized information about people with specific diseases or those using certain drugs. Think of them as long-term medical diaries, but for groups of patients. There are two main types: disease registries and product registries.

Disease registries track people with conditions like cystic fibrosis, Parkinson’s, or cancer. They record everything: age, genetic mutations, lab results, imaging scans, treatments tried, and how patients feel over time. Product registries focus on people taking a specific drug, recording not just the medication but also side effects, dosing changes, and outcomes.

One powerful example is the Cystic Fibrosis Foundation Patient Registry. It helped identify that ivacaftor, a drug for cystic fibrosis, worked better, and had fewer side effects, in patients with certain genetic mutations. That detail never showed up in the original trials because those patients were too rare to be included in large numbers.

Registries are rich in clinical detail. They often include lab values, imaging reports, and even patient-reported symptoms like fatigue or pain levels. According to ISPOR, registry data has 87% completeness for lab results, compared to just 52% in claims data. That’s why they’re so valuable for spotting subtle safety signals.

But they’re not perfect. Most registries are voluntary, so participation rates range from 60% to 80%. That means the data might not represent everyone-especially those who can’t access specialty clinics or don’t speak the local language. And they’re expensive. Setting up a national disease registry can cost $1.2 million to $2.5 million upfront, with $300,000 to $600,000 a year just to keep it running. About 35% of academic registries shut down within five years due to funding gaps.

What Claims Data Tells Us

Claims data is different. It’s not collected for research; it’s generated every time a doctor bills an insurance company. Every prescription filled, every hospital stay, every lab test ordered gets coded and sent to a payer. That’s the raw material for claims data.

It includes ICD-10 diagnosis codes, CPT procedure codes, and NDC codes for medications. Because it’s tied to billing, it’s nearly complete for inpatient visits (95-98% coverage) and covers millions of people. IBM MarketScan tracks 200 million lives. Optum has 100 million. Medicare alone covers over 60 million Americans, with records going back 15+ years.

This makes claims data perfect for spotting rare events. If a drug causes a heart attack in 1 out of 10,000 users, you need a huge pool to see it. In 2015, the FDA analyzed 1.2 million Medicare beneficiaries over five years to check if entacapone (used for Parkinson’s) raised heart risks. They found no clear link. In 2014, they used 850,000 records to review olmesartan for cardiovascular risks in diabetics.
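
To see why rare events demand such large pools, here is a minimal sketch of the arithmetic, assuming every patient has the same independent chance of the event. The function names are made up for illustration. Note that this only covers the chance of observing a single case; showing that a drug raises risk above the background rate takes a much larger population.

```python
import math


def prob_at_least_one(rate: float, n_patients: int) -> float:
    """Chance of seeing at least one event among n_patients.

    Assumes each patient independently has the same event probability.
    """
    return 1.0 - (1.0 - rate) ** n_patients


def patients_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest cohort in which at least one event shows up
    with the requested probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


rate = 1 / 10_000  # the 1-in-10,000 example from the text

# A trial-sized cohort of 3,000 patients will usually miss the event entirely.
print(f"3,000 patients: {prob_at_least_one(rate, 3_000):.1%} chance of one case")

# Cohort size needed to see at least one case with 95% probability.
print(f"Patients needed: {patients_needed(rate):,}")
```

With a few thousand patients the odds of seeing even one case are roughly one in four, which is why trial-sized cohorts tend to miss events this rare.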

Claims data also helped approve palbociclib (Ibrance) for new patient groups in 2019. The FDA looked at claims, electronic health records, and safety reports to confirm it was safe in older patients and those with other chronic conditions.

But here’s the catch: claims data is thin on clinical detail. It tells you someone was diagnosed with diabetes, but not their HbA1c level. It shows a prescription for a blood thinner, but not whether the patient actually took it. Lab values? Only 45-60% complete. Patient-reported outcomes? Almost never included.

And coding errors are common. The Agency for Healthcare Research and Quality (AHRQ) estimates 15-20% of diagnosis codes are wrong, whether due to human error, billing incentives, or vague documentation. That leads to false alarms. One study found 22% of initial safety signals from claims data turned out to be false positives after clinical review.

Registries vs. Claims Data: The Trade-Offs

You can’t pick one over the other. Each has strengths the other lacks.

Registries give depth. They’re like a high-resolution MRI of a small group. They capture the nuances: how a drug affects someone with liver disease, or whether a side effect worsens over time. But they cover maybe 1,000 to 50,000 patients at most.

Claims data gives breadth. It’s a satellite view of millions. It can catch a rare kidney injury in 1 in 50,000 users. But it can’t tell you why that injury happened.

For rare adverse events, registries are more efficient per patient. Because their data is cleaner and more complete, you need about 500,000 patients to detect a signal, which is still far more than the 1,000 to 50,000 a typical registry covers. With claims data, you need a million, because missing lab values, unclear diagnoses, and coding gaps dilute the signal.

The FDA’s 2022 guidance says you can’t just run a statistical model on claims data and call it evidence. You have to account for biases such as “immortal time bias,” where the stretch of follow-up before a patient actually starts the drug, during which they cannot by definition have a drug-related event, gets counted as time on the drug and makes it look safer than it is. Proper methods can reduce this bias by 35-50%.
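
Immortal time bias is easier to see with numbers. The sketch below uses three made-up patients and compares a naive analysis, which counts all follow-up of treated patients as time on the drug, with one that only counts days after the first dispensing. The `Patient` class and its fields are hypothetical, and real studies use proper time-varying exposure models rather than this toy calculation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    start_day: int   # day the drug was first dispensed (day 0 = cohort entry)
    end_day: int     # day of the event or end of follow-up
    had_event: bool


def naive_rate(patients: List[Patient]) -> float:
    """Events per 1,000 days, wrongly counting the wait before
    treatment as time on the drug."""
    events = sum(p.had_event for p in patients)
    days = sum(p.end_day for p in patients)
    return 1000 * events / days


def corrected_rate(patients: List[Patient]) -> float:
    """Events per 1,000 days, counting only days after treatment started."""
    events = sum(p.had_event for p in patients)
    days = sum(p.end_day - p.start_day for p in patients)
    return 1000 * events / days


# Hypothetical treated patients, for illustration only.
treated = [
    Patient(start_day=60, end_day=100, had_event=True),
    Patient(start_day=90, end_day=365, had_event=False),
    Patient(start_day=30, end_day=200, had_event=False),
]

print(f"Naive rate:     {naive_rate(treated):.2f} per 1,000 days")
print(f"Corrected rate: {corrected_rate(treated):.2f} per 1,000 days")
```

The naive rate comes out lower because the pre-treatment days inflate the denominator without adding any events, which is exactly how the bias flatters a drug.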

The best approach? Combine them. In June 2023, the International Council for Harmonisation (ICH) released new guidelines recommending hybrid analysis. When you cross-check a signal from claims data with registry data, false positives drop by 40%. That’s why the FDA now routinely asks for both when reviewing post-market safety studies.
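
In code, the cross-check might look something like this rough sketch. Everything in it is an assumption made for illustration: the patient IDs, the data layout, and especially the creatinine cutoff, which is not clinical guidance. Real linkage also needs privacy-preserving patient matching, which is skipped here.

```python
# Illustrative cutoff only; not a clinical threshold.
CREATININE_CUTOFF_MG_DL = 1.5

# Claims side: patients flagged by a billing code for acute kidney failure.
claims_flags = [
    {"patient_id": "P1", "icd10": "N17.9"},
    {"patient_id": "P2", "icd10": "N17.9"},
    {"patient_id": "P3", "icd10": "N17.9"},
]

# Registry side: latest serum creatinine (mg/dL) for enrolled patients.
registry_creatinine = {
    "P1": 3.1,
    "P2": 0.9,
}


def cross_check(flags, labs, cutoff=CREATININE_CUTOFF_MG_DL):
    """Sort claims-flagged patients by whether registry labs back up the code."""
    result = {"confirmed": [], "not_supported": [], "no_registry_record": []}
    for flag in flags:
        patient_id = flag["patient_id"]
        value = labs.get(patient_id)
        if value is None:
            # Not enrolled in the registry: the flag can't be checked.
            result["no_registry_record"].append(patient_id)
        elif value >= cutoff:
            result["confirmed"].append(patient_id)
        else:
            # Lab value looks normal: possibly a coding error.
            result["not_supported"].append(patient_id)
    return result


summary = cross_check(claims_flags, registry_creatinine)
print(summary)
```

Keeping a separate bucket for patients with no registry record matters: a flag that can’t be checked is not the same as a flag that was refuted.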

How Regulators Use This Data Today

The FDA’s Sentinel Initiative is the gold standard. Launched in 2008, it connects 11 major healthcare systems and 3 claims processors to monitor safety across 300 million patient records. It’s not passive. Sentinel runs automated alerts for unusual patterns, like a spike in liver failures after a new drug launch.
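
One simple way to picture an automated alert is an observed-versus-expected check: compare the number of cases seen with the number you would expect from background rates, and flag counts that would be very unlikely by chance. The sketch below uses a Poisson tail probability for that. It is a generic illustration, not a description of how Sentinel actually works, and the counts and threshold are made up.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) when X follows a Poisson distribution
    with mean `expected`."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


def is_alert(observed: int, expected: float, alpha: float = 0.01) -> bool:
    """Flag the count if it would be this high by chance less than
    `alpha` of the time."""
    return poisson_tail(observed, expected) < alpha


# Hypothetical month: background rates predict 4 liver failure cases.
expected_cases = 4.0

for observed_cases in (6, 12):
    flagged = is_alert(observed_cases, expected_cases)
    print(f"{observed_cases} observed vs {expected_cases} expected -> alert: {flagged}")
```

Six cases against four expected is ordinary noise; twelve is not. A real system would also adjust for how many drug-event pairs it scans, since checking thousands of pairs guarantees some chance spikes.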

In January 2024, the FDA released draft guidance requiring registries to meet 80% data completeness for key variables like lab results and dosing. That’s a big step. It means registries can no longer be “nice to have”; they need to meet scientific standards.
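
Checking a registry against a completeness threshold like that is straightforward in principle. Here is a minimal sketch, assuming records are plain dictionaries and that a missing value shows up as `None` or an empty string. The field names are hypothetical, and the guidance itself may define completeness differently.

```python
def completeness(records, field):
    """Share of records where `field` has a usable value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def failing_fields(records, key_fields, threshold=0.80):
    """Key variables whose completeness falls below the threshold."""
    return [f for f in key_fields if completeness(records, f) < threshold]


# Hypothetical registry extract: dose is always recorded, the liver
# enzyme result (ALT) is missing in two of five records.
records = [
    {"dose_mg": 150, "alt_u_l": 32},
    {"dose_mg": 150, "alt_u_l": None},
    {"dose_mg": 300, "alt_u_l": 41},
    {"dose_mg": 150, "alt_u_l": ""},
    {"dose_mg": 300, "alt_u_l": 28},
]

for field in ("dose_mg", "alt_u_l"):
    print(f"{field}: {completeness(records, field):.0%} complete")

print("Below 80%:", failing_fields(records, ["dose_mg", "alt_u_l"]))
```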

The EMA’s Darwin EU network now pulls data from 32 databases across Europe. It’s designed to respond to safety concerns in weeks, not years. When a new diabetes drug showed possible pancreatitis signals in 2023, Darwin EU used claims and registry data from 10 countries to assess risk in under 12 weeks.

Pharmaceutical companies are investing more too. In 2017, only 3-5% of pharmacovigilance budgets went to RWE. By 2023, that jumped to 8-12%. Oncology leads the way: 38% of RWE submissions use registries because cancer patients are tracked closely in specialized centers. Cardiovascular drugs use claims data most often (45% of submissions) because heart disease is so widespread and well-coded in billing systems.

What’s Next for Drug Safety Monitoring?

The future isn’t just registries and claims. It’s blending them with new data streams. Novartis piloted using wearable devices, such as smartwatches tracking heart rate and activity, to monitor safety for Entresto, a heart failure drug. If a patient’s resting heart rate drops suddenly after starting the drug, that could signal a problem before they even feel symptoms.

AI is speeding things up. A 2024 study in JAMA Network Open showed AI algorithms reduced false safety signals by 28% by spotting patterns humans miss. The FDA’s REAL program, launched in 2023, aims to standardize registry data for 20 priority diseases by 2026, starting with rare diseases where traditional methods fail.

But challenges remain. Data privacy laws like HIPAA and GDPR make sharing hard. Standardizing codes across systems still takes up 40-60% of project time. And not every hospital or clinic has the tech or staff to contribute.

Still, the direction is clear: drug safety is moving from passive reporting to active surveillance. Registries and claims data aren’t just backup tools anymore. They’re the backbone of modern pharmacovigilance.

Why This Matters for Patients

You might think this is all about regulators and drug companies. But it’s not. Every time a new safety warning appears on a medication label (like “may increase risk of liver damage in elderly patients”), it likely came from one of these systems.

A patient on a new blood pressure drug might have a mild reaction. If they’re in a registry, that gets recorded. If they’re on Medicare, it shows up in claims. If enough similar cases appear, regulators investigate. That’s how dangerous drugs get pulled or labeled correctly.

It’s also why some drugs get approved faster for rare conditions. Without registries, companies couldn’t prove a drug works in 500 patients with a genetic disorder. Without claims data, they couldn’t prove it’s safe in 10 million people with high blood pressure.

The bottom line? Real-world evidence doesn’t replace clinical trials. It completes them. And for patients, that means safer, smarter, more personalized care.

What’s the difference between claims data and registry data?

Claims data comes from insurance billing records and includes diagnosis codes, procedures, and prescriptions. It covers millions of people but lacks clinical details like lab results or patient symptoms. Registry data is collected directly from patients or clinics and includes detailed medical information (like imaging, lab values, and patient-reported outcomes) but typically covers smaller groups, often under 50,000 people.

Can claims data detect rare side effects?

Yes, but only if the dataset is large enough. Claims data can detect side effects occurring in 1 in 10,000 patients, but it usually requires a population of at least 1 million to be statistically reliable. Registries can detect the same signal with half the population because their data is cleaner and more complete.

Why do regulators prefer combining registry and claims data?

Combining both reduces false positives by up to 40%. Claims data flags potential safety signals across large populations, while registries provide clinical context to confirm if the signal is real. For example, a spike in hospitalizations for kidney failure might be a coding error in claims data; registry data can show whether those patients actually had abnormal kidney function tests.

Are patient registries reliable?

They can be, but only if well-designed. High-quality registries have standardized data collection, regular audits, and high participation rates (above 70%). Many academic registries struggle with funding and shut down within five years. National registries like the SEER cancer registry or the Cystic Fibrosis Foundation registry are considered highly reliable due to consistent funding and strict protocols.

How is real-world evidence used in drug approvals today?

Since 2017, the FDA has approved 12 drugs or new uses where RWE played a key role. Five of those approvals relied on claims or registry data to confirm safety in broader populations or to support expanded use. For example, registry data helped approve pembrolizumab for additional cancer types in 2017, and claims data supported palbociclib’s use in older patients in 2019.

Comments

Jody Kennedy

This is the stuff that actually saves lives, not just lab reports in a vacuum. I’ve seen patients get flagged because a registry caught a weird reaction that no trial ever saw, like that one woman on the new diabetes med who started having seizures only after six months. No one would’ve known without her doctor updating her registry profile. Real-world data isn’t sexy, but it’s the unsung hero of pharmacovigilance.

And yeah, claims data? It’s messy, but it’s massive. When you’ve got 300 million records, even a 1-in-50,000 side effect starts looking like a pattern. We need both. Period.

December 26, 2025 at 08:13

christian ebongue

claims data is basically the internet’s version of ‘i swear i took my pill’ but with more coding errors. 🤦‍♂️

registries are like your weird aunt who remembers every detail of your last checkup. even if you forgot to tell her you started drinking again.

December 28, 2025 at 05:15

jesse chen

I’ve worked with both types of data (claims and registries), and honestly, they’re like peanut butter and jelly. One alone? Kinda dry. Together? Perfect.

Claims tell you what happened. Registries tell you why it happened. And when you cross-reference them? You stop chasing ghosts. I’ve seen false signals vanish the moment you pull in the actual lab values from a registry. It’s like turning on a light in a dark room.

Also, the FDA’s new 80% completeness rule? Long overdue. Too many registries are just glorified spreadsheets with half the fields blank. If you’re gonna call it science, at least collect the data right.

December 30, 2025 at 03:01

Joanne Smith

Let’s be real: claims data is the equivalent of reading a novel where every character’s name is spelled wrong, half the chapters are missing, and the ending was written by someone who fell asleep at the keyboard.

Registries? At least they know what a comma is. And they remember that Mrs. Rodriguez’s HbA1c was 8.7 last month, not ‘probably diabetic’ because the coder clicked the wrong box.

Also, I love that Novartis is using smartwatches now. Next stop: implantable sensors that text the FDA when your liver starts screaming. I’m here for it.

But let’s not pretend this is easy. Data privacy? A nightmare. Interoperability? Still a joke. Still… better than waiting for a doctor to file a paper form in 2024.

December 31, 2025 at 13:44

Prasanthi Kontemukkala

As someone who works with rural health clinics in India, I see how hard it is to get data into registries: people don’t have access, language barriers, no internet. But claims data? Even in the smallest clinics, they bill. So it’s the only thing we have.

That’s why combining both matters so much. We can’t ignore the gaps. Maybe we need community health workers helping patients update their own registry entries via simple apps. Not everyone has a specialist, but everyone has a phone.

This isn’t just about regulation; it’s about equity. If we only listen to the data from big hospitals, we miss the people who need help the most.

January 2, 2026 at 03:15

Alex Ragen

Ah, yes: the sacred marriage of bureaucratic inertia and algorithmic hubris.

Claims data, a monument to the American healthcare system’s obsession with billing over biology.

Registries, the last bastion of clinical dignity in an age of data nihilism.

And yet, we pretend that a statistical model trained on misclassified ICD-10 codes can somehow reveal the hidden truths of human physiology?

How quaint.

Perhaps we should ask: who benefits from this illusion of precision? The patient? Or the regulatory apparatus that demands ‘evidence’ while ignoring the ontological fragility of the human body?

...I’ll be in the corner, sipping my artisanal matcha and contemplating the void.

January 3, 2026 at 18:18

Lori Anne Franklin

OMG I didn’t realize registries could cost over a million to start?! That’s wild.

And claims data having 20% wrong codes? No wonder we get so many false alarms.

But I love that they’re using wearables now. My grandma’s watch tracks her heart rate and she doesn’t even know it! Maybe one day, the system will just know when something’s off before we even get to the doctor.

Also, ‘immortal time bias’? That sounds like a fantasy novel title. But yeah, that’s so real. People forget that stats can lie if you don’t know how to ask the right questions.

January 5, 2026 at 05:58

Bryan Woods

The integration of real-world evidence into regulatory decision-making represents a paradigm shift in pharmacovigilance. The convergence of structured registry data with large-scale claims datasets enables a more robust, scalable, and clinically nuanced assessment of post-marketing safety profiles. This approach mitigates the limitations inherent in each data source individually and enhances the fidelity of signal detection.

Furthermore, the implementation of standardized data collection protocols, as advocated by the FDA and ICH, ensures methodological rigor and reproducibility across diverse healthcare ecosystems. This is not merely an operational improvement; it is an epistemological advancement in the science of drug safety.

January 6, 2026 at 16:55

Ryan Cheng

Just wanted to say thanks for writing this. It’s rare to see a breakdown that actually makes sense without jargon overload.

I work with older patients on blood thinners, and I’ve seen how claims data flags ‘possible bleeding events’, but without the registry’s lab values or patient notes, we’re just guessing. One time, a guy got flagged for a GI bleed, but his registry had him on a new probiotic that changed his stool color. Totally harmless. Without the registry? He’d have been sent to the ER for nothing.

And yeah, funding registries is tough. But if we don’t invest in them, we’re just betting lives on error-prone billing codes. That’s not science. That’s roulette.

Let’s keep pushing for better data, not just more of it.

January 7, 2026 at 11:30