Clinical performance studies are central to demonstrating that an In Vitro Diagnostic (IVD) Medical Device delivers clinically meaningful results in its intended context of use. Under the EU In Vitro Diagnostic Regulation (IVDR), performance claims must be supported not only by analytical robustness but by credible clinical evidence that reflects real-world decision-making. As a result, study design has become one of the most scrutinized elements of IVDR Performance Evaluation.
Notified Bodies increasingly assess whether clinical performance data is trustworthy, applicable, and aligned with intended purpose. Favourable results alone are not sufficient if study design introduces bias, uses inappropriate reference standards, or applies endpoints that lack clinical relevance. Designing studies with these regulatory expectations in mind is therefore essential to building a defensible Performance Evaluation Report (PER) under IVDR.
The Role of Clinical Performance Studies in IVDR
Within the IVDR framework, performance evaluation rests on three pillars: scientific validity, analytical performance, and clinical performance. Clinical performance studies provide the evidence that analytical accuracy translates into clinically meaningful results. They demonstrate how test results relate to a clinical condition or outcome in the intended population and use environment. Additionally, well-designed clinical performance studies help demonstrate real-world clinical value, strengthening confidence in performance claims among regulators and healthcare stakeholders.
Guidance such as MDCG 2022-2 reinforces that clinical evidence must be appropriate to intended purpose and risk class. Clinical performance studies are therefore a primary mechanism for justifying evidence sufficiency under IVDR Performance Evaluation, particularly for higher-risk or clinically impactful IVDs.
Intended Purpose as the Foundation of Study Design
A defensible clinical performance study begins with a precise definition of intended purpose. Intended purpose determines the clinical pathway, target population, comparator methods, and acceptable performance trade-offs. A screening test, for example, carries different evidentiary expectations than a confirmatory diagnostic test, even when measuring the same analyte.
Misalignment between intended purpose and study design is a frequent source of Notified Body questions. Clearly anchoring study populations, endpoints, and reference standards to intended use strengthens the credibility and applicability of clinical performance evidence.
Bias Control: A Core Credibility Requirement
Bias can arise unintentionally through study design choices and can significantly distort apparent performance. Common sources include patient selection bias, spectrum bias, verification bias, and observer bias. Under IVDR scrutiny, unaddressed bias often undermines confidence in otherwise strong datasets.
Manufacturers are expected to identify potential bias sources and explain how they are mitigated through inclusion criteria, blinding procedures, consistent application of reference standards, and transparent reporting. Proactive bias control is a key determinant of whether clinical performance evidence is considered reliable within a PER.
QUADAS-2 as a Review Lens
Although not mandated by IVDR, QUADAS-2 is widely used to assess the quality of diagnostic accuracy studies. Developed at the University of Bristol, it evaluates risk of bias across four domains (patient selection, index test, reference standard, and flow and timing) and applicability concerns across the first three.
An authoritative overview of the tool is available from the University of Bristol. Designing studies that would perform well under a QUADAS-2-style assessment helps align evidence with how reviewers evaluate credibility, even when QUADAS-2 is not explicitly cited.
Additionally, applying QUADAS-2 principles during study design can help ensure representative patient selection, appropriate reference standards, and real-world test conditions, thereby strengthening evidence credibility, reducing bias risk, and supporting more robust and defensible Performance Evaluation under the IVDR.
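A QUADAS-2-style review can be usefully tabulated during study design, not just at publication. The sketch below is purely illustrative: the four domain names come from the published QUADAS-2 tool, but the data structure, roll-up rule, and example ratings are hypothetical conventions, not part of QUADAS-2 or the IVDR.

```python
# Hypothetical sketch: recording per-domain QUADAS-2 risk-of-bias judgments
# and rolling them up conservatively. Domain names follow the QUADAS-2 tool;
# everything else (structure, roll-up rule, example ratings) is illustrative.
QUADAS2_DOMAINS = (
    "patient selection",
    "index test",
    "reference standard",
    "flow and timing",
)

def overall_risk(judgments: dict) -> str:
    """Conservative roll-up: 'low' only if every domain is rated low."""
    missing = set(QUADAS2_DOMAINS) - set(judgments)
    if missing:
        raise ValueError(f"unassessed domains: {sorted(missing)}")
    if any(j == "high" for j in judgments.values()):
        return "high"
    if all(j == "low" for j in judgments.values()):
        return "low"
    return "unclear"

# Example: a study where only verified positives received the reference
# standard, flagging verification bias under flow and timing.
example = {
    "patient selection": "low",
    "index test": "low",
    "reference standard": "low",
    "flow and timing": "high",
}
print(overall_risk(example))  # high
```

The conservative roll-up mirrors how reviewers tend to weigh such assessments: a single high-risk domain, such as verification bias, can undermine an otherwise well-conducted study.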
Endpoint Selection and Clinical Relevance
Endpoints define how performance is measured and interpreted. Under IVDR, endpoints must reflect clinically meaningful performance rather than statistical convenience. Sensitivity, specificity, and predictive values should be selected based on how results influence clinical decisions within the intended pathway.
For example, predictive values are particularly important in low-prevalence settings, while sensitivity may be prioritized where missed diagnoses carry significant risk. Clearly justifying endpoint selection strengthens the defensibility of IVDR Performance Evaluation by demonstrating alignment between performance metrics and clinical utility.
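The interplay between prevalence and predictive values can be made concrete with a short Bayes' theorem calculation. The figures below are illustrative only, not performance claims for any device.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Compute PPV and NPV from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Illustrative: a test with 95% sensitivity and 95% specificity applied
# in a screening population with 1% disease prevalence.
ppv, npv = predictive_values(0.95, 0.95, 0.01)
print(f"PPV = {ppv:.1%}, NPV = {npv:.3%}")  # PPV = 16.1%, NPV = 99.947%
```

Despite strong sensitivity and specificity, only about one in six positive results is a true positive at 1% prevalence, which is exactly why IVDR reviewers expect predictive values to be reported and justified for low-prevalence screening claims.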
Reference Standards and Applicability
The reference standard establishes the benchmark against which the index test is evaluated. Weak or inconsistently applied reference standards are a common source of regulatory concern. IVDR requires manufacturers to justify their selection of the reference standard and to acknowledge its limitations transparently, especially when no single gold standard exists.
Applicability is equally important. Study results must reasonably reflect real-world use, including patient demographics, disease prevalence, specimen types, and operational conditions. A defensible study explains how these factors align with intended use and where limitations may affect generalizability.
Integration into the Performance Evaluation Report
Clinical performance studies gain regulatory value only when integrated coherently into the IVDR PER. This integration requires clear mapping between claims, endpoints, evidence, and conclusions, along with balanced discussion of limitations and uncertainty.
Such coherence aligns with broader clinical and performance evaluation practices, where evidence is managed as a unified system rather than as isolated datasets.
Managing Residual Uncertainty Through Post-Market Evidence
No pre-market study can eliminate all uncertainty. IVDR addresses this through structured Post-Market Surveillance (PMS) and Post-Market Performance Follow-up (PMPF). Clinical performance study design should anticipate how post-market data will confirm or refine performance assumptions over time.
Lifecycle integration of pre-market and post-market evidence strengthens the credibility of performance conclusions and supports ongoing updates to the PER under IVDR.
Conclusion
Designing clinical performance studies under the IVDR requires more than technical execution; it requires regulatory awareness, clinical insight, and methodological discipline. Bias control, endpoint selection, justification of the reference standard, and applicability assessment directly determine whether clinical performance evidence is trusted.
When these elements are addressed systematically, clinical performance studies strengthen IVDR Performance Evaluation and contribute meaningfully to a defensible PER. Integrated with analytical performance, scientific validity, and PMS, they form part of a resilient, lifecycle-based evidence system that supports both regulatory compliance and confidence in diagnostic decision-making.
How Freyr Supports Clinical Performance Study Design Under IVDR
Designing IVDR-compliant clinical performance studies requires careful alignment between intended purpose, methodological rigor, and regulatory expectations. Freyr supports IVD manufacturers in planning and executing clinical performance studies that generate credible, applicable evidence and withstand Notified Body scrutiny under the EU IVDR.
Freyr’s experts assist with study design strategy, bias identification and mitigation, endpoint and reference standard selection, QUADAS-2–informed quality assessment, and integration of study results into the IVDR Performance Evaluation Report. For support with clinical performance study design, evidence strategy, or IVDR performance evaluation readiness, speak to a Freyr expert to discuss your regulatory challenges.