Peer Reviewed: Medical Device Validation

## ABSTRACT

The valid rationale in developing statistical sampling for design verification and validation of medical device product performance is to demonstrate the probability of conformance to specification (PCS) of the device performance. AQL sampling plans are not suitable for testing in the verification and validation phases. Therefore, a non-parametric binomial (NPB) distribution model and a normal tolerance interval (NTI) model are used here to determine the sample size needed to demonstrate a specified PCS at a given confidence level for a characteristic with attribute data and variable data, respectively. A practical step-by-step process for selecting and applying statistical sampling plans and acceptance criteria for verification and validation is also presented and then applied to some cases related to medical device products and processes.

## INTRODUCTION

The Food and Drug Administration (FDA) requires, via Sec. 820.30 of Title 21 of the Code of Federal Regulations (CFR), that medical device manufacturers who want to market certain categories of medical devices in the USA establish and maintain procedures to control the design of the device (U.S. FDA, 2014). In essence, design controls are simple and logical steps to ensure that what is developed is what is meant to be developed, and that the final product meets the customer’s needs and expectations. When a device reaches the stage where its hardware or software prototype is fully functional, FDA 21 CFR 820.30 Design Control requires medical device manufacturers to perform design verification and design validation. These processes confirm the device design via examination and objective evidence, and ensure that the critical design and development specifications or outputs for the proper function of the device have met the design and development input requirements and are capable of meeting the requirements for the specified application or intended use, as well as safety requirements (U.S. FDA, 2011). In executing design verification and validation (V&V), Sec. 820.250 of Title 21 of the CFR requires manufacturers to establish and maintain procedures for identifying *valid statistical techniques* required for establishing the acceptability of process capability and product characteristics. Sampling plans shall be written and based on a valid statistical rationale.

This paper provides direction for determining design verification and validation sampling plans and tables that may be used for attribute and variable data. The sampling plans provided must be able to demonstrate that specified reliability or probability of conformance to specification (PCS) levels are met with the desired level of confidence.

## STATISTICAL SAMPLING PLANS

V&V assumes that requirements have not been met unless testing demonstrates that they have. The plans available for use in manufacturing or routine inspection are Acceptable Quality Limit (AQL) sampling plans. An AQL sampling plan is a statistical method that tests against the quality level that would be accepted most of the time (e.g. 95% of the time), by estimating a characteristic of the product population through a sample. The rationale behind the AQL sampling plan is that the lot is assumed to be good from the beginning until proven bad, which biases the plan towards controlling the manufacturer’s risk:

*H*_{0}: probability [non-conformance] ≤ Assigned AQL

*H*_{1}: probability [non-conformance] > Assigned AQL

Conformance or non-conformance of the product characteristic is generally defined as the number of passes or fails, respectively, that occurred in a sample divided by the sample size. The manufacturer will “accept” a lot if *H*_{0} is not rejected. Failing to reject shows only that there is no statistically significant evidence that the lot, which is assumed good, is bad; it does not show that the lot is good. Without more information, we usually accept the lot as a good lot, but only because the test was looking for evidence that the lot is not good and found none. When the AQL sampling plan is applied to design V&V and manufacturers do not reject *H*_{0}, what can be said about the PCS of the design performance? AQL sampling plans, when applied to demonstrate whether or not the PCS of a system is good enough to meet its goal, do not technically allow us to conclude that the PCS of the system is good just because the null hypothesis is not rejected.

In the V&V phases, manufacturers have to demonstrate with a specific confidence level whether or not the PCS of a system is good enough to meet its goal, under the assumption that the requirements have not been met unless testing demonstrates that they have. Therefore, AQL sampling plans are not suitable for testing in the V&V phases. If manufacturers want to demonstrate how good the PCS of a product performance is, they must first assume that the requirements have not been met, and then try to gather evidence to the contrary, i.e. evidence that suggests they have been met. Therefore, the null hypothesis must be stated as follows:

*H*_{0}: probability [non-conformance] > desired non-conformance level

*H*_{1}: probability [non-conformance] ≤ desired non-conformance level

The hypotheses above can be written in terms of PCS as follows:

*H*_{0}: PCS < desired PCS level

*H*_{1}: PCS ≥ desired PCS level

Validation passes if *H*_{0} is rejected. The rejection criterion, the maximum number of failures *X*_{c} allowed in a sample of size *N*, should be chosen such that, at the desired PCS level,

$$P(X \le X_c \mid N, \text{desired PCS level}) = 1 - \text{Confidence Level},$$

where *X* is the number of failures. This is the probability of passing the demonstration test although the device does not meet the requirement, i.e. the consumer’s risk (Pardo, 2013).
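The consumer's risk described above can be evaluated numerically. The following is a minimal sketch, assuming the number of failures *X* follows a binomial distribution with failure probability 1 − PCS; the function name and the 30-unit plan are hypothetical and chosen only for illustration.

```python
from math import comb


def pass_probability(n: int, max_failures: int, pcs: float) -> float:
    """P(X <= max_failures) for X ~ Binomial(n, 1 - pcs).

    Assumption: trials are independent and share the same PCS.
    """
    p_fail = 1.0 - pcs
    return sum(
        comb(n, i) * p_fail**i * pcs ** (n - i)
        for i in range(max_failures + 1)
    )


# Hypothetical plan: 30 units, zero failures allowed, true PCS exactly 90%.
risk = pass_probability(n=30, max_failures=0, pcs=0.90)
print(f"Probability of passing when PCS is only 90%: {risk:.4f}")
```

A plan is acceptable for a given confidence level when this probability, evaluated at the desired PCS level, does not exceed 1 − Confidence Level.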

The basic principle of a demonstration test is to show that a product characteristic performs as designed, using a sample of devices tested under conditions considered representative of their operational use. Test results are recorded by determining whether each unit passed or failed to meet its specification, and are summarized as the percentage of units conforming to the required characteristic. Based on the results of such a test, a decision is taken on the acceptability of the population of devices that the sample represents, that is, future production items. In any sampling test, there are risks to both the producer and the consumer that a wrong decision can be reached. The degree of risk varies with factors such as the sample size and test duration, and must therefore be agreed and specified when planning demonstration tests.

## PASS-FAIL TEST BASED ON THE NON-PARAMETRIC BINOMIAL DISTRIBUTION FOR ATTRIBUTE DATA

Two types of data are evaluated in V&V tests of each product, component or process characteristic: variable (quantitative) data and attribute (pass/fail) data. In general, these characteristics are the critical-to-quality characteristics of the product performance. A method widely used in practice to determine the sample size needed to demonstrate a specified PCS at a given confidence level for a characteristic with attribute data is based on the non-parametric binomial (NPB) distribution model (Guo et al., 2013). To use the binomial distribution model to predict the PCS for devices, the trials in the sample must meet the following conditions. Each trial has only one of two possible outcomes, and trials must be independent: the outcome of one trial cannot influence the outcome of another. All trials have the same PCS, i.e. each trial must come from an identical device or from devices in an identical condition.

Determining the PCS of a device poses a unique challenge. The test planner must therefore know how to determine the sample size that must be tested to demonstrate a desired PCS of the population at an acceptable level of confidence. The calculations are based on the binomial distribution and the following formula:

$$1 - CL = \sum_{i=0}^{f} \binom{N}{i} (1 - R)^{i} R^{N-i} \qquad (1)$$

where *CL* is the confidence level, *f* is the maximum number of failures, *N* is the sample size, and *R* is the demonstrated PCS, which is equal to 1 – proportion non-conforming. 1 − *CL* is the probability of *f* or fewer failures occurring in a test of *N* units, i.e. the probability of passing the demonstration test although the device does not meet the requirement. Therefore, the NPB equation determines the sample size by controlling the error of passing non-conforming devices. If no units are allowed to fail, the test is called success-run testing: with *f* = 0, only the *i* = 0 term remains and *CL* = 1 – *R*^{N}. Sampling plans for V&V will ordinarily provide greater confidence than those used in normal production. Given any three of the variables in equation (1), the remaining one can be solved for. Appendix A provides a table of sample sizes for different combinations of PCS levels (*R*), confidence levels (*CL*), and maximum numbers of failures (*f*). For comparison with data generated from a normally distributed population, the capability (*P*_{pk}) of the process validation can be calculated as 1/3 of the inverse of the normal cumulative distribution for the corresponding reliability performance level; the results are shown in Appendix A.
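When the Appendix A table is not at hand, the sample size can be found by direct search. The sketch below assumes equation (1) is the cumulative binomial relation 1 − *CL* = Σ C(*N*, *i*)(1 − *R*)^i *R*^(N−i) summed over *i* = 0..*f*, and returns the smallest *N* that reaches the requested confidence. The function names and the `n_max` search limit are hypothetical; the *P*_{pk} helper follows the "1/3 of the inverse normal cumulative distribution" rule stated above.

```python
from math import comb
from statistics import NormalDist


def confidence(n: int, f: int, r: float) -> float:
    """Confidence demonstrated when at most f failures occur in n units at PCS r."""
    tail = sum(comb(n, i) * (1 - r) ** i * r ** (n - i) for i in range(f + 1))
    return 1.0 - tail


def min_sample_size(r: float, cl: float, f: int = 0, n_max: int = 100_000) -> int:
    """Smallest n whose demonstrated confidence reaches cl (simple linear search)."""
    for n in range(f + 1, n_max + 1):
        if confidence(n, f, r) >= cl:
            return n
    raise ValueError("no sample size found below n_max")


def ppk_equivalent(r: float) -> float:
    """Ppk taken as one third of the standard normal quantile at r."""
    return NormalDist().inv_cdf(r) / 3.0


# Zero-failure plan with R = 99.0% and CL = 90%.
print(min_sample_size(r=0.99, cl=0.90, f=0))
# The same search also handles plans that allow failures, e.g. f = 3.
print(min_sample_size(r=0.99, cl=0.95, f=3))
print(round(ppk_equivalent(0.99), 3))
```

The first call should reproduce the sample size of 230 used in Example 1. A linear search is slow for very high *R* but keeps the logic easy to audit.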

**Example 1**: A geometric characteristic of a newly designed device is being validated. The risk of this characteristic is “minor”, corresponding to a non-conformity that may cause the product to function poorly or cause an inconvenience while still being fit for use. The recommended reliability performance level for the “minor” risk of this characteristic is 99.0%. The suggested confidence level is 90%, corresponding to the design verification of a new product. A product engineer wants to design a zero-failure demonstration test to demonstrate a reliability of 99.0% at a 90% confidence level, using the NPB method to determine the required sample size.

Thus, the sampling plan is *R* = 99.0%, *CL* = 90%, and *f* = 0. Substituting these values into equation (1) gives the corresponding sample size of 230 (Appendix A). This sample will be collected randomly from the pilot production for this design verification. If those 230 devices are run through the required demonstration test and no failures are observed, the null hypothesis that the PCS is below 99.0% is rejected, and a PCS of 99.0% or higher has been demonstrated with a 90% confidence level. If the PCS of the system is less than or equal to 99.0%, the chance of passing this test is at most 1 − *CL* = 10%, which is the error of passing non-conforming devices. Equation (1) therefore determines the sample size by controlling this error.
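To see how the plan in Example 1 behaves for different true PCS values, the chance of passing a zero-failure test can be tabulated. This is a rough sketch under the binomial assumptions stated earlier; the function name and the list of true PCS values are illustrative only.

```python
def zero_failure_pass_probability(n: int, true_pcs: float) -> float:
    """Probability that all n independent units conform when each conforms with true_pcs."""
    return true_pcs**n


# Plan from Example 1: N = 230, no failures allowed.
for true_pcs in (0.980, 0.990, 0.995, 0.999):
    p = zero_failure_pass_probability(230, true_pcs)
    print(f"true PCS = {true_pcs:.3f} -> P(pass) = {p:.3f}")
```

At a true PCS of exactly 99.0% the pass probability comes out at about 10%, matching 1 − *CL*, and it shrinks as the true PCS drops. The same table also shows the producer's side of the risk: a device somewhat better than 99.0% can still fail a zero-failure test fairly often.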

Several other methods have been designed to help engineers develop sampling plans for V&V tests, such as Cumulative Binomial, Exponential Chi-Squared, Life Testing and Non-Parametric Bayesian (Guo et al., 2013).

## A VARIABLE-DATA TEST BASED ON TOLERANCE INTERVALS FOR A NORMAL DISTRIBUTION

A method widely used in practice to determine the sample size needed to demonstrate a specified reliability at a given confidence level for a characteristic with variable data is based on the normal tolerance interval (NTI) model (Hahn and Meeker, 1991). A tolerance interval is a statistical interval within which, with some confidence level, a certain proportion of a sampled population falls. The endpoints of a tolerance interval are called the upper and lower tolerance limits. If the demonstration test results are variable data, calculate the tolerance interval of the data: the tolerance interval that covers at least the required PCS of the device at the required confidence level must lie within the specification limits of the device characteristic to pass the V&V requirements.

In most cases a characteristic of the device can be addressed by one of three types of tolerance intervals: a two-sided interval, a lower one-sided interval, or an upper one-sided interval. The corresponding tolerance intervals are defined by lower (*L*) and upper (*U*) tolerance limits, which are computed from a series of *n* device characteristic measurements *Y*_{1},…,*Y*_{n} as follows:

$$[L, U] = [\bar{Y} - k_2 s,\ \bar{Y} + k_2 s] \qquad (2)$$

$$L = \bar{Y} - k_1 s \qquad (3)$$

$$U = \bar{Y} + k_1 s \qquad (4)$$

where $\bar{Y}$ is the average value of *Y*, *s* is the standard deviation of *Y*, and the *k* factors are determined so that the intervals cover at least a proportion *R* of the device population with confidence *CL* (NIST/SEMATECH, 2013). Equation (2), (3) or (4) guarantees with probability *CL* that a proportion *R* of the measurements is contained in the interval, will not fall below the lower tolerance limit, or will not exceed the upper tolerance limit, respectively.

If the data are from a normally distributed population, an approximate value for the *k*_{2} factor as a function of *R* and *CL* for a two-sided tolerance interval is

$$k_2 = \sqrt{\frac{\nu \left(1 + \frac{1}{N}\right) z_{(1-R)/2}^{2}}{\chi^{2}_{1-CL,\,\nu}}} \qquad (5)$$

where $\chi^{2}_{1-CL,\,\nu}$ is the critical value of the chi-square distribution with *ν* degrees of freedom that is exceeded with probability *CL*, $z_{(1-R)/2}$ is the critical value of the normal distribution associated with cumulative probability (1 − *R*)/2, and *N* is the sample size. The quantity *ν* represents the degrees of freedom used to estimate the standard deviation. Most of the time the same sample will be used to estimate both the mean and the standard deviation, so that *ν* = *N* − 1, but the formula allows for other possible values of *ν*.
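The two-sided factor can be approximated with the standard library alone. The sketch below assumes the *k*_{2} approximation has the form sqrt(ν (1 + 1/N) z² / χ²), with χ² the chi-square value exceeded with probability *CL*. Because the Python standard library has no chi-square quantile, the Wilson-Hilferty approximation is swapped in here; that substitution is an assumption of this sketch, not part of the article's method, and an exact quantile from a statistics package could replace it. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional


def chi2_quantile_wh(prob: float, dof: int) -> float:
    """Wilson-Hilferty approximation of the chi-square quantile at cumulative prob."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def k2_factor(n: int, r: float, cl: float, dof: Optional[int] = None) -> float:
    """Approximate two-sided normal tolerance factor for coverage r and confidence cl."""
    nu = n - 1 if dof is None else dof
    # The quantile at (1 + r) / 2 has the same square as the one at (1 - r) / 2.
    z = NormalDist().inv_cdf((1.0 + r) / 2.0)
    # A value exceeded with probability cl has cumulative probability 1 - cl.
    chi2 = chi2_quantile_wh(1.0 - cl, nu)
    return sqrt(nu * (1.0 + 1.0 / n) * z * z / chi2)


# Hypothetical plan: 30 measurements, 95% coverage, 95% confidence.
print(round(k2_factor(n=30, r=0.95, cl=0.95), 3))
```

Since both the *k*_{2} formula and the chi-square quantile are approximations, results should be compared against published tolerance factor tables before use in a protocol.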

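The calculation of an approximate *k*_{1} factor for one-sided tolerance intervals comes directly from the following set of formulas:

$$k_1 = \frac{z_{R} + \sqrt{z_{R}^{2} - ab}}{a} \qquad (6)$$

$$a = 1 - \frac{z_{CL}^{2}}{2(N - 1)} \qquad (7)$$

$$b = z_{R}^{2} - \frac{z_{CL}^{2}}{N} \qquad (8)$$

where $z_{R}$ and $z_{CL}$ are the critical values of the normal distribution associated with cumulative probabilities *R* and *CL*, respectively.

Given *R*, *CL*, and *N*, the factor *k*_{1} can be found from equations (6)-(8). Appendices B-1 and B-2 provide tables of preferred sample sizes *N* and the corresponding *k*_{2} and *k*_{1} factors, respectively, for different combinations of reliability performance levels (*R*) and confidence levels (*CL*). In addition, the capability (*P*_{pk}) of the process validation can be calculated as 1/3 of the inverse of the normal cumulative distribution for the corresponding reliability performance level.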
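The one-sided factor is straightforward to compute. This sketch assumes the approximation given in the cited NIST/SEMATECH handbook, k1 = (z_R + sqrt(z_R² − ab)) / a with a = 1 − z_CL² / (2(N − 1)) and b = z_R² − z_CL² / N, where z_R and z_CL are standard normal quantiles at *R* and *CL*. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def k1_factor(n: int, r: float, cl: float) -> float:
    """Approximate one-sided normal tolerance factor for coverage r and confidence cl."""
    z_r = NormalDist().inv_cdf(r)
    z_cl = NormalDist().inv_cdf(cl)
    a = 1.0 - z_cl**2 / (2.0 * (n - 1))
    b = z_r**2 - z_cl**2 / n
    return (z_r + sqrt(z_r**2 - a * b)) / a


# Candidate sample sizes for R = 99.6% and CL = 90%.
for n in (20, 30, 40):
    print(n, round(k1_factor(n, r=0.996, cl=0.90), 2))
```

The printed factors should land close to the tabulated *k*_{1} values for the same *R* and *CL*. Larger samples give smaller factors, so the trade-off is between test cost and how much margin to the specification limit the data must show.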

**Example 2:** Packaging seal strength for a new design is being verified. The one-sided specification limit of the seal strength is 10 lbs. minimum. The reliability performance level to be demonstrated is 99.6% with a confidence level of 90% for one run.

Given *R* = 99.6% and *CL* = 90%, equations (6)-(8) provide combinations of sample sizes and *k*_{1} factors: *N* = 20 and *k*_{1} = 3.42; *N* = 30 and *k*_{1} = 3.25; *N* = 40 and *k*_{1} = 3.15, etc. (Appendix B-2).

The verification test was run based on a sampling plan of *N* = 40 and *k*_{1} = 3.15. The data passed the normality test for the run: sample average = 13.1 lbs. and *s* = 0.6 lbs. Thus, the lower tolerance limit is 13.1 lbs. – 3.15 * 0.6 lbs. = 11.21 lbs. Since the lower tolerance limit was above the lower specification limit for the design verification run, the new design packaging seal passed.

## PROCESS STEPS – SELECTING THE SAMPLING PLAN AND ACCEPTANCE CRITERIA

Based on the NPB distribution model and NTI model used above to develop demonstration tests of PCS, this section proposes a flow for determining a sampling plan and deciding whether the plan passes or fails. The process flow diagram for selecting a sampling plan and acceptance criteria is shown in Figure 1.

**Step 1** is to determine the desired *R* and the overall *CL* for each product, component or process characteristic to be evaluated. *R* and *CL* must reflect the risk that the product characteristic may cause dissatisfaction or harm to users if it does not conform to its specification. Many manufacturers rank the risk as cosmetic, minor, major or critical. Cosmetic risk may be defined as a nonconformity that will not affect the usability or functionality of the product and affects only its appearance. Minor risk may be defined as a nonconformity that may cause the product to function poorly or cause an inconvenience while still being fit for use, or that may possibly generate a complaint. Major risk may be defined as a nonconformity that may cause the product to be unfit for use, significantly degrades the product’s function or performance, or is very likely to generate a complaint. Critical risk may be defined as a nonconformity that is likely to present a hazard to health. For example, product characteristics with critical, major, minor and cosmetic risk shall have *R* levels > 99%, > 97%, > 95%, and > 90%, respectively, with a confidence level greater than or equal to 90% in order to have at least *R* > 80%.

**Step 2** is to identify the data type of each product, component or process characteristic to be evaluated, i.e. either variable or pass/fail data. In general, these are the critical quality characteristics of the product or process output.

**Step 3** is to select the sampling plan(s) to meet the desired *R* and *CL*. Plans for attribute data are provided in the table in Appendix A. Plans for variable data are provided in the tables in Appendices B-1 and B-2. Samples shall represent the behavior of the process validation or design verification runs. Random sampling or another method, such as periodic sampling, stratified sampling, or rational sampling, is commonly used to assure that samples are representative of the entire run.

**Step 4** is to perform verification and/or validation run(s) to collect test samples. The minimum size or length of each run should normally reflect the expected production run.

**Step 5** is to perform statistical analysis of the pass/fail data collected in Step 4. The verification and/or validation run passes if the number of failed units is less than or equal to the maximum number of failures (acceptance number) in the table (Appendix A).

**Step 6** is to perform a goodness-of-fit test on the variable data to check whether the data are normally distributed.

**Step 7** is to perform statistical analysis by calculating the NTI of the data, if the data pass the normality test. The NTI is calculated from the sample average, the sample standard deviation and the normal tolerance interval factor from Appendix B-1 or B-2. The interval should be within the specification limits to pass the run:

I. If the specification has a lower specification limit (LSL) only, then the run passes if (sample average – *k*_{1} * sample standard deviation) ≥ LSL.

II. If the specification has an upper specification limit (USL) only, then the run passes if (sample average + *k*_{1} * sample standard deviation) ≤ USL.

III. If the specification has two-sided specification limits, then the run passes if (sample average – *k*_{2} * sample standard deviation) ≥ LSL and (sample average + *k*_{2} * sample standard deviation) ≤ USL.

**Step 8** applies if the normality test in Step 6 fails: add more data using NTI test sampling as in Step 3, and then perform additional verification and/or validation runs. In this case the normal tolerance interval approach is probably not appropriate.
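The acceptance rules in Steps 5 and 7 can be sketched as two small checks. The function names are hypothetical, and the caller is assumed to supply the right factor: *k*_{1} when only one specification limit exists and *k*_{2} when both do. The normality check of Step 6 is assumed to have been done separately.

```python
from typing import Optional


def attribute_run_passes(failures: int, acceptance_number: int) -> bool:
    """Step 5: pass if observed failures do not exceed the acceptance number."""
    return failures <= acceptance_number


def variable_run_passes(
    mean: float,
    s: float,
    k: float,
    lsl: Optional[float] = None,
    usl: Optional[float] = None,
) -> bool:
    """Step 7: pass if the tolerance limit(s) lie inside the specification limit(s)."""
    if lsl is None and usl is None:
        raise ValueError("at least one specification limit is required")
    lower_ok = lsl is None or (mean - k * s) >= lsl
    upper_ok = usl is None or (mean + k * s) <= usl
    return lower_ok and upper_ok


# Values from Example 2: average 13.1 lbs., s = 0.6 lbs., k1 = 3.15, LSL = 10 lbs.
print(variable_run_passes(mean=13.1, s=0.6, k=3.15, lsl=10.0))
# A zero-failure attribute plan with no observed failures.
print(attribute_run_passes(failures=0, acceptance_number=0))
```

Keeping the factor lookup outside these functions makes the pass/fail rule itself trivial to review against the protocol.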

**Example 3:** Some changes to the interaction among the IO Audio Driver, Wave File and Sound Manager were made in order to increase Surgical Equipment GUI performance. The data demonstrate that, prior to optimization, the GUI reliability or PCS level (no freeze) was 93% with a 95% confidence level. The target was to increase the PCS level to 99% with a 95% confidence level.

The table to be used in this example is in Appendix A, and the sampling plan is *R* = 99% and *CL* = 95%. The corresponding sample size was 459, with an acceptance number of 0 failures and rejection at 1 or more failures. Formal engineering testing via a total of 500 (rounded up) simulated surgery tests was done, and the run passed.
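For a zero-failure run, the confidence or PCS actually demonstrated by the number of units tested follows from the success-run relation *CL* = 1 − *R*^{N}. The sketch below solves that relation in both directions; the function names are hypothetical, and the relation holds only when no failures were observed.

```python
def demonstrated_confidence(n: int, r: float) -> float:
    """Confidence that PCS >= r after n units with zero failures."""
    return 1.0 - r**n


def demonstrated_pcs(n: int, cl: float) -> float:
    """PCS demonstrated at confidence cl after n units with zero failures."""
    return (1.0 - cl) ** (1.0 / n)


# Sample sizes mentioned in Example 3, evaluated at R = 99%.
for n in (459, 500):
    print(n, round(demonstrated_confidence(n, r=0.99), 4))

# PCS demonstrated by 500 zero-failure tests at 95% confidence.
print(round(demonstrated_pcs(500, cl=0.95), 4))
```

Running more tests than the plan requires, as was done here, can only increase the confidence demonstrated as long as no failures occur.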

**Example 4:** The fill volume of a new filler with specification limits of 1000 - 1060 ml is being validated. The PCS level to be demonstrated is 99.0% with a 99% overall confidence level. Three runs at 90% confidence each will give about 99.9% overall: overall confidence is calculated as (1 – (1 – 0.90)^{3}) * 100% = 99.9%.

The sampling plan is selected from the two-sided table, Appendix B-1: Normal Tolerance Limit Factors (*k*_{2}) for Two-sided Specification Limits. The sampling plan is *R* = 99.0% and *CL* = 90% per run; the corresponding sample size is *N* = 20 and *k*_{2} = 2.15 for each run. The data passed the normality test for each run with the summary statistics $\bar{Y}$ and *s*, where $\bar{Y}$ is the sample average and *s* is the sample standard deviation. Since $\bar{Y} \pm 2.15s$ was within the specification limits for all three runs, the plan passed. We can be 99.9% confident that the process produces at least 99.0% conforming units.
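The overall confidence figure used in Example 4 comes from combining the per-run confidence levels. A minimal sketch, assuming the runs are independent and each must pass on its own; the function name is hypothetical.

```python
def overall_confidence(cl_per_run: float, runs: int) -> float:
    """Overall confidence when every one of `runs` independent runs passes at cl_per_run."""
    return 1.0 - (1.0 - cl_per_run) ** runs


# Three passing runs at 90% confidence each, as in Example 4.
print(f"{overall_confidence(0.90, 3) * 100:.1f}%")
```

The same function shows how many runs are needed for a target: two runs at 90% give 99%, so three runs leave some margin over the 99% overall requirement.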

## CONCLUSIONS

In this article, practical sampling plans and a step-by-step procedure for selecting a suitable plan are developed based on the NPB distribution model and the NTI model for attribute data and variable data, respectively. These statistically sound sampling plans, of the kind required by regulation, are suitable for demonstrating the probability of conformance to specification of medical device performance in the design V&V stages.

## REFERENCES

- H. Guo, E. Pohl, and A. Gerokostopoulos, Determining the Right Sample Size for Your Test: Theory and Application. *2013 Annual Reliability and Maintainability Symposium*, IEEE.
- G.J. Hahn and W.Q. Meeker, *Statistical Intervals: A Guide for Practitioners.* John Wiley & Sons, Inc., 1991.
- NIST/SEMATECH, e-Handbook of Statistical Methods, 2013. (Available at: http://www.itl.nist.gov/div898/handbook/prc/section2/prc263.htm accessed February 18, 2015)
- S. Pardo, *Equivalence and Noninferiority Tests for Quality, Manufacturing and Test Engineers*, Chapman and Hall/CRC, 2013.
- U.S. FDA, Code of Federal Regulations Title 21, 2014. (Available at: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820 accessed February 18, 2015)
- U.S. FDA, Guidance for Industry Process Validation: General Principles and Practices, *Current Good Manufacturing Practices (CGMP)*, Revision 1, 2011. (Available at: http://www.fda.gov/downloads/Drugs/Guidances/UCM070336.pdf accessed February 18, 2015).

## FIGURE 1: Process Flow on Selecting and Applying Statistical Sampling Plan for Design V & V

## APPENDIX A: Attributes Sampling Plan - Non-parametric Pass-Fail Test

Hi,

I don't currently have access to a tool with the processing power for solving equation 1, would it be possible to get the calculated sample size for a plan with 95% confidence, 99% reliability, f=3 failures?

Appreciated,

Kate

Hi Dr Ferryanto,

Great article! Just one question, where is appendix A?

BR

Jimmy

Hi Jimmy,

Thank you for reaching out and pointing this out! The missing Appendix should now appear on the website correctly.

Hi Alexei,

Thanks for reading my article and finding typos.

Yes, B-1 should be for k2 and B-2 should be for k1.

Thanks,

Liem

Hi Dr. Ferryanto,

I just wonder how you do a statistical sampling plan for medical device design verification. By definition, design verification compares design output to design input to determine whether the design input is fulfilled. But according to your statistical sampling plan, do you mean I need to make several copies of a product design drawing to check that the specification put on paper still meets the design criteria? Can a specification set by design be changed by making a copy? It is quite confusing to me. I would really appreciate it if you could help me get out of this confusion.

Best Regards,

Bert

Hi Dr. Ferryanto,

Great information! Thank you! Where is Appendix A?

Dave

I believe there are typos in titles for appendices B-1 and B-2:

B-1 should have k2 and B-2 should have k1.

Hi Alexei,

Thanks for reading my article and finding typos. Yes, B-1 should be k2 and B-2 should have k1. The correct statements are in the body of the article.

Thanks,

Liem
