Pressure ulceration has detrimental impacts on patients both physically and psychologically and is associated with significant economic implications for health services. It is therefore paramount that at-risk patients are identified before significant pressure-related tissue damage occurs in order to effectively implement primary preventive interventions (Mervis and Phillips, 2019). The use of pressure ulcer risk assessment tools (PURAT) in adult patients is highly recommended by the European Pressure Ulcer Advisory Panel (EPUAP et al, 2014), advocated as a ‘consideration’ by the National Institute for Health and Care Excellence (2014), but considered to have no impact on the incidence or severity of pressure ulcers by the Cochrane Collaboration (Moore and Patton, 2019).
The lack of consensus surrounding the value of PURAT indicates a potential lack of evidence for the clinimetric properties of the tools evaluated, specifically the features identified in seminal work by Feinstein (1987): reliability, validity and sensitivity. Notably, the most commonly used risk assessment tools, the Waterlow and Braden tools, have been shown to have low sensitivity and specificity in differentiating levels of risk in patients, potentially limiting their clinical value (Qaseem et al, 2015). This article evaluates the clinimetrics of a PURAT developed by Nixon et al (2015): PURPOSE-T.
Validity
The validity of a tool refers to how effectively it measures the phenomenon it was intended to assess (Charalambous et al, 2018). In the case of PURPOSE-T, validity depends on its accuracy in identifying at-risk patients. The simplest form of validity is face validity, which refers to how far a tool appears to measure what it is intended to measure (Charalambous et al, 2018). Although face validity is a poor determinant of the overall validity of a tool, it is associated with greater compliance because users are more likely to accept its utility, and this may lead to better outcomes for patients (Bannigan and Watson, 2009). This issue is particularly pertinent in evaluating PURAT because it has been suggested that clinical judgement is as effective as, if not more effective than, the use of tools; this indicates that a tool that stimulates clinical judgement may improve the accuracy of risk assessments, even if it contains flaws in content or criterion validity (Moore and Patton, 2019).
In an evaluation of PURPOSE-T by Coleman et al (2018), its content validity was assessed via field notes recorded by expert nurses; they noted that the tool appeared to include important risk factors and that it prompted skin checks, encouraging more careful skin assessment. A recent review by Anrys et al (2019) on risk factors for pressure ulcer (PU) development supports these observations, noting that daily skin inspection is essential for the timely identification of at-risk patients. Content validity is determined by how effectively the items included in a tool measure the intended outcome. In the case of PURPOSE-T, content validity would be evidenced by how accurately the included risk factors reflect the true risk of PU development in patient populations (Charalambous et al, 2018). The risk factors included in PURPOSE-T were drawn from a minimum data set, which was derived from a systematic review of PU risks combined with consensus judgements from a multiprofessional panel (Coleman et al, 2014). Although the validity of consensus judgements is contentious, Coleman et al (2014) mitigated the potential for bias by drawing on expertise from a range of specialties and by selecting the risk factors for discussion from a systematic review of the literature. Other than participant specialty, there were no other clear threats to the validity of decisions made via group consensus (Turnbull et al, 2018). Arguably, the clinical outcomes observed following implementation of consensus-based tools are the only dependable evidence of their validity (Fitch et al, 2001).
The systematic review that formed the basis for the risk factors considered by the PURPOSE-T consensus panel identified problems in the literature surrounding PU risk identification (Coleman et al, 2013). Large numbers of independent risk factors had been reported, owing partly to problems such as the 'over interpretation of results from individual studies'; however, three primary risk domains were ultimately identified: mobility, perfusion and skin status (Coleman et al, 2013). The assertion that many studies include too many risk factors, potentially leading to the overprediction of risk, was later supported by an evaluation of the popular Waterlow tool, which was suggested to include too many risk factors (Charalambous et al, 2018). Notably, the review by Coleman et al (2013) did not find that factors commonly considered to be indicative of risk, such as gender, age and medications, were significant predictors of risk.
Criterion validity is considered to be an important determinant of diagnostic accuracy in PURAT, and the literature evaluating risk assessment tools most often focuses on this form of validity. However, the different concepts of validity, for example convergent and predictive validity, cannot always be clearly differentiated (Kottner and Balzer, 2010). At present, there is no gold standard for assessing PU risk (NICE, 2014), so evaluations of convergent validity depend partly on clinical judgement. Further confounding this, intervention following risk assessment may affect evaluations of predictive validity (Coleman et al, 2018). Because of the complexity of issues such as risk assessment for PUs, the Medical Research Council has published guidance on the evaluation of complex interventions, suggesting that, given the challenges inherent in measuring the clinical impact of certain interventions, consideration should be given to non-experimental forms of evidence to guide clinical practice (Skivington et al, 2018).
Currently, experimental evidence of the validity of PURPOSE-T is limited to the evaluation by Coleman et al (2018), and the tool has not yet been evaluated or included in guidance by NICE or Cochrane. Clough (2015) undertook a local evaluation of the implementation of PURPOSE-T in an NHS trust in the UK, finding that it was associated with a 37% decrease in category 2 PUs over a 1-year period; it was also well liked by staff. Although this provides some evidence of face validity and potentially good compliance with the tool, the study was limited by its small sample size of 31.
The sensitivity of a tool is the proportion of true positives it correctly identifies, and its specificity is the proportion of true negatives it correctly identifies (Siedlecki and Albert, 2017). Determining the sensitivity and specificity of a tool depends on a definitive clinical outcome against which to judge whether the assessment was correct (Lalkhen and McCluskey, 2008). For PURPOSE-T, this would depend on its ability to correctly assess that no risk is present when none exists, evidenced by the patient not developing a PU, and, conversely, to identify risk in patients who go on to develop one. However, collecting accurate data on this presents ethical challenges: patients without PUs who are assessed as being at risk should be offered primary preventive interventions (NICE, 2014) and may therefore not develop a PU, so the absence of a PU cannot confirm whether the initial assessment of risk was accurate.
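To illustrate the arithmetic, and the problem that preventive care creates, the following Python sketch computes sensitivity and specificity from a hypothetical 2x2 table comparing risk ratings with later PU development. All counts, the class name `TwoByTwo` and the helper `with_prevention` are invented for illustration and are not drawn from any PURPOSE-T study. The helper assumes, for simplicity, that each prevented PU turns a true positive into an apparent false positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical counts comparing a risk rating with a later outcome.

    'Positive' means the tool rated the patient as at risk; the outcome is
    whether a pressure ulcer (PU) developed during follow-up.
    """
    true_pos: int   # rated at risk, PU developed
    false_pos: int  # rated at risk, no PU developed
    false_neg: int  # rated not at risk, PU developed
    true_neg: int   # rated not at risk, no PU developed

    def sensitivity(self) -> float:
        # Share of patients who developed a PU that the tool had flagged
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of patients who stayed PU-free that the tool rated not at risk
        return self.true_neg / (self.true_neg + self.false_pos)


def with_prevention(table: TwoByTwo, prevented: int) -> TwoByTwo:
    """Hypothetical helper: preventive care stops `prevented` PUs in
    correctly flagged patients, so they are recorded as false positives."""
    if not 0 <= prevented <= table.true_pos:
        raise ValueError("prevented must be between 0 and true_pos")
    return TwoByTwo(
        true_pos=table.true_pos - prevented,
        false_pos=table.false_pos + prevented,
        false_neg=table.false_neg,
        true_neg=table.true_neg,
    )


# Invented counts for illustration only
observed = TwoByTwo(true_pos=40, false_pos=60, false_neg=10, true_neg=390)
print(round(observed.sensitivity(), 3), round(observed.specificity(), 3))  # 0.8 0.867

after = with_prevention(observed, prevented=20)
print(round(after.sensitivity(), 3), round(after.specificity(), 3))  # 0.667 0.83
```

In this invented example, preventive care lowers both the apparent sensitivity and the apparent specificity even though the assessments themselves are unchanged, which is one way the obligation to intervene can distort evaluations of a tool.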
The EPUAP (2014) has advised that reliance should not be placed solely on PU risk assessment, so changes in clinical status may lead to the implementation of preventive interventions without formal reassessment of PU risk. If patients become critically unwell, drastic changes in their risk of pressure ulceration may occur because of changes in mobility, perfusion and medical device-related risk. Notably, the risk factors associated with declining clinical status, namely mobility and perfusion, had the lowest reliability scores in the PURPOSE-T evaluation by Coleman et al (2018). This may affect the accuracy of data collected on the sensitivity or specificity of tools such as PURPOSE-T: assessments may have been accurate at the time they were completed but, owing to changes in patients' clinical status, may no longer reflect the true risk of PUs.
Ultimately, evidencing the specificity and sensitivity of risk assessment tools remains methodologically challenging, and the clinical value of tools such as PURPOSE-T as predictors of PU development currently depends on long-term outcome data, which are yet to be gathered (Ferrante di Ruffano et al, 2012). Early adopters of the tool, such as Clough (2015), demonstrated that, following implementation of PURPOSE-T, there was a significant reduction in PU incidence, which may indicate that the tool is highly sensitive in identifying at-risk patients and so allows effective preventive measures to be implemented. It has recently been argued that traditional indicators of statistical significance (specifically P values) should be abandoned, particularly in complex clinical trials that are difficult to reproduce (McShane et al, 2019). This is because P values frequently bear an erroneous relationship to clinical reality, yet clinicians depend on them when evaluating trials (McShane et al, 2019). Statistical evaluation of the predictive value of PURPOSE-T is further complicated by the ongoing lack of a gold standard tool for PU risk assessment, which limits what comparisons with other tools can indicate about its effectiveness (Siedlecki and Albert, 2017).
Coleman et al (2018) assessed the convergent validity of PURPOSE-T against the Waterlow and Braden tools. Notably, medium to strong phi correlation coefficients were demonstrated with these commonly used tools with regard to determining PU risk, as well as in identifying risk factors common to the three tools. The clinical significance of these findings is difficult to assess because many of the factors used in earlier tools were removed from PURPOSE-T following the initial systematic review (Coleman et al, 2014). Additionally, phi coefficients are sensitive to how the data (or risk) are distributed in the sample (Mukaka, 2012), and that distribution may be skewed in the populations assessed in the study, many of whom may already be considered 'at risk' because they are in an acute care setting. It is also important to note that, in Coleman et al's (2018) study, the assessment data were incomplete, which may have affected the statistical analysis of the correlation between PURPOSE-T and the other two assessment tools. In addition, the correlation coefficients calculated may not necessarily reflect the ability of the tools to predict risk, but may simply reflect commonalities in the subjective judgements of users of the tool(s); this cannot be tested statistically because there is no gold standard assessment with which to determine the presence of complex issues such as risk (Akoglu, 2018).
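A short Python sketch can show why the distribution of risk in a sample matters when interpreting phi. It computes the phi coefficient for a 2x2 table of two tools' binary ratings (at risk or not at risk) alongside simple percentage agreement. The two tables are invented rather than taken from Coleman et al (2018), and the function names are illustrative only.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2x2 table of two tools' binary risk ratings.

    a: both tools rate 'at risk'       b: tool 1 'at risk', tool 2 not
    c: tool 1 not, tool 2 'at risk'    d: neither tool rates 'at risk'
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


def percent_agreement(a: int, b: int, c: int, d: int) -> float:
    # Share of patients on whom the two tools give the same rating
    return (a + d) / (a + b + c + d)


# Invented tables: both show 90% agreement between the tools
balanced = (45, 5, 5, 45)  # half the sample rated at risk
skewed = (85, 5, 5, 5)     # most of the sample rated at risk, as in acute care

for label, table in (("balanced", balanced), ("skewed", skewed)):
    print(label, percent_agreement(*table), round(phi_coefficient(*table), 3))
# balanced 0.9 0.8
# skewed 0.9 0.444
```

Identical agreement yields very different phi values once most of the sample is rated at risk, which is one reason coefficients obtained in an acute care population may not generalise.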
Children, psychiatric patients and critically unwell patients were excluded from the study by Coleman et al (2014), although these groups together represent a large proportion of the population and are considered particularly vulnerable to developing PUs (Crane et al, 2019). Given this lack of testing, it could be argued that the validity of PURPOSE-T in the excluded patient groups is impossible to determine. However, the ethical challenges surrounding medical research in these groups, owing to their vulnerable status, can prevent effective experimental studies being completed to determine the efficacy of tools such as PURPOSE-T (Hlavin et al, 2016). This lends further credence to recent assertions that experimental evidence may be limited in its contribution to the evidence surrounding complex concepts such as PURAT (Skivington et al, 2018).
Reliability
The reliability of a risk assessment tool refers to the extent to which different users obtain the same outcome using the same tool. Inter-rater reliability is evaluated by comparing the assessments of multiple individuals using the same tool on the same patient at the same time (Siedlecki and Albert, 2017).
Coleman et al (2018) demonstrated good inter-rater reliability in 230 three-way paired assessments involving ward, community and expert nurses. Overall, the data obtained by Coleman et al (2018) appeared to provide compelling evidence of the reliability of PURPOSE-T. However, some subcategories within the assessment showed poor reliability, specifically perfusion status (65.4% agreement), sensory perception (79.1%) and mobility (59.2%). Although the overall reliability of the tool was high, these three categories fall below the desired 80% agreement, which indicates inadequacies in either the training of users or the tool itself (Siedlecki and Albert, 2017).
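The percentage-agreement arithmetic can be sketched in Python. The function below treats a three-way assessment as agreeing only when the ward, community and expert nurses all record the same rating; this definition, the function name and the example ratings are assumptions for illustration and may not match how Coleman et al (2018) calculated agreement. The sub-domain figures, by contrast, are those reported above.

```python
from typing import Sequence

AGREEMENT_THRESHOLD = 0.80  # desired level of agreement cited in the text


def three_way_agreement(
    ward: Sequence[str], community: Sequence[str], expert: Sequence[str]
) -> float:
    """Proportion of patients for whom all three nurses gave the same rating.

    Assumption: agreement requires all three ratings to be identical.
    """
    if not ward or not (len(ward) == len(community) == len(expert)):
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(1 for w, c, e in zip(ward, community, expert) if w == c == e)
    return matches / len(ward)


# Hypothetical ratings for four patients on a single sub-domain
ward = ["at risk", "not at risk", "at risk", "at risk"]
community = ["at risk", "not at risk", "not at risk", "at risk"]
expert = ["at risk", "not at risk", "at risk", "at risk"]
print(three_way_agreement(ward, community, expert))  # 0.75

# Sub-domain agreement as reported by Coleman et al (2018)
reported = {"perfusion status": 0.654, "sensory perception": 0.791, "mobility": 0.592}
below_threshold = [name for name, p in reported.items() if p < AGREEMENT_THRESHOLD]
print(below_threshold)  # ['perfusion status', 'sensory perception', 'mobility']
```

Requiring all three raters to match is a strict definition; pairwise agreement would generally give higher figures, so the choice of definition matters when comparing results against an 80% threshold.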
Nurses' limited knowledge of PUs has created difficulties internationally in the provision of PU care. In addition, poor knowledge has been associated with attitudes focused on treatment rather than prevention, poor understanding of risk factors and reduced multidisciplinary input (De Meyer et al, 2019; Fulbrook et al, 2019). This lack of knowledge may explain the poor inter-rater reliability scores in the three identified domains, which arguably require greater knowledge to assess correctly. An alternative explanation for the reliability issues is a lack of effective methods for assessing the domains with poor agreement. Specifically, mobility assessment may be hindered by poor communication with patients (Coleman et al, 2018).
Current NICE (2014) guidance on PU risk assessment recommends assessing perfusion by observing changes in skin colour and the presence of non-blanching erythema, which are known to be subject to observer bias (Parahoo, 2014). Arguably, these subjective assessment methods may be effective only once an injury exists and are therefore of little value in assessing risk. The poor reliability of these aspects of PURPOSE-T may reflect the generally poor clinimetric properties of the methods used to assess these specific issues (perfusion, mobility and sensory perception). Perfusion, in particular, is notoriously difficult to assess using non-invasive techniques (Goodall et al, 2019), especially in patients with poor communication (Coleman et al, 2018); it is also difficult to assess where there is a lack of multidisciplinary input, and therefore no alternative or expert input to aid the assessment of these more challenging risk factors (Fulbrook et al, 2019).
These issues may be compounded by the prevalent perception that PU management is strictly the domain of nurses, who have demonstrably poor knowledge of PU management (De Meyer et al, 2019). This lack of knowledge may prevent the timely involvement of the multidisciplinary team and, ultimately, the provision of effective patient care (De Meyer et al, 2019). The reliability of PURPOSE-T may be improved by using less subjective assessment methods and by clarifying the language used in some sections, such as the mobility assessment; this would help to produce more accurate and reliable results.
Conclusion
Risk assessment tools for PU prevention remain dependent on the clinical judgement of the staff using them, and a gold standard is yet to be developed (Moore and Patton, 2019). A large-scale evaluation of PURPOSE-T determined that it has good face validity, as reported by experts and non-experts (Coleman et al, 2018). This is consistent with the findings of the earlier study by Clough (2015), who reported that the tool was well accepted by staff based on its usability and content. The content validity of the tool was established through a combination of data from a systematic review and the conclusions of a consensus meeting (Coleman et al, 2014). This process controversially eliminated many risk factors previously considered important in determining the risk of PUs (Coleman et al, 2014), ultimately creating challenges in assessing convergent validity owing to the dissimilarities between PURPOSE-T and older tools. This was compounded by statistical assumptions surrounding the distribution of risk factors within the study (Mukaka, 2012).
The inter-rater reliability of PURPOSE-T was demonstrably poor in three subdomains: perfusion status, mobility and sensory perception (Coleman et al, 2018). It is likely that this reflects clinimetric issues with the methods used to assess these specific risks and may be compounded by users' limited knowledge (Siedlecki and Albert, 2017). Overall, however, the tool appears to produce reliable and consistent assessment outcomes between experts and non-experts (Coleman et al, 2018). Determining the sensitivity and specificity of PURPOSE-T remains difficult because of the ethical considerations inherent in obtaining definitive outcomes with which to compare initial risk assessments (Lalkhen and McCluskey, 2008). Further research is needed to provide robust evidence on the clinimetric properties of PURPOSE-T, including studies in patient populations in which it has not been tested, such as children, critically ill individuals and psychiatric patients.