© 2003 European Society of Cardiology
Comparing methodological quality and consistency of international guidelines for the management of patients with chronic heart failure
Institute of Social Medicine, Epidemiology and Health Economics, Charité Hospital, Humboldt University of Berlin D-10098 Berlin, Germany
* Corresponding author. Tel.: +49-30-450-529002; fax: +49-30-450-529902. E-mail address: michael.kulig{at}charite.de
| Abstract |
|---|
Background: Guidelines (GLs) for the management of heart failure (HF) are of great importance in order to define and disseminate therapeutic recommendations based on scientific evidence. The aim was to analyse and to compare the methodological quality of HF GLs as well as to evaluate the consistency of therapeutic recommendations.
Methods: Eleven international GLs for the management of chronic HF were identified by search of the internet, electronic databases and references of published literature. Their methodological quality was assessed by two different appraisal instruments: (1) according to the US National Guideline Clearinghouse (NGC) on a scale from 0 to 17 points, (2) according to the German Guideline Clearinghouse (Agency for Quality in Medicine, AQUMED) on a scale from 0 to 44 points. Clinical criteria for assessment of the consistency of the recommendations included diagnostic testing, pharmacological and non-pharmacological treatment.
Results: The quality scores of the GLs varied substantially with a range of 1.5–15.5 points (NGC) and 8–30 points (AQUMED). The greatest variation was found in the dimensions development and evidence. Only 3 of the 11 GLs (about 30%) were rated as methodologically well prepared. The recommendations on diagnostic procedures and medical management were rather consistent among the different GLs.
Conclusions: Published international GL recommendations on medical management of patients with chronic HF are broadly consistent. The methodological quality of the GLs, however, varies to a great extent. Improvement is needed in most methodological aspects, especially in the dimensions evidence and applicability.
Key Words: AQUMED, Agency for Quality in Medicine; HF, heart failure; GL, guideline; NGC, National Guideline Clearinghouse
Received September 2, 2002; Revised December 23, 2002; Accepted February 17, 2003
| 1. Introduction |
|---|
Chronic heart failure (HF) is a common, costly and disabling syndrome with important prognostic implications. Surveys in the UK and elsewhere show that 1–2% of the general population and 9–15% of women and men over 65 have HF [1–4]. The prognosis is poor with a 5-year mortality of 50–60% in community studies [5]. The total annual treatment costs for HF are nearly 2% of the health care budget of the industrialized countries [6–9]. As the population ages and the prevalence of HF increases, expenditures related to the care of these patients will climb dramatically [10].
HF is a complex disease that may be managed in a variety of ways. Despite the array of therapeutic options developed over the last decades, many patients with chronic HF do not seem to receive effective treatment. For example, drug use and dosing for patients with advanced HF differ widely between (European) countries, ranging from 25 to 62% [11]. None of these differences could be explained by differences in patient characteristics [12]. Another study showed that only 44% of ideal candidates from university hospitals in the USA were receiving ACE inhibitors in the recommended doses [13]. In an Italian study cohort of very old people with HF, only 25% had a prescription for ACE inhibitors [14]. Yet the costs per year of life saved by vasodilators, and particularly by ACE inhibitors, are very favourable (<$10 000) and much lower than those of most other well-accepted medical strategies [15]. Pharmacological treatment of HF is very cost-effective and could reduce total treatment costs.
Over recent years, practice guidelines (GLs) have been developed in increasing numbers in most medical specialties. Owing to this growing number of available GLs, discussions have evolved about the quality of the GLs, the level of evidence of the provided recommendations as well as the critical appraisal process itself. GLs are of great importance in order to define and disseminate therapeutic recommendations based on scientific evidence. GL application could reduce treatment variation, improve outcomes, and limit costs [16]. Scientifically valid GLs are based on a comprehensive and systematic review of the best available evidence, derive the recommendations from that evidence, and demonstrate explicitly how the recommendations are linked to it [17]. Methodological principles for the development of GLs have been formulated [18–20] and several studies have shown an improvement in adherence to GL recommendations [16,21]. However, many GLs for various types of diseases are of unsatisfactory quality and do not meet internationally recognised criteria [21,22].
We evaluated and compared 11 international GLs for the management of patients with chronic HF by means of two different appraisal instruments [18,19]. In particular, we aimed to analyse the variation in methodological quality and the consistency of therapeutic recommendations.
| 2. Methods |
|---|
Eleven WHO [23], European [24–26], US [27–31], Canadian [32] and New Zealand [33] GLs published between 1994 and 2001 were identified by search of the internet, electronic databases and references of published literature (Table 1). Additionally, the former versions of the ACC/AHA and European Society of Cardiology GLs (published in 1995 and 1997, respectively), were also evaluated, because during the evaluation process these two GLs were updated.
First, we evaluated these GLs using the criteria of the National Guideline Clearinghouse (NGC), USA [18]. From the NGC's checklist, the guideline summary sheet with a total of 49 criteria, we selected all 12 key items that are clearly related to the methodological quality of developing a GL (Table 2). These items addressed evaluation criteria such as the methods used to collect evidence, the quality and strength of evidence, and the review methods. We grouped these criteria into three dimensions (development, evidence, and content and format). A quality score with a maximum of 17 points was constructed by adding up these items (Table 2). Items that we did not select were, for example, number of references, GL length, GL availability, adaptation from another GL, implementation plan for the GL, cost analysis and description of the major recommendations. The last of these was considered when we evaluated the consistency of the recommendations on diagnostic testing and on pharmacological and non-pharmacological treatment.
Additionally, we assessed the methodological quality of GLs with a standardized evaluation instrument published by the German Guideline Clearinghouse of the Agency for Quality in Medicine (AQUMED) [19], a joint institution of the German Medical Association and the National Association of Statutory Health Insurance Physicians (NASHIP). The criteria of this instrument have been developed in accordance with British and Scottish appraisal instruments [20,34] and meet the requirements of evidence-based medicine methodology. Analogous to the NGC instrument, 44 criteria (items) were grouped into dimensions evaluating the quality of GL development (14 items), the evidence (7 items), the content and the format (17 items), and an additional dimension evaluating the applicability of the GL (6 items). A quality score with a maximum of 44 points was constructed by adding up the items.
In the AQUMED instrument the item evidence is part of the dimension development. In the NGC instrument the item evidence is described separately. Because of its great importance and for reasons of comparability, we described evidence as a separate dimension in both instruments. For the AQUMED instrument a user's guide is provided with a detailed explanation for each question to ensure consistent interpretation of the instrument [35].
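The additive scoring described above can be sketched in a few lines of Python. This is only an illustration and not the procedure actually used in the study: the dictionary keys, the `summarize` function and the assumption that each AQUMED item contributes between 0 and 1 point (partial credit allowed) are hypothetical choices made for the example. Only the item counts per dimension and the 44-point maximum are taken from the text.

```python
# Item counts per dimension as reported for the AQUMED instrument.
AQUMED_DIMENSIONS = {
    "development": 14,
    "evidence": 7,
    "content_and_format": 17,
    "applicability": 6,
}

# Maximum total score: 14 + 7 + 17 + 6 = 44 points.
MAX_SCORE = sum(AQUMED_DIMENSIONS.values())


def summarize(item_scores):
    """Add up per-item points into dimension subscores and a total score.

    item_scores maps each dimension name to the list of points per item.
    Assumption (not from the source): every item is worth 0 to 1 point.
    """
    subscores = {}
    for dimension, n_items in AQUMED_DIMENSIONS.items():
        points = item_scores[dimension]
        if len(points) != n_items:
            raise ValueError(
                f"{dimension}: expected {n_items} items, got {len(points)}"
            )
        if any(p < 0 or p > 1 for p in points):
            raise ValueError(f"{dimension}: item points must lie in [0, 1]")
        subscores[dimension] = sum(points)
    total = sum(subscores.values())
    return subscores, total
```

Keeping the subscores separate from the total mirrors the way the results are reported per dimension as well as per instrument.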
Each GL was independently evaluated and scored by ES and MK and items of disagreement were checked and mediated by discussion. In addition, measures of inter-observer agreement (kappa for single items and intraclass correlation coefficient for the two summary scores) were calculated in order to estimate the magnitude of disagreement between the two different evaluators. The statistical calculations were performed using SPSS 10.0.
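For readers unfamiliar with kappa, the following minimal Python sketch shows how Cohen's kappa for one item can be computed from the ratings that two evaluators gave across a set of GLs. The statistics in this study were calculated with SPSS 10.0; the function name `cohen_kappa` and the example ratings in the test are hypothetical and serve only to illustrate the standard formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same objects.

    ratings_a and ratings_b are equally long sequences of category labels,
    e.g. the score one evaluator gave to a single item for each guideline.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set")
    n = len(ratings_a)

    # Proportion of objects on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.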
| 3. Results |
|---|
Key characteristics of the included GLs are listed in Table 1. The assessment of the methodological quality of the GLs with the selected items of the NGC instrument and with the AQUMED instrument is presented in Tables 3 and 4. Agreement between the two evaluators was very good for the single items of the NGC instrument: before the mediation process the average kappa was 0.86 (range 0.61 to 1). For the AQUMED instrument the agreement was moderate (average kappa 0.54, range 0.2 to 1). Agreement between the evaluators for the summary scores was very good for both instruments (intraclass correlation coefficient 0.98 for the NGC instrument and 0.86 for the AQUMED instrument). The quality scores varied substantially, with a range of 1.5–15 points (NGC, 17 points possible) and 8–30 points (AQUMED, 44 points possible). We found the greatest variation in the dimensions development and evidence (identifying and interpreting evidence). The ranking in the total quality scores and in the subscores was similar with both instruments (Pearson correlation coefficient: 0.88). Within each instrument we also found great consistency between the total score and the respective subscores. None of the GLs reached the maximum number of points.
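The Pearson correlation between two sets of summary scores can be computed as in the sketch below. It is an illustration of the standard formula only: the function name `pearson_r` is hypothetical, no guideline scores from this study are reproduced, and the original calculation was done in SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because the coefficient is invariant to linear rescaling, the 17-point and 44-point scales can be compared directly without normalizing either score.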
We rated three GLs (SIGN, AHCPR and H Found New Zealand) as methodologically well prepared. Their quality ranking was highest with both instruments. Particularly for these three GLs, the scores of the dimension evidence were significantly higher than those of the other GLs. They corresponded best to the requirements of evidence-based medicine methodology: the method of literature search, the realization of a systematic literature review and a rating scheme covering type, quality and strength of evidence were stated. One GL was developed explicitly without links between recommendations and levels of supporting evidence (AMDA). This GL adopted evidence-based recommendations from other GLs (e.g. AHCPR), which described in detail the process of collecting and analysing evidence. In nearly all GLs, however, information was missing about the time point up to which literature was collected and included in the review process. The methods used to reach consensus among the GL developers were not described in detail, if they were mentioned at all. An external peer review after completion of the GL draft was rarely performed. In most cases only an internal peer review was done.
In GLs with a high development score, we could clearly identify responsible organizations and authors, whereas complete information about the authors' qualifications, functions during GL development, and correspondence addresses was given in only some GLs. Four GLs stated whether external funding or other support was received (AHCPR, Action HF, H Found New Zealand and HFSA). In three GLs, the problem of potential biases as a result of the influence of funding bodies was discussed (ACC/AHA, Action HF and H Found New Zealand). A date for reviewing and updating was mentioned in only three GLs (ACC/AHA, SIGN and New Zealand). The person(s) responsible for this update, however, could not be clearly identified. The applicability was tested for only one GL by means of a pilot study (AHCPR). The results of that pilot study led to an improvement in the GL draft. A separate detailed documentation of the methods of GL development has been provided as a GL report for the AHCPR GL.
The variation within the subscore content and format was smaller than the variation within the total score and the other subscores. All GLs were written in comprehensible language. There are, however, differences concerning the clarity of the recommendations and their unambiguous interpretation. In 5 of the 11 GLs, important key recommendations are particularly emphasized (AMDA, Eur Soc Card, SIGN, WHO and H Found New Zealand) and clinical algorithms are given in 7 of 11 GLs. The target population addressed in the GL was sufficiently described in most of the GLs (8 of 11 GLs). Potential benefits and harms are stated in 9 GLs. Specific recommendations to help the physician decide between out-patient and in-patient care were given in only three GLs (AHCPR, ACC/AHA and H Found New Zealand). Patient issues, e.g. problems of adherence to medication or difficulties in stopping smoking or losing weight, were considered in four of the GLs (AHCPR, WHO, Action HF and H Found New Zealand). None of the GLs dealt with the cost implications of the recommended management.
Items asking about applicability were solely part of the AQUMED instrument. This point was not intensively discussed by any of the GLs. Only two GLs had a link to additional material for dissemination and implementation, e.g. information brochure for patients (AHCPR and H Found New Zealand). In one GL the problem of local adaptation was considered (SIGN). Criteria for evaluation of the applicability of the GL, e.g. monitoring compliance or measurable outcomes, were not stated in any of the GLs.
Comparisons between former and updated versions. For two GLs (ACC/AHA and Eur Soc Card) there are updated versions available. The latest versions were included in our comparison. In order to investigate the changes from the former versions we did the same evaluation of the methodological quality for those versions as well. For both GLs the updated versions ranked higher than the former versions. The ACC/AHA GL gained 4.5 points in the total NGC score and 7.5 in the AQUMED score and the Eur Soc Card GL gained 3.5 and 5 points, respectively. The improvement was mainly due to a better methodological quality with respect to the subscore development and evidence. For example, there was a description of the individuals who were involved in the GL development group or a description of the methods used to interpret and assess the strength of the evidence. Recommendations were added for the use of aldosterone and angiotensin II receptor antagonists and amlodipine.
The recommendations on medical management themselves are largely consistent (Table 5). Recommended diagnostic procedures do not differ. General measures for the medical management include salt restriction and exercise programmes. Pharmacological treatment is based on ACE inhibitors for all patients (including those rendered asymptomatic). In case of fluid retention, diuretics should be added. Recent GLs indicate that clinically stable patients (NYHA classes I–III) on an ACE inhibitor, diuretics and/or digoxin should be considered for treatment with a beta-blocker. This last recommendation is the most evident change compared to earlier GLs. Further drugs are only recommended in specific situations. No conclusive evidence exists for the use of antithrombotic agents, angiotensin II receptor antagonists, implantable cardioverter defibrillators and other therapeutic strategies.
| 4. Discussion |
|---|
To our knowledge, international GLs for medical management of patients with chronic HF have not been evaluated previously for their methodological quality. Using our quality rating for 11 GLs, we graded three GLs as high quality. Although the methodological quality of the GLs varies to a great extent, the recommendations on medical management are generally consistent. The recommendations of the three highest ranked GLs are clearly evidence-based; however, important aspects of evidence-based methods were missing in the other GLs. It does not appear sufficient to present a rating scheme for the quality and strength of evidence without adequately describing the process of collecting and analysing the evidence. GLs of methodologically higher quality could still improve their format and their presentation (AHCPR, ACC/AHA and Action HF). This important point for the practical application of the GLs is not adequately reflected by the subscore content and format. Two GLs met the scientific criteria and were also well structured and easy to understand (SIGN and H Found New Zealand). However, we found many aspects that could be improved regarding the dimensions development and content and format. All GLs should integrate the topic applicability so that the implementation of the GL and its effect on patient outcomes can be measured. Studies evaluating the effect of a potential improvement on patient care should follow. Some studies have already shown that hospitalisation due to HF could be noticeably reduced [36]. In addition, economic evaluations such as cost analyses may be performed.
Besides evaluating the methodological quality of the GLs themselves, we were interested in comparing the results of the two appraisal instruments. Given the similar ranking of the GLs in each of the quality scores and the high correlation between the two summary scores, both instruments seem to be equally suitable for an assessment of the methodological quality of GLs. However, an important problem of the AQUMED instrument was the ambiguity of many of its items. Intense discussion was necessary to reach consensus on those items among our review group. Disagreements and the need for discussion were rare when using the unambiguous criteria of the selected NGC items. The advantages of the NGC criteria are their clarity and their relevance to methodological quality.
Some parts of the AQUMED instrument have not been thought out carefully enough. According to this instrument, a positive criterion for external review of the GL is the publication of the GL in a scientific journal. But the editors of the journals are often also members of the development board of the GL. By strict definition this is not an external review. Another point of discussion refers to the complexity of several appraisal criteria. We would appreciate more finely graded answer categories for those criteria. For example, the criterion "Are the recommendations presented in a clear, consistent, unambiguous and comprehensible way?" was difficult to answer with a simple yes or no. As a result, the differentiation and expressiveness of the subscore content and format were rather low.
Our observation that the medical content of the recommendations did not vary to a great extent raises the following questions. Are the methodological aspects overestimated in the discussion of the development and quality of GLs? Are there any practical consequences for the daily work of physicians and the management of patients if a GL of high or low quality is used? We would strongly argue against the temptation to consider methodological quality as not relevant:
(1) Without using evidence-based methods, developers of GLs may adopt recommendations in an unsystematic and rather subjective way, and they may be biased by undeclared conflicts of interest. A GL developed using appropriate methods supports evidence-based health care, not only in specialty centres but also in primary care. For a GL to be used and implemented in daily work, the format is an important factor. The format should be user-friendly, with easy access to clearly stated single recommendations. Another aspect is the differentiation of the recommendations for subgroups of patients, e.g. patients with differing severity of chronic HF, very old patients, or non-compliant patients. If this is not stated, it remains unclear to the intended user for which group of his patients the GL is appropriate.
(2) For the same reasons, the recommendations should be graded according to their evidence levels in order to enable the user to assess the appropriateness of the recommendations for his individual patients.
(3) If there is no testing of the applicability of GLs, they become an end in themselves—work of and for some specialists.
(4) Furthermore, the content of GLs may not be adequately adjusted to important changes of therapeutic or diagnostic procedures, if a systematic and up-to-date review of the literature is not done.
Evaluation of the methodological quality of GLs should encourage developers to improve their GLs with specific focus on both the scientific evidence and the user-friendliness.
| Acknowledgements |
|---|
We thank I. Schröter and C. Mallmann for their valuable contributions.
| References |
|---|
- McDonagh T.A., Morrison C.E., et al. Symptomatic and asymptomatic left-ventricular systolic dysfunction in an urban population. Lancet (1997) 350:829–833.
- Clarke K.W., Gray D., Hampton J.R. How common is heart failure? Evidence from PACT (prescribing analysis and cost) data in Nottingham. J Public Health Med (1995) 17(4):459–464.
- Schocken D., Arrieta M., Leaverton P., Ross E. Prevalence and mortality rate of congestive heart failure in the United States. J Am Coll Cardiol (1992) 20:301–306.
- Gardin J.M., Siscovick D., Anton-Culver H., et al. Sex, age, and disease affect echocardiographic left ventricular mass and systolic function in the free-living elderly. The Cardiovascular Health Study. Circulation (1995) 91(6):1739–1748.
- Dargie H.J., McMurray J.J., McDonagh T.A. Heart failure – implications of the true size of the problem. J Intern Med (1996) 234(4):309–315.
- Ryden-Bergsten T., Andersson F. The health care costs of heart failure in Sweden. J Intern Med (1999) 246(3):275–284.
- Szucs T.D., Sokolovic E. Economic significance of heart failure. An overview of costs and economics of therapy. Herz (2000) 25(5):538–546.
- Stewart S., Jenkins A., Buchan S., McGuire A., Capewell S., McMurray J.J. The current cost of heart failure to the National Health Service in the UK. Eur J Heart Fail (2002) 4(3):361–371.
- Levy E. From cost of illness to cost-effectiveness in heart failure. Eur Heart J (1998) 19 (Suppl.):2–4.
- O'Connell J.B. The economic burden of heart failure. Clin Cardiol (2000) 23(3 (Suppl.)):6–10.
- Hobbs F.D. Primary care physicians: champions of or an impediment to optimal care of the patient with heart failure? Eur J Heart Fail (1999) 1(1):11–15.
- Van Veldhuisen D.J., Charlesworth A., Crijns H.J., Lie K.I., Hampton J.R. Differences in drug treatment of chronic heart failure between European countries. Eur Heart J (1999) 20(9):666–672.
- Nohria A., Chen Y.T., Morton D.J., Walsh R., Vlasses P.H., Krumholz H.M. Quality of care for patients hospitalized with heart failure at academic medical centers. Am Heart J (1999) 137(6):1028–1034.
- Gambassi G., Forman D.E., Lapane K.L., et al. Management of heart failure among very old persons living in long-term care: has the voice of trials spread? The SAGE Study Group. Am Heart J (2000) 139(1):85–93.
- Gaspoz J.M. Costs and benefits of heart failure treatment. Schweiz Med Wochenschr (1999) 129(4):131–137.
- Cluzeau F.A., Littlejohns P. Appraising clinical practice guidelines in England and Wales: the development of a methodologic framework and its application to policy. Jt Comm J Qual Improv (1999) 25(10):514–521.
- Helou A., Lorenz W., Ollenschlager G., Reinauer H., Schwartz F.W. Methodological standards of the evidence-based approach of clinical guidelines development in Germany. Consensus between the scientific community, self-governed bodies and practice. Z Arztl Fortbild Qualitatssich (2000) 94(5):330–339.
- National Guideline Clearinghouse, USA. Available from: http://www.guidelines.gov.
- Bundesärztekammer, Kassenärztliche Bundesvereinigung. Beurteilungskriterien für Leitlinien in der medizinischen Versorgung. Dtsch Aerztebl (1997) 94:A2154-2155, B1622-1623, C1754-1755. English version: Evaluation Criteria for Clinical Practice Guidelines. Available from: http://www.azq.de.
- Scottish Intercollegiate Guidelines Network. Clinical Guidelines—Criteria for Appraisal for National Use. Available from: http://pc47.cee.hw.ac.uk/sign/critmain.html.
- Shaneyfelt T.M., Mayo-Smith M.F., Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA (1999) 281(20):1950–1951.
- Grilli R., Magrini N., Penna A., Mura G., Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet (2000) 355:103–106.
- Concise Guide to the Management of Heart Failure. World Health Organization/Council on Geriatric Cardiology Task Force on Heart Failure Education 1994.
- Diagnosis and Treatment of Heart Failure due to Left Ventricular Systolic Dysfunction. Scottish Intercollegiate Guidelines Network 1999.
- European Society of Cardiology. Guidelines for the diagnosis and treatment of chronic heart failure. Task Force for the Diagnosis and Treatment of Chronic Heart Failure. W.J. Remme and K. Swedberg (Co-Chairmen). Eur Heart J (2001) 22:1527–1560.
- Leitlinien zur Therapie der chronischen Herzinsuffizienz. Deutsche Gesellschaft für Kardiologie – Herz- und Kreislaufforschung. Z Kardiol (1998) 97(8):645–658.
- Heart Failure: Evaluation and Care of Patients with Left Ventricular Systolic Dysfunction. Agency for Health Care Policy and Research 1994.
- Heart Failure. Clinical Practice Guideline. American Medical Directors Association 1996.
- ACC/AHA Guidelines for the Evaluation and Management of Chronic Heart Failure in the Adult. Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee to Revise the 1995 Guidelines for the Evaluation and Management of Heart Failure) 2001.
- Consensus recommendations for the management of chronic heart failure. On behalf of the membership of the Advisory Council to improve outcomes nationwide in heart failure. Am J Cardiol (1999) 83(2A):1A–38A.
- Heart Failure Society of America (HFSA). Practice guidelines. HFSA guidelines for management of patients with heart failure caused by left ventricular systolic dysfunction – pharmacological approaches. J Cardiac Fail (1999) 5:357–382.
- Management of Heart Failure. The College of Physicians & Surgeons of Manitoba (Canada) 1999.
- A Guideline for the Management of Heart Failure. The National Heart Foundation of New Zealand, Cardiac Society of Australia and New Zealand and the Royal New Zealand College of General Practitioners Working Party 1997.
- Cluzeau F.A., Littlejohns P., Grimshaw J.M., Feder G. Appraisal instrument for clinical guidelines (1997) London: St. George's Hospital Medical School.
- Helou A., Ollenschlager G. Goals, possibilities and limits of quality evaluation of guidelines. A background report on the user manual of the Methodological Quality of Guidelines check list. Z Arztl Fortbild Qualitatssich (1998) 92(5):361–365.
- Cline C.M., Israelsson B.Y., Willenheimer R.B., Broms K., Erhardt L.R. Cost effective programme for heart failure reduces hospitalisation. Heart (1998) 80(5):442–446.