The US health care system is transforming how it pays for and delivers care. New payment models and benefit designs aim to promote and sustain improvements in health and care delivery, specifically focusing on better outcomes and lower costs. Supporting better payment policies is increasingly critical as rising costs affect the ability of individuals to afford coverage.
However, the transition to new models is not easy. Published studies of payment reforms have shown mixed results, leaving providers, payers, purchasers, and patients uncertain about how to proceed.
There is a strong consensus among health care stakeholders about the need for better evidence on the impact of payment reforms. Better evidence would give providers and the public more confidence in implementing payment and delivery reforms, as well as in knowing which ones to avoid. Evaluations can help stakeholders identify where to make necessary changes. In short, better evidence would help ensure that we realize the promise of payment reforms to improve care and reduce costs.
To accelerate the development of better evidence, the Duke-Margolis Center for Health Policy established the Payment Reform Evidence Hub, with support from the Laura and John Arnold Foundation and guidance from a multistakeholder expert working group. To support the Hub’s efforts, we built an inventory consisting of all the evaluations we could locate, 355 in total, of various payment reform initiatives implemented by commercial health plans, Medicare, Medicaid, and other state programs. The inventory has helped us assess the state of available evidence and identify gaps.
What Specific Types Of Payment Reforms Need Better Evidence?
We categorized the reports and results of payment reform evaluations using the categories of payment reforms developed by the Health Care Payment Learning and Action Network (LAN). LAN Category 1 covers traditional fee-for-service (FFS) arrangements. The other LAN payment reform categories are:
- Category 2: Pay-for-performance initiatives, generally consisting of rewards or penalties built on top of the traditional FFS system (that is, FFS payment adjustments based on measures of quality or value).
- Category 3: Payment reforms built on an FFS infrastructure that include some accountability for results at the level of a population of patients, generally either for episodes of care or for all services. These payments are divided into two subcategories. Category 3A includes accountable care organizations (ACOs) that share savings or losses based on quality and total spending performance for a population of patients on top of FFS payments but do not require any downside risk for the accountable providers. Category 3B payments include bundled payments for an episode of care with some downside risk, again in conjunction with FFS payments for the providers involved.
- Category 4: Population-based payments mainly tied to patients rather than services. This requires partial or full capitation with substantial adjustments for quality performance—a major change away from FFS.
About 40 percent of evaluations of LAN Categories 2–4 address pay-for-performance and other Category 2 payments. About 60 percent focus on Category 3 and Category 4 payments. Of those, the plurality examine ACOs, a slightly smaller percentage focus on bundled payments or episode-based payments for procedures, and very few look at Category 4 population-based payments not closely tied to FFS. This distribution is consistent with a 2016 LAN survey across a set of public and private payers.
Figure 1: Summary of the Hub inventory of evaluations according to LAN payment category, with examples of each payment category shown in the right-hand column
However, our findings raise questions about the overall rigor and diversity of the evidence presented in these studies. Of the private payer evaluations of Category 3 and 4 payments, 51 percent are "internal," meaning they did not release details of how the reform was assessed or detailed results, or we could not identify the type of assessment performed. Category 3A and Category 4 payments have the highest proportion of internal studies. We also found just 17 evaluations of Medicaid payment reforms, showing a strong need for more evaluations of state-run programs. Most of the rigorous evaluations that have been released publicly with detailed methodologies are from federal government programs or federally funded initiatives, as a result of statutory requirements for Medicare and Center for Medicare and Medicaid Innovation (Innovation Center) programs.
Our results show a substantial need for additional evidence across health care programs, especially given the importance of Category 3 and Category 4 payments for the future of payment reform and their growing prevalence in commercial and state programs. For example, the number of ACOs nationwide continues to grow, with more than 28 million people now covered by an accountable care arrangement. There are shifts to Category 4 payments by states and private insurers. Category 3 and Category 4 payments are likely to qualify as Medicare Access and CHIP Reauthorization Act (MACRA) alternative payment models (APMs), and Category 4 payments with significant “downside” financial risk may qualify as advanced APMs eligible for additional bonus payments. While pilots of these APMs in Medicare will eventually be evaluated, there remains a clear gap in evidence on similar or more advanced models outside of Medicare.
Geographic Disparities In Evaluations
The impact of payment reforms may differ across health care markets, based on differences in population demographics, rural versus urban areas, provider capabilities in the region, and other market features. We found substantial variation in the number of value-based payment reform evaluations by state, as shown in Figure 2. Evaluations are more likely to occur in states with large urban areas and less likely in more rural states.
In states with fewer evaluations, the vast majority are Medicare or Innovation Center multistate evaluations of national programs. (This is why the minimum number of observations for a state is 35, rather than zero.) These national evaluations may not be designed to provide the evidence necessary to draw conclusions about distinct consequences in particular states, especially less populous and rural states. The figure covers all types of payment reforms; there is further variation across states in the specific types of payment reforms evaluated.
Figure 2: Number of payment reform evaluations by state
Evaluators can use a variety of study designs to develop relevant evidence, while also reflecting the needs, resources, and practical realities of the organizations involved. Some methods can deliver faster results at a lower cost, potentially enabling more rapid modification of reforms. Other methods take longer and require greater resources to support sophisticated analysis but can deliver more comprehensive and more reliable evidence. Figure 3 shows the distribution of study designs from the inventory, among the studies for which methods could be determined. No study design is dominant.
Figure 3: Study design of evaluations with known methodologies
The emphasis, then, should be on making it easier for individual organizations to conduct evaluations, regardless of the study design they choose. Supporting feasible improvements in both the design and basic transparency of reported evaluations will allow stakeholders to see where and how reforms have worked best elsewhere, fostering collective learning and allowing them to better implement their own reforms in the future. While the trade-off between speed and rigor described above will always be present to some extent, there are federal, state, and commercial evaluations that have produced quality, publishable evidence in a timely manner.
Moving Forward: Steps Toward A Better Evidence Base
Our inventory of evaluations is intended to serve as a resource for identifying existing methods and evaluations that can inform future work, as well as to help assess findings. The inventory can be updated to reflect progress and encourage more evaluations.
The initial results from the inventory revealed particularly large gaps in evidence for state and commercial payment reforms, as well as for payment reforms in less urban areas and for many regions and populations. At the same time, a wide variety of payment reforms affecting these populations are underway. The large gap between reform implementation and applicable evidence from those reforms is a significant obstacle to incorporating what we have learned into payment reform efforts underway now.
To help address these gaps, the Payment Reform Evidence Hub is taking steps designed to increase the capacity for implementing evaluations. We have consistently heard widespread interest from the public and private sectors in addressing the gaps in the evidence base. However, this interest is often not enough to overcome the significant challenges and barriers that make carrying out an evaluation difficult. Stakeholders frequently have concerns about data acquisition, control over information, statistical power, and the financial costs and time required to participate in payment reform evaluations. These findings suggest that it is essential to meet states and employers where they are by developing tools and other resources that support a range of evaluation approaches based on their needs and capabilities.
We are seeking to help states and employers connect with resources to assist them in carrying out practical but meaningful evaluations of payment reforms. There are opportunities for such evaluations in the payment reforms supported by LAN's Action Collaboratives in primary care, episode payments, and other areas. Multipayer evaluations of programs such as the Comprehensive Primary Care Plus pilot can take advantage of existing data-sharing agreements that can produce aggregated analyses relevant to federal, state, and commercial stakeholders. Actuarial analyses that guide employers and states in their health plan contracting can be another potential source of useful evidence. Synthesizing those results with formal evaluations could provide a more informed context for future contracting decisions.
There are major opportunities for improving the evidence base on payment reforms. Much is at stake for health outcomes and health care costs and the success of further health care reform efforts.
Appendix: Methods For Constructing The Evaluation Inventory
We built the inventory of payment reform evaluations from multiple sources. First, we drew from three existing inventories: the Center for Medicare and Medicaid Innovation's portfolio catalogue; the Patient-Centered Primary Care Collaborative list of innovations and programs; and the Catalyst for Payment Reform's National Compendium on Payment Reform. We also integrated the 2014 RAND report on evaluations of pay-for-performance programs, ACOs, and bundled payment programs. Finally, we extended the RAND report by using its PubMed search terms to identify additional evaluations published through December 31, 2016. For the extended search, we included only evaluations of programs in the United States that reported results or outcomes.
We also identified information about reforms in publicly available resources. These resources included: academic journal articles identified through an ad hoc Google Scholar search, online public reports about Medicare demonstrations and pilots, state reports on Medicaid innovations, government reports, “gray” literature, and websites of payers and providers.
Finally, we used social media, with keyword searches such as #ACO. These searches identified organizations publicizing their alternative payment models. We ran the search using the December 31, 2016, cutoff date and aggregated the findings into our inventory.
This post is part of a project funded by the Laura and John Arnold Foundation. The post is an independent work product, and the views expressed are those of the authors and not necessarily those of the funder. We would also like to thank Rob Saunders and the Evidence Hub expert working group for their guidance in the development of this paper.