Further thoughts on data and policy indicators apropos of two recent papers on procurement regulation & competition: comments re Tas (2019a & 2019b)

The EUI Robert Schuman Centre for Advanced Studies’ working papers series has two interesting recent additions on the economic analysis of procurement regulation and its effects on competition, efficiency and value for money. Both papers are by BKO Tas.

The first paper, ‘Bunching Below Thresholds to Manipulate Public Procurement’, explores the effects of a contracting authority’s ‘bunching strategy’, whereby it seeks to exercise more discretion by artificially estimating the value of future contracts just below the thresholds that would trigger compliance with the EU procurement rules. This paper is relevant to the broader discussion on the usefulness and adequacy of the current EU (and WTO GPA) value thresholds (see eg the work of Telles, here and here), as well as to the regulatory decisions that EU Member States face on whether to extend the EU rules to ‘below-threshold’ contracts.

The second paper, ‘Effect of Public Procurement Regulation on Competition and Cost-Effectiveness’, uses the World Bank’s ‘Benchmarking Public Procurement’ quality scores to test empirically the positive effects of improved regulatory quality on competition and value for money, measured as increases in the number of bidders and in the probability that the procurement price is lower than the estimated cost. This paper is relevant in the context of recent discussions about the usefulness (or not) of procurement benchmarks, and of the growing concern about the reduced number of bids in EU-regulated public tenders.

In this blog post, I reflect on the methodology and insights of both papers, paying particular attention to the fact that both build on datasets and/or indexes (TED, the WB benchmark) that I find rather imperfect and unsuitable for this type of analysis (regarding TED, in the context of the Single Market Scoreboard for Public Procurement (SMPP) that builds upon it, see here; regarding the WB benchmark, see here). Therefore, not all criticisms below are directed at the papers themselves, but rather at the distortions that skewed, incomplete or misleading data and indicators can introduce into the more refined analysis that builds upon them.

Bunching Below Thresholds to Manipulate Procurement (Tas: 2019a)

It is well known that the EU procurement rules are based on a series of jurisdictional triggers and that one of them concerns value thresholds, currently regulated in Arts 4 & 5 of Directive 2014/24/EU. Contracts with an estimated value above those thresholds are subject to the full discipline of EU procurement regulation, whereas contracts of a lower value are subject only to principles-based requirements where they are of ‘cross-border interest’. Given the obvious temptation/interest in keeping procurement shielded from EU requirements, the EU Directives have included an anti-circumvention rule aimed at preventing contracting authorities from artificially splitting contracts in order to keep their award below the relevant jurisdictional thresholds (Art 5(3) Dir 2014/24). This rule has been interpreted expansively by the Court of Justice of the European Union (see eg here).
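For readers who prefer to see the mechanics, a minimal Python sketch of the jurisdictional trigger and of the aggregation logic behind the anti-circumvention rule follows. The threshold figures are those applicable around 2018–2019 (they are revised every two years) and the function is a deliberate simplification of Arts 4 and 5, not a statement of the legal test; all names are mine.

```python
# Illustrative thresholds under Directive 2014/24/EU, as applicable
# around 2018-2019; the figures are revised biennially, so treat them
# as placeholders rather than the currently applicable values.
EU_THRESHOLDS_EUR = {
    "works": 5_548_000,
    "central_supplies_services": 144_000,
    "sub_central_supplies_services": 221_000,
}

def eu_rules_triggered(lot_values_eur, contract_type):
    """Simplified jurisdictional check in the spirit of Arts 4-5: the
    estimated value is assessed on the aggregate of all lots, so
    artificially splitting a requirement into smaller lots does not,
    by itself, take the award outside the EU regime (Art 5(3))."""
    return sum(lot_values_eur) >= EU_THRESHOLDS_EUR[contract_type]

# Three lots of EUR 60,000 each still trigger the rules for a central
# government services contract: the aggregate is EUR 180,000 > 144,000.
print(eu_rules_triggered([60_000, 60_000, 60_000], "central_supplies_services"))
```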

‘Bunching Below Thresholds to Manipulate Public Procurement’ examines the effects of a practice that would likely infringe the anti-circumvention rule, as it assesses a strategy of ‘bunching estimated costs just below thresholds’ ‘to exercise more discretion in public procurement’. The paper develops a methodology to identify contracting authorities ‘that have higher probabilities of bunching estimated values below EU thresholds’ (ie manipulative authorities) and finds that ‘[m]anipulative authorities have significantly lower probabilities of employing competitive procurement procedure. The bunching manipulation scheme significantly diminishes cost-effectiveness of public procurement. On average, prices of below threshold contracts are 18-28% higher when the authority has an elevated probability of bunching.’ These are quite striking (but perhaps not surprising) results.

The paper employs a regression discontinuity approach to determine the likelihood of bunching. In order to do that, it relies on the TED database. The paper is certainly difficult to read and hardly intelligible for a lawyer, but there are some issues that raise important questions. One concerns the author’s (mis)understanding of how the WTO GPA and the EU procurement rules operate, in particular when the paper states that ‘Contracts covered by the WTO GPA are subject to additional scrutiny by international organizations and authorities (sic). Accordingly, contracts covered by the WTO GPA are less likely to be manipulated by EU authorities’ (p. 12). This is simply an uncritical transplant of considerations made by the authors of a paper that examined procurement in the Czech Republic, where the relevant threshold between EU-covered and non-EU-covered procurement would make sense. Here, the distinction between WTO GPA and EU-covered procurement simply makes no sense, given that the WTO GPA and EU thresholds are coordinated. This alone raises some doubts concerning the tests designed by the author to check the robustness of the hypothesis that bunching leads to inefficiency in procurement expenditure.
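For readers unfamiliar with bunching analysis, the following is a minimal sketch of the underlying intuition only, not of Tas’s actual regression discontinuity estimator: if the distribution of estimated contract values is locally smooth around the threshold, notices within a narrow window should fall on either side in roughly equal numbers, so an excess mass just below the threshold can be flagged with a simple binomial test. The window width and the toy data are my own assumptions.

```python
from scipy.stats import binomtest

def bunching_diagnostic(estimated_values, threshold, window):
    """Crude bunching check: under a locally smooth density of estimated
    values, notices within +/- window of the threshold should fall below
    it roughly half of the time; a significant excess below the threshold
    suggests bunching."""
    near = [v for v in estimated_values if abs(v - threshold) <= window]
    below = sum(v < threshold for v in near)
    test = binomtest(below, n=len(near), p=0.5, alternative="greater")
    return below, len(near), test.pvalue

# Toy data clustered just under the (2019) works threshold of EUR 5,548,000.
values = [5_530_000, 5_540_000, 5_544_000, 5_546_000, 5_547_000,
          5_547_500, 5_552_000, 5_560_000, 5_575_000, 5_590_000]
print(bunching_diagnostic(values, threshold=5_548_000, window=50_000))
```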

Another issue concerns the way in which the author equates open procedures to a ‘first price auction mechanism’ (which they are not, exactly) and dismisses other procedures (notably, the restricted procedure) as incapable of ensuring value for money or, more likely, treats them as representative of a higher degree of discretion for the contracting authority, which is a highly questionable assumption.

More importantly, I am not sure that the author understood what is in the TED database and, crucially, what is not there (see section 2 of Tas (2019a) for the methodology and data description). Albeit not very clearly, the author presents TED as a comprehensive database of procurement notices, ie as if 100% of procurement expenditure by Member States were recorded there. However, in the specific context of bunching below thresholds, the TED database is very likely to be incomplete.

Contracting authorities tendering contracts below the EU thresholds are under no obligation to publish a contract notice (Art 49 Dir 2014/24). They could publish voluntarily, in particular in the form of a voluntary ex ante transparency (VEAT) notice, but doing so would make no sense from the perspective of a contracting authority that seeks to avoid compliance with the EU rules by bunching (ie manipulating) the estimated contract value, as it would expose the authority to potential litigation. Most authorities that are bunching their procurement needs (or, in simple terms, avoiding compliance with the EU rules) will therefore not be reflected in the TED database at all, or will not be identified by the methodology used by Tas (2019a), as they will not have filed any notices for contracts below the thresholds.

How is it possible that TED includes notices regarding contracts below the EU thresholds, then? Well, this is anybody’s guess, but mine is that a large proportion of those notices will be linked to countries with a tradition of full transparency (over-reporting), to contracts where there is some doubt about a potential cross-border interest (sometimes assessed over-cautiously), or to notices containing mistakes, where the estimated value of the contract is erroneously indicated as below the thresholds.

Even if my guess were incorrect and all notices for contracts with a value below the thresholds were accurate and justified by the existence of a potential cross-border interest, the database could still not be considered complete. One of the issues raised (imperfectly) by the Single Market Scoreboard (indicator [3], publication rate) is the relatively low level of procurement that is advertised in TED compared to the (putative) total volume of procurement expenditure by the Member States. Without information on the conditions of the vast majority of contract awards (below thresholds, unreported, etc), any analysis of potential losses of competitiveness or efficiency in public expenditure (due to bunching or otherwise) is bound to be misleading.
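To illustrate why this selection problem matters, here is a small simulation under entirely made-up parameters (the population size, the share of bunching authorities, the price premium and the publication probabilities are all my assumptions): if bunching authorities rarely publish voluntary below-threshold notices, a TED-like sample will systematically under-represent them, and any estimate of the price effects of bunching built on that sample will be biased.

```python
import random

random.seed(42)
THRESHOLD = 5_548_000  # illustrative works threshold (EUR)

# Hypothetical population of below-threshold awards: 10% come from
# 'bunching' authorities, whose prices carry a 20% premium.
population = []
for _ in range(10_000):
    kind = "buncher" if random.random() < 0.10 else "honest"
    premium = 1.20 if kind == "buncher" else 1.00
    cost = random.uniform(4_000_000, THRESHOLD - 1)
    population.append((kind, cost * premium))

# TED-like sample: publication below the threshold is voluntary, and
# bunchers have every incentive NOT to publish, so assume only 5% of
# their notices appear against 60% for honest authorities.
publish_prob = {"buncher": 0.05, "honest": 0.60}
sample = [(k, p) for k, p in population if random.random() < publish_prob[k]]

def buncher_share(rows):
    return sum(kind == "buncher" for kind, _ in rows) / len(rows)

print(f"bunchers: {buncher_share(population):.1%} of all awards, "
      f"but only {buncher_share(sample):.1%} of the TED-like sample")
```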

Moreover, Tas (2019a) is premised on the hypothesis that procurement below the EU thresholds allows for significantly more discretion than procurement above them. However, this hypothesis fails to recognise the variety of transposition strategies at Member State level. While some countries have opted for less stringent below-threshold regimes, others have extended the EU rules to the entirety of their procurement (or to all contracts above much lower value thresholds, with the exception of some classes of ‘micropurchases’). This would require the introduction of a control capable of refining Tas’s analysis and distinguishing cases of bunching that do lead to more discretion from those that do not (at least formally), which could in turn separate price effects derived from national-only transparency from those of more legally dubious manoeuvring.

In my view, regardless of the methodology and the math underpinning the paper (which I am in no position to assess in detail), once these data issues are taken into account, the story the paper tries to tell breaks down. There are important shortcomings in its empirical strategy that raise significant doubts about the strength of its findings, assessed not against the information in TED, but against the (largely unknown and unrecorded) reality of procurement in the EU.

I have no doubt that there is bunching in practice, and the intuition that it raises procurement costs must be right. However, I have serious doubts about the possibility of reliably identifying bunching, or of estimating its effects, on the basis of the information in TED, as most culprits will not be included in it and the effects of below-threshold (national-only) competition will mostly not be accounted for.

(Good) Regulation, Competition & Cost-Effectiveness (Tas: 2019b)

It is also a very intuitive hypothesis that better regulation should lead to better procurement outcomes and, consequently, that more open and robust procurement rules should lead to more efficiency in the expenditure of public funds. As mentioned above, Tas (2019b) explores this hypothesis and seeks to empirically test it using the TED database and the World Bank’s Benchmarking Public Procurement (in its 2017 iteration, see here). I will not repeat my misgivings about the use of the TED database as a reliable source of information. In this second part, I will solely comment on the use of the WB’s benchmark.

The paper relies on four of the WB’s benchmark indicators (one of them constructed by Djankov et al (2017)): the ‘bid preparation score, bid and contract management score, payment of suppliers score and PP overall index’. The paper includes a useful table with these values (see Tas (2019b: Table 4)), which allows the author to rank the countries according to the quality of their procurement regulation. The findings of Tas (2019b) are thus entirely dependent on the quality of the WB’s benchmark and on its ability to capture (and distinguish) good procurement regulation.

In order to test the extent to which the WB’s benchmark is a good input for this sort of analysis, I have compared it to the indicator that results from the European Commission’s Single Market Scoreboard for Public Procurement (SMSPP, in its 2018 iteration). The comparison is rather striking …

[Charts omitted: comparison of the WB’s Benchmarking Public Procurement scores and the SMSPP results by Member State. Source: own elaboration.]

Clearly, both sets of indicators are based on different methodologies and measure relatively different things. However, they are both intended to express relevant regulators’ views on what constitutes ‘good procurement regulation’. In my view, both of them fail to do so for reasons already given (see here and here).

The implication for work such as Tas (2019b) is that the reliability of the findings, regardless of the math underpinning them, is only as strong as the indicators on which they are based. Plugging the same methods into the SMSPP instead of the WB’s index would likely yield very different results; perhaps that countries with a very low quality of procurement regulation (as per the SMSPP index) achieve better economic results, which would not be a popular story with policy-makers… And the results with either index would also be different if the algorithms were fed not by TED, but by a more comprehensive and reliable database.
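One quick way of making this point operational would be to check how the two indicators rank the same countries, for instance via a Spearman rank correlation. The sketch below uses entirely hypothetical scores for five Member States (the real figures are in Tas (2019b: Table 4) and in the 2018 SMSPP release); a low or negative correlation would confirm that results built on one index need not survive a switch to the other.

```python
from scipy.stats import spearmanr

# Entirely hypothetical scores, for illustration only.
countries = ["AT", "ES", "PL", "SE", "UK"]
wb_scores = [78, 94, 70, 85, 50]      # mimics the WB benchmark index
smspp_scores = [55, 40, 75, 80, 90]   # mimics the Single Market Scoreboard

rho, pvalue = spearmanr(wb_scores, smspp_scores)
for country, wb, smspp in zip(countries, wb_scores, smspp_scores):
    print(f"{country}: WB={wb}, SMSPP={smspp}")
print(f"Spearman rank correlation: {rho:.2f} (p = {pvalue:.2f})")
# A low (or negative) rho means the two 'quality' indicators rank the
# same procurement systems in very different orders.
```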

So, the most that can be said is that attempts to show empirically the effects of good (or poor) procurement regulation remain doomed to fail or, in perhaps less harsh terms, doomed to tell a story based on a very skewed, narrow and anecdotal understanding of procurement, and on an incomplete record of procurement activity. Believe those stories at your own peril…

World Bank's "Benchmarking Public Procurement 2017"

The World Bank has recently published its report Benchmarking Public Procurement 2017, where it presents a 'cross-country analysis in 180 economies on issues affecting how private sector does business with the government. The report covers two thematic pillars: the procurement process and complaint review mechanisms'.

The information is structured around eight main indicators, which cover the following areas:

  1. Needs assessment, call for tender, and bid preparation: The indicators assess the quality, adequacy, and transparency of the information provided by the procuring entity to prospective bidders.
  2. Bid submission phase: The indicators examine the requirements that suppliers must meet in order to bid effectively and avoid having their bid rejected.
  3. Bid opening, evaluation, and contract award phase: The indicators measure the extent to which the regulatory framework and procedures provide a fair and transparent bid opening and evaluation process, as well as whether, once the best bid has been identified, the contract is awarded transparently and the losing bidders are informed of the procuring entity’s decision.
  4. Content and management of the procurement contract: The indicators focus on several aspects during the contract execution phase related to the modification and termination of the procurement contract, and the procedure for accepting the completion of works.
  5. Performance guarantee: The indicators examine the existence and requirements of the performance guarantee.
  6. Payment of suppliers: The indicators focus on the time and procedure needed for suppliers to receive payment during the contract execution phase.
  7. Complaints submitted to the first-tier review body: The indicators explore the process and characteristics of filing a complaint before the first-tier review body.
  8. Complaints submitted to the second-tier review body: The indicators assess whether the complaining party can appeal a decision before a second-tier review body and, if so, the cost and time spent and characteristics for such a review. 

The report aims to make progress in the much needed collection of more information, particularly of statistical nature, about the procurement systems that exist around the world. In its own words, '[i]t aims to promote evidence-based decision making by governments and to build evidence in areas where few empirical data have been presented so far. As researchers recognize, “the comparison of different forms of regulation and quantitative measurement of the impact of regulatory changes on procurement performance of public entities will help reduce the costs of reform and identify and disseminate best practices.”' [with reference to Yakovlev, Tkachenko, Demidova & Balaeva, 'The Impacts of Different Regulatory Regimes on the Effectiveness of Public Procurement' (2015) 38 (11) International Journal of Public Administration 796-814].

The report also recognises some of its main substantive and methodological limitations (see p.26). However, even taking those into account, the benchmarking exercise seems rather imperfect and of limited potential to inform policy-making and reform. A couple of examples will illustrate why.

First, in terms of the methodology for the scoring of procurement systems, I am not sure I understand the logic behind the award of points or the scale used to weight the different criteria. For instance, when assessing the accessibility of the procurement process, procurement systems are awarded 1 point if bidders are required to register on a government registry of suppliers, and 0 points if there is no registration requirement. To my mind, this is the opposite of what logic would dictate, because a system that does not require prior or additional registration is more open than one that does.

Similarly, when assessing the existence and requirements of bid securities, procurement systems get a score of 1 for each option they provide in a range of questions: whether a bid security or a bid declaration is required; whether the bid security amount is no more than a certain percentage of the contract value or of the submitted bid, or no more than a certain flat amount; whether suppliers have a choice regarding the form of the bid security instrument; and, if bidders are required to post a bid security instrument, whether there is a time frame for the procuring entity to return it. Additionally, procurement systems are awarded up to 1 extra point for the forms of bid security instrument they accept: cash deposit, bank guarantee and insurance guarantee (1/3 of a point each). This means that systems with more flexibility in the way they regulate bid securities will get higher scores (which is fair enough), but that systems that do not require bid securities will get no points. This, for instance, makes the UK (50 points) lag behind Spain (94 points) in this indicator, despite the fact that the UK is recorded as having no bid security requirement and Spain as requiring a bid security proportionate to the value of the contract (I am not assessing this information which, at least in the case of Spain, requires some nuance). Once again, this is contrary to what logic would dictate, because procurement systems that do not require bid securities are more open and accessible (particularly to SMEs).
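As I read the report, the scoring logic can be reconstructed roughly as follows. The function below is my own reconstruction for illustration, not the WB’s published formula, and it omits the normalisation that produces the 0–100 country scores.

```python
def bid_security_score(requires_security, conditions_met, instruments_accepted):
    """Rough reconstruction of the bid-security indicator: one point per
    conditional question answered in the affirmative, plus 1/3 of a point
    per accepted instrument (cash deposit, bank guarantee, insurance
    guarantee), capped at one extra point in total."""
    if not requires_security:
        # A system with no bid security requirement never reaches the
        # conditional questions, so it bottoms out at zero.
        return 0.0
    return conditions_met + min(len(instruments_accepted), 3) / 3.0

# A system requiring a proportionate bid security and accepting all three
# instruments outscores one with no requirement at all, which is the
# inversion of logic criticised above.
print(bid_security_score(True, 4, ["cash", "bank guarantee", "insurance"]))  # 5.0
print(bid_security_score(False, 0, []))                                      # 0.0
```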

Second, in terms of the comparisons that can be made with the scores as published, I am not sure that the way the information is presented can actually help one understand the drivers of the different scores for different countries. Most points are awarded on the basis of a yes/no answer to given questions. Given that some questions are rather open-ended or simply confusing (eg the question concerning the criteria for bid evaluation asks whether the procurement system includes ‘Price and other qualitative elements’, but all procurement systems get a score of 1 regardless of the answer), their ability to support comparisons is minimal. Moreover, the individual scoring for each criterion is not provided, which prevents direct comparisons even where questions are narrower and actually award different scores to different answers.

Overall, sadly, I am afraid that the report Benchmarking Public Procurement 2017 can only be seen as a first step towards the creation of a useful system and scoring matrix with which to benchmark all public procurement systems in the world. I would think that this is possible, particularly now that the field work of information collection is in place (unless the information was collected as direct responses to the questionnaire underlying the scoring rule), and that the published version of the report could be significantly improved solely on the basis of a better analysis of the raw information collected by the World Bank team. On that point, it is a shame that this information has not been published, and I would invite the World Bank to reconsider and to publish the database of raw information, so that more specific proposals on how to improve the scoring method, without the need to collect additional information, can be developed.