Oracles as a sub-hype in blockchain discussions, or how my puppy helps me get to 10,000 steps a day

Photo: Rob Alcaraz/The Wall Street Journal.

The more I think about the use of blockchain solutions in the context of public procurement governance (and, more generally, of public services delivery), the more I find that the inability of blockchain technology to reliably connect to the ‘real world’ is bound to restrict any potentially useful applications to back-office functions and the procurement of strictly digital assets.

This is simply because blockchain can only generate its desirable effects of tamper-evident record-keeping and automated execution of smart contracts built on top of it to the extent that it does not require off-chain inputs. Blockchain is also structurally incapable of generating off-chain outputs by itself.

This is increasingly widely known and is generating a sub-hype around oracles, which are devices aimed at plugging blockchains into the ‘real world’, either by feeding the blockchain with data or by outputting data from the blockchain (as discussed eg here). In this blog post, I reflect on the minimal changes that I think the development of oracles is likely to bring about in the context of public procurement governance.

Why would blockchain be interesting in this context?

Generally, the potential for blockchain and blockchain-enabled smart contracts to improve procurement governance is linked to the promise that they can help prevent corruption and mistakes in two ways: by automating decision-making throughout the procurement process and the execution of public contracts, and by making procurement records immutable (rectius, tamper-evident). There are two main barriers to the achievement of such improvements over current processes and governance mechanisms. One concerns transaction costs and information asymmetries (as briefly discussed here). The other concerns the massive gap between the virtual on-chain reality and the off-chain real world, which oracles are trying to bridge.

The separation between on-chain and off-chain reality is paramount to the analysis of governance issues and the impact blockchain can have. If blockchain can only displace the focus of potential corrupt or mistaken intervention (by the public buyer, or by public contractors) but not eliminate such risks, its potential contribution to a revolution of procurement governance shrinks by several orders of magnitude. So it is important to assess the extent to which blockchain can be complemented with other solutions (oracles) to achieve the elimination of points of entry for corrupt or mistaken activity, rather than their displacement or substitution.

Oracles’ vulnerabilities: my puppy wears my fitbit

In simple terms, oracles are data interfaces that connect a blockchain to a database or a source of data (for a taxonomy and some discussion, see here). This makes them potentially unreliable as (i) the oracle can only be as good as the data it relies on and (ii) the oracle can itself be manipulated. There are thus two main sources of oracle vulnerability, which automatically translate into blockchain vulnerability.

First, the data can be manipulated, like when I prefer to sit and watch some TV rather than go for a run and tie my fitbit to my puppy’s collar so that, by midnight, I have still achieved my 10,000 daily steps.* Second, the oracle itself can be manipulated because it is a piece of software or hardware that can be tampered with, perhaps in a way that is not readily evident and that can only be uncovered through some serious IT forensics, like getting a friend to crack fitbit’s code and add 10,000 daily steps to my database without me even needing to charge my watch.**

Unlike when these issues concern the extent to which I lie to myself about my healthy lifestyle, these two vulnerabilities are highly problematic from a public governance perspective. Unless the data used in the interaction with the blockchain is itself automatically generated in a way that cannot be manipulated (and this starts to point at a mirror within a mirror situation, see below), the effect of implementing a blockchain plus oracle simply seems to be to displace the governance focus: controls need to be placed on the source of the data and the devices used to collect it.
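To make the point about tamper-evidence versus truthfulness a bit more concrete, here is a minimal sketch of a hash-chained ledger. It is a toy model and not how any particular blockchain works; the function names (`block_hash`, `append`, `is_intact`) and the fitbit payload are made up for illustration. It shows that the ledger can detect a later edit to a record, but has no way of telling whether the oracle's input was true in the first place.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Record whatever the oracle reports; the ledger cannot check its truth."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def is_intact(chain):
    """Detect after-the-fact edits to recorded blocks (tamper-evidence)."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["payload"], block["prev"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
# Hypothetical oracle reading: the steps were walked by the puppy, not by me.
append(ledger, {"source": "fitbit", "steps": 10000})
intact_with_false_input = is_intact(ledger)

# Editing the record afterwards is what the chain does catch.
ledger[0]["payload"]["steps"] = 12000
intact_after_edit = is_intact(ledger)
```

In this sketch `intact_with_false_input` is `True` even though the reading is false, while `intact_after_edit` is `False`. The integrity check only protects what happens after the data has entered the chain, which is why the controls move to the data source and the collecting device.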

But oracles can get better! — sure, but only to deal with data

The sub-hype around oracles in blockchain discussions basically follows the same trend as the main hype around blockchain. The same way it is assumed that blockchain is bound to revolutionise everything because it will get so much better than it currently is, there are emerging arguments about the almost boundless potential for oracles to connect the real world to the blockchain in so much better ways. I do not have the engineering or futurology credentials necessary to pass judgement on this, but it seems to me plain to see that—unless we want to add an additional layer about robotics (and pretty evolved robotics at that), so that we consider blockchain+oracle+robot solutions—any and all advances will remain limited to improving the way data is generated/captured and exploited within and outside the blockchain.

So, for everything that is not data-based or data-transformable (such as the often used example of event tickets, which in the end get plugged back to a database that determines their effects in the real world) or, in other words, where moving digital tokens around does not generate the necessary effects in the real world, even very advanced blockchain+oracle solutions are likely to remain of limited use in the context of procurement and the delivery of public services. Not because the applications are not (technically) possible, but because they generate governance problems that merely replace the current ones. And the advantage is not necessarily obvious.

How far can we displace governance problems and still reap some advantages?

What do I mean when I say that the advantage is not necessarily obvious? Well, imagine the possibility of having a blockchain+oracle control the inventory of a given consumable, so that the oracle feeds information into the blockchain about the existing level of stock and about new deliveries made by the supplier, and automated payments are made eg on a per available unit basis. This could be seen as a possible application to avoid the need for different ways of controlling the execution of the contract, or even the need to procure the consumable in the first place, if a smart contract in the blockchain (the same, or a separate one) is automatically buying it on the basis of a closed system (eg a framework agreement or dynamic purchasing system based on electronic catalogues) or even in the ‘open market’ of the internet. Would this not be advantageous from a governance perspective?

Well, I think it would be a matter of degree because there would still need to be a way of ensuring that the oracle is not tampered with and that what the oracle is capturing reflects reality. There are myriad ways in which you could manipulate most systems—and, given the right economic incentives, there will always be attempts to manipulate even the most sophisticated systems we may want to put in place—so checks will always be needed. At this stage, the issue becomes one of comparing the running costs of the system. Unless the cost of the blockchain+oracle+new checks (plus the cybersecurity needed to keep them up and properly running) is lower than the cost of existing systems (including inefficiencies derived from corruption and mistakes), there is no obvious advantage and likely no public interest in the implementation of solutions based on these disruptive technologies.
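The comparison of running costs can be written down as a very simple check. The sketch below is only an illustration of the inequality described above; the cost categories are taken from the paragraph, but the function names and all figures are placeholders in arbitrary units, invented for the example and with no empirical basis.

```python
def total_cost(cost_items):
    """Sum the running-cost items of one system (same period, same units)."""
    return sum(cost_items.values())


def has_obvious_advantage(new_system, existing_system):
    """True only if the new system is strictly cheaper to run overall."""
    return total_cost(new_system) < total_cost(existing_system)


# Placeholder figures, invented purely for illustration.
new_system = {
    "blockchain": 40,
    "oracle": 25,
    "new_checks": 30,
    "cybersecurity": 20,
}
existing_system = {
    "current_controls": 60,
    "corruption_and_mistakes": 45,  # inefficiencies of the status quo
}

advantage = has_obvious_advantage(new_system, existing_system)
```

With these made-up numbers the new system costs 115 against 105 for the existing one, so `advantage` is `False`. The difficulty, of course, is that most of these items are hard to estimate in practice, which is the problem with ‘business cases’ raised below.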

Which leads me to the new governance issue that has started to worry me: the control of ‘business cases’ for the implementation of blockchain-based solutions in the context of public procurement (and public governance more generally). Given the lack of data and the difficulty in estimating some of the risks and costs of both the existing systems and any proposed new blockchain solutions, who is doing the math and on the basis of what? I guess convincingly answering this will require some more research, but I certainly have a hunch that not much robust analysis is going on…

_____

* I do not have a puppy, though, so I really end up doing my own running…

** I am not sure this is technically doable, but hopefully it works for the sake of the example…

Further thoughts on data and policy indicators a-propos two recent papers on procurement regulation & competition: comments re (Tas: 2019a&b)

The EUI Robert Schuman Centre for Advanced Studies’ working papers series has two interesting recent additions on the economic analysis of procurement regulation and its effects on competition, efficiency and value for money. Both papers are by BKO Tas.

The first paper: ‘Bunching Below Thresholds to Manipulate Public Procurement’ explores the effects of a contracting authority’s ‘bunching strategy’ to seek to exercise more discretion by artificially estimating the value of future contracts just below the thresholds that would trigger compliance with EU procurement rules. This paper is relevant to the broader discussion on the usefulness and adequacy of current EU (and WTO GPA) value thresholds (see eg the work of Telles, here and here), as well as on the regulatory decisions that EU Member States face on whether to extend the EU rules to ‘below-threshold’ contracts.

The second paper: ‘Effect of Public Procurement Regulation on Competition and Cost-Effectiveness’ uses the World Bank’s ‘Benchmarking Public Procurement’ quality scores to empirically test the positive effects of improved regulation quality on competition and value for money, measured as increases in the number of bidders and the probability that procurement price is lower than estimated cost. This paper is relevant in the context of recent discussions about the usefulness or not of procurement benchmarks, and regarding the increasing concern about reduced number of bids in EU-regulated public tenders.

In this blog post, I reflect on the methodology and insights of both papers, paying particular attention to the fact that both papers build on datasets and/or indexes (TED, the WB benchmark) that I find rather imperfect and unsuitable for this type of analysis (regarding TED, in the context of the Single Market Scoreboard for Public Procurement (SMSPP) that builds upon it, see here; regarding the WB benchmark, see here). Therefore, not all criticisms below are directed at the papers themselves; some concern the distortions that skewed, incomplete or misleading data and indicators can introduce into more refined analysis that builds upon them.

Bunching Below Thresholds to Manipulate Procurement (Tas: 2019a)

It is well-known that the EU procurement rules are based on a series of jurisdictional triggers and that one of them concerns value thresholds—currently regulated in Arts 4 & 5 of Directive 2014/24/EU. Contracts with an estimated value above those thresholds are subjected to the entire EU procurement regulation, whereas contracts of a lower value are solely subjected to principles-based requirements where they are of ‘cross-border interest’. Given the obvious temptation/interest in keeping procurement shielded from EU requirements, the EU Directives have included an anti-circumvention rule aimed at preventing Member States from artificially splitting contracts in order to keep their award below the relevant jurisdictional thresholds (Art 5(3) Dir 2014/24). This rule has been interpreted expansively by the Court of Justice of the European Union (see eg here).

‘Bunching Below Thresholds to Manipulate Public Procurement’ examines the effects of a practice that would likely infringe the anti-circumvention rule, as it assesses a strategy of ‘bunching estimated costs just below thresholds’ ‘to exercise more discretion in public procurement’. The paper develops a methodology to identify contracting authorities ‘that have higher probabilities of bunching estimated values below EU thresholds’ (ie manipulative authorities) and finds that ‘[m]anipulative authorities have significantly lower probabilities of employing competitive procurement procedure. The bunching manipulation scheme significantly diminishes cost-effectiveness of public procurement. On average, prices of below threshold contracts are 18-28% higher when the authority has an elevated probability of bunching.’ These are quite striking (but perhaps not surprising) results.

The paper employs a regression discontinuity approach to determine the likelihood of bunching. In order to do that, the paper relies on the TED database. The paper is certainly difficult to read and hardly intelligible for a lawyer, but there are some issues that raise important questions. One concerns the author’s (mis)understanding of how the WTO GPA and the EU procurement rules operate, in particular when the paper states that ‘Contracts covered by the WTO GPA are subject to additional scrutiny by international organizations and authorities (sic). Accordingly, contracts covered by the WTO GPA are less likely to be manipulated by EU authorities’ (p. 12). This is simply an uncritical transplant of considerations made by the authors of a paper that examined procurement in the Czech Republic, where the relevant threshold between EU covered and non-EU covered procurement would make sense. Here, the distinction between WTO GPA and EU-covered procurement simply makes no sense, given that WTO GPA and EU thresholds are coordinated. This alone raises some issues concerning the tests designed by the author to check the robustness of the hypothesis that bunching leads to inefficiency in procurement expenditure.

Another issue concerns the way in which the author equates open procedures to a ‘first price auction mechanism’ (which they are not exactly) and dismisses other procedures (notably, the restricted procedure) as incapable of ensuring value for money or, more likely, as representative of a higher degree of discretion for the contracting authority—which is a highly questionable assumption.

More importantly, I am not sure that the author understood what is in the TED database and, crucially, what is not there (see section 2 of Tas (2019a) for methodology and data description). Albeit not very clearly, the author presents TED as a comprehensive database of procurement notices—ie, as if 100% of procurement expenditure by Member States was recorded there. However, in the specific context of bunching below thresholds, the TED database is very likely to be incomplete.

Contracting authorities tendering contracts below EU thresholds are under no obligation to publish a contract notice (Art 49 Dir 2014/24). They could publish voluntarily, in particular in the form of a voluntary ex ante transparency (VEAT) notice, but that would make no sense from the perspective of a contracting authority that seeks to avoid compliance with EU rules by bunching (ie manipulating) the estimated contract value, as that would expose it to potential litigation. Most authorities that are bunching their procurement needs (or, in simple terms, avoiding compliance with the EU rules) will not be reflected in the TED database at all, or will not be identified by the methodology used by Tas (2019a), as they will not have filed any notices for contracts below thresholds.

How is it possible that TED includes notices regarding contracts below the EU thresholds, then? Well, this is anybody’s guess, but mine is that a large proportion of those notices will come from countries with a tradition of full transparency (over-reporting), will concern contracts where there are doubts about the potential cross-border interest (sometimes assessed over-cautiously), or will be notices with mistakes, where the estimated value of the contract is erroneously indicated as below thresholds.

Even if my guess was incorrect and all notices for contracts with a value below thresholds were accurate and justified by the existence of a potential cross-border interest, the database cannot be considered complete. One of the issues raised (imperfectly) by the Single Market Scoreboard (indicator [3] publication rate) is the relatively low level of procurement that is advertised in TED compared to the (putative/presumptive) total volume of procurement expenditure by the Member States. Without information on the conditions of the vast majority of contract awards (below thresholds, unreported, etc), any analysis of potential losses of competitiveness / efficiency in public expenditure (due to bunching or otherwise) is bound to be misleading.

Moreover, Tas (2019a) is premised on the hypothesis that procurement below EU thresholds allows for significantly more discretion than procurement above those thresholds. However, this hypothesis fails to recognise the variety of transposition strategies at Member State level. While some countries have opted for less stringent below EU threshold regimes, others have extended the EU rules to the entirety of their procurement (or, perhaps, to contracts down to values much lower than the EU thresholds, with the exception of some class of ‘micropurchases’). This would require the introduction of a control that could refine Tas’ analysis and distinguish those cases of bunching that do lead to more discretion from those that do not (at least formally). Such a control could perhaps also separate the price effects derived from national-only transparency from those of more legally dubious maneuvering.

Regardless of the methodology and the math underpinning the paper (which I am in no position to assess in detail), once these data issues are taken into account, the story the paper tries to tell breaks down. In my view, there are important shortcomings in its empirical strategy that raise significant issues around the strength of its findings when these are assessed not against the information in TED, but against the (largely unknown, unrecorded) reality of procurement in the EU.

I have no doubt that there is bunching in practice, and that the intuition that it raises procurement costs must be right, but I have serious doubts about the possibility of reliably identifying bunching or estimating its effects on the basis of the information in TED, as most culprits will not be included and the effects of below-threshold (national-only) competition will mostly not be accounted for.

(Good) Regulation, Competition & Cost-Effectiveness (Tas: 2019b)

It is also a very intuitive hypothesis that better regulation should lead to better procurement outcomes and, consequently, that more open and robust procurement rules should lead to more efficiency in the expenditure of public funds. As mentioned above, Tas (2019b) explores this hypothesis and seeks to empirically test it using the TED database and the World Bank’s Benchmarking Public Procurement (in its 2017 iteration, see here). I will not repeat my misgivings about the use of the TED database as a reliable source of information. In this second part, I will solely comment on the use of the WB’s benchmark.

The paper relies on four of the WB’s benchmark indicators (one further constructed by Djankov et al (2017)): the ‘bid preparation score, bid and contract management score, payment of suppliers score and PP overall index’. The paper includes a useful table with these values (see Tas (2019b: Table 4)), which allows the author to rank the countries according to the quality of their procurement regulation. The findings of Tas (2019b) are thus entirely dependent on the quality of the WB’s benchmark and its ability to capture (and distinguish) good procurement regulation.

In order to test the extent to which the WB’s benchmark is a good input for this sort of analysis, I have compared it to the indicator that results from the European Commission’s Single Market Scoreboard for Public Procurement (SMSPP, in its 2018 iteration). The comparison is rather striking …

Source: own elaboration.

Clearly, both sets of indicators are based on different methodologies and measure relatively different things. However, they are both intended to express relevant regulators’ views on what constitutes ‘good procurement regulation’. In my view, both of them fail to do so for reasons already given (see here and here).

The implication for work such as Tas (2019b) is that the reliability of the findings, regardless of the math underpinning them, is as weak as the indicators they are based on. Likely, plugging the SMSPP instead of the WB’s index into the same methods would yield very different results (perhaps, that countries with very low quality of procurement regulation as per the SMSPP index achieve better economic results, which would not be a popular story with policy-makers). The results with either index would also be different if the algorithms were fed not by TED, but by a more comprehensive and reliable database.

So, the most that can be said is that attempts to empirically show effects of good (or poor) procurement regulation remain doomed to fail or, in perhaps less harsh terms, doomed to tell a story based on a very skewed, narrow and anecdotal understanding of procurement and an incomplete recording of procurement activity. Believe those stories at your own peril…

Data and procurement policy: some thoughts on the Single Market Scoreboard for public procurement

There is a growing interest in the use of big data to improve public procurement performance and to strengthen procurement governance. This is a worthy endeavour and, like many others, I am concentrating my research efforts in this area. I have not been doing this for too long. However, soon after one starts researching the topic, a preliminary conclusion clearly emerges: without good data, there is not much that can be done. No data, no fun. So far so good.

It is thus a little discouraging to confirm that, as is widely accepted, there is no good data architecture underpinning public procurement practice and policy in the EU (and elsewhere). Consequently, there is a rather limited prospect of any real implementation of big data-based solutions, unless and until there is a significant investment in the creation of a proper data foundation that can enable advanced analysis and policy-making. Adopting the Open Contracting Data Standard for the European Union would be a good place to start. We could then discuss to what extent the data needs to be fully open (hint: it should not be, see here and here), but let’s save that discussion for another day.

A recent Twitter thread has reminded me that there is a bigger downside to the existence of poor data than being unable to apply advanced big data analytics: the formulation of procurement policy on the basis of poor data and poor(er) statistical analysis.

This reflection emerged on the basis of the 2018 iteration of the Single Market Scoreboard for Public Procurement (the SMSPP), which is the closest the European Commission is getting to data-driven policy analysis, as far as I can see. The SMSPP is still work in progress. As such, it requires some close scrutiny and, in my view, strong criticism. As I will develop in the rest of this post, the SMSPP is problematic not solely in the way it presents information (which is clearly laden with implicit policy judgements of the European Commission) but, more importantly, due to its inability to inform either cross-sectional (ie comparative) or time series (ie trend) analysis of public procurement policy in the single market. Before developing these criticisms, I will provide a short description of the SMSPP (as I understand it).

The Single Market Scoreboard for Public Procurement: what is it?

The European Commission has developed the broader Single Market Scoreboard (SMS) as an instrument to support its effort of monitoring compliance with internal market law. The Commission itself explains that the “scoreboard aims to give an overview of the practical management of the Single Market. The scoreboard covers all those areas of the Single Market where sufficient reliable data are available. Certain areas of the Single Market such as financial services, transport, energy, digital economy and others are closely monitored separately by the responsible Commission services” (emphasis added). The SMS organises information in different ways, such as by stage in the governance cycle; by performance per Member State; by governance tool; by policy area or by state of trade integration and market openness (the latter two are still work in progress).

The SMS for public procurement (SMSPP) is an instance of SMS by policy area. It thus represents the Commission’s view that the SMSPP is (a) based on sufficiently reliable data, as it is fed from the database resulting from the mandatory publications of procurement notices in the Tenders Electronic Daily (TED), and (b) a useful tool to provide an overview of the functioning of the single market for public procurement or, in other words, of the ‘performance’ of public procurement, defined as a measure of ‘whether purchasers get good value for money’.

The SMSPP determines the overall performance of a given Member State by aggregating a number of indicators. Currently, the SMSPP is based on 12 indicators (it used to be based on a smaller number, as discussed below): [1] Single bidder; [2] No calls for bids; [3] Publication rate; [4] Cooperative procurement; [5] Award criteria; [6] Decision speed; [7] SME contractors; [8] SME bids; [9] Procedures divided into lots; [10] Missing calls for bids; [11] Missing seller registration numbers; [12] Missing buyer registration numbers. As the SMSPP explains, the addition of these indicators results in the measure of ‘overall performance’, which

is a sum of scores for all 12 individual indicators (by default, a satisfactory performance in an individual indicator increases the overall score by one point while an unsatisfactory performance reduces it by one point). The 3 most important are triple-weighted (Single bidder, No calls for bids and Publication rate). This is because they are linked with competition, transparency and market access–the core principles of good public procurement. Indicators 7-12 receive a one-third weighting. This is because they measure the same concepts from different perspectives: participation by small firms (indicators 7-9) and data quality (indicators 10-12).
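The aggregation rule quoted above can be sketched in a few lines. This is my own reading of the quoted text, not the Commission's code: the weights follow the quote (triple for indicators 1-3, one-third for indicators 7-12, and by implication single for indicators 4-6), while the `"average"` rating scoring zero points is an assumption on my part, as the quote only mentions satisfactory and unsatisfactory performance. The names `WEIGHTS`, `POINTS` and `overall_performance` are hypothetical.

```python
from fractions import Fraction

# Weights per indicator number, following the quoted methodology.
WEIGHTS = {i: Fraction(3) for i in (1, 2, 3)}
WEIGHTS.update({i: Fraction(1) for i in (4, 5, 6)})
WEIGHTS.update({i: Fraction(1, 3) for i in range(7, 13)})

# Points per rating; "average" = 0 is an assumption, not stated in the quote.
POINTS = {"satisfactory": 1, "average": 0, "unsatisfactory": -1}


def overall_performance(ratings):
    """Weighted sum of points over all 12 indicators."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    return sum(WEIGHTS[i] * POINTS[ratings[i]] for i in WEIGHTS)


# Hypothetical country: poor on the three triple-weighted indicators,
# satisfactory on everything else.
ratings = {i: "satisfactory" for i in range(1, 13)}
for i in (1, 2, 3):
    ratings[i] = "unsatisfactory"

score = overall_performance(ratings)
```

Under these assumptions the score ranges from -14 to 14, and the hypothetical country above ends up at -4 despite doing well on nine of the twelve indicators. This shows how heavily the result depends on the three triple-weighted indicators, including [3] Publication rate.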

The most recent snapshot of overall procurement performance is represented in the map below, which would indicate that procurement policy is rather dysfunctional, as most EEA countries do not seem to be doing very well.

Source: European Commission, 2018 Single Market Scoreboard for Public Procurement (based on 2017 data).

In my view, this use of the available information is very problematic: (a) to begin with, because the data in TED can hardly be considered ‘sufficiently reliable’. The TED database has problems of various sorts because it is constructed from data self-declared by the contracting authorities of the Member States. This makes its content very heterogeneous and difficult to analyse, with significant problems of under-inclusiveness, definitional fuzziness and lack of error filtering, as recognised, repeatedly, in the methodology underpinning the SMSPP itself. This should make one take the results of the SMSPP with more than a pinch of salt. However, these are not all the problems implicit in the SMSPP.

More importantly: (b) the definition of procurement performance and the ways in which the SMSPP seeks to assess it are far from universally accepted. They are rather judgement-laden and reflect the policy biases of the European Commission without making this sufficiently explicit. This issue requires further elaboration.

The SMSPP as an expression of policy-making: more than dubious judgements

I already criticised the Single Market Scoreboard for public procurement three years ago, mainly on the basis that some of the thresholds adopted by the European Commission to establish whether countries performed well or poorly in relation to a given indicator were not properly justified or backed by empirical evidence. Unfortunately, this remains the case and the Commission is yet to make a persuasive case for its decision that eg, in relation to indicator [4] Cooperative procurement, countries that aggregate 10% or more of their procurement achieve good procurement performance, while countries that aggregate less than 10% do not.

Similar issues arise with other indicators, such as [3] Publication rate, which measures the value of procurement advertised on TED as a proportion of national Gross Domestic Product (GDP). It is given threshold values of more than 5% for good performance and less than 2.5% for poor performance. The Commission considers that this indicator is useful because ‘A higher score is better, as it allows more companies to bid, bringing better value for money. It also means greater transparency, as more information is available to the public.’ However, this is inconsistent with the fact that the SMSPP methodology stresses that it is affected by the ‘main shortcoming … that it does not reflect the different weight that government spending has in the economy of a particular’ Member State (p. 13). It also fails to account for different economic models where some Member States can retain a much larger in-house capability than others, as well as failing to reflect other issues such as fiscal policies, etc. Moreover, the SMSPP includes a note that says that ‘Due to delays in data availability, these results are based on 2015 data (also used in the 2016 scoreboard). However, given the slow changes to this indicator, 2015 results are still relevant.’ I wonder how it is possible to establish that there are ‘slow changes’ to the indicator where there is no more current information. On the whole, this is clearly an indicator that should be dropped, rather than included with such a phenomenal number of (partially hidden) caveats.
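The mechanics of indicator [3] can be sketched as follows. The two thresholds come from the description above; treating everything in between as `"average"` is my assumption, and the function names and the two example countries are hypothetical, with figures invented only to illustrate the shortcoming that the methodology itself acknowledges.

```python
GOOD_ABOVE = 0.05    # more than 5% of GDP: good performance
POOR_BELOW = 0.025   # less than 2.5% of GDP: poor performance


def publication_rate(ted_value, gdp):
    """Value of procurement advertised on TED as a share of GDP."""
    if gdp <= 0:
        raise ValueError("gdp must be positive")
    return ted_value / gdp


def classify_publication_rate(rate):
    """Apply the two thresholds; the middle band is an assumed 'average'."""
    if rate > GOOD_ABOVE:
        return "satisfactory"
    if rate < POOR_BELOW:
        return "unsatisfactory"
    return "average"


# Two hypothetical countries that both advertise 12% of their government
# spending on TED, but where government spending weighs differently in GDP.
gdp = 1000.0
country_a = classify_publication_rate(publication_rate(0.12 * 0.50 * gdp, gdp))
country_b = classify_publication_rate(publication_rate(0.12 * 0.20 * gdp, gdp))
```

In this made-up example, country A (government spending at 50% of GDP) is rated `"satisfactory"` and country B (20% of GDP) is rated `"unsatisfactory"`, although both publish exactly the same share of their public spending. The rating is thus driven by the size of the public sector rather than by publication practice.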

On the whole, then, the SMSPP and a number of the indicators on which it is based are reflective of the implicit policy biases of the European Commission. In my view, it is disingenuous to try to save this by simply attaching the following caveat to the SMSPP and its indicators:

Like all indicators, however, they simplify reality. They are affected by country-specific factors such as what is actually being bought, the structure of the economies concerned, and the relationships between different tendering options, none of which are taken into account. Also, some aspects of public procurement have been omitted entirely or covered only indirectly, e.g. corruption, the administrative burden and professionalism. So, although the Scoreboard provides useful information, it gives only a partial view of EU countries' public procurement performance.

I would rather argue that, in these conditions, the SMSPP is not really useful. In particular, because it fails to enable analysis that could offer some valuable insights even despite the shortcomings of the underlying indicators: first, a cross-sectional analysis by comparing different countries under a single indicator; second, a trend analysis of evolution of procurement “performance” in the single market and/or in a given country.

The SMSPP and cross-sectional analysis: not fit for purpose

This criticism is largely implicit in the previous discussion, as the creation of indicators that are not reflective of ‘country-specific factors such as what is actually being bought, the structure of the economies concerned, and the relationships between different tendering options’ by itself prevents meaningful comparisons across the single market. Moreover, a closer look at the SMSPP methodology reveals that there are further issues that make such cross-sectional analysis difficult. To continue the discussion concerning indicator [4] Cooperative procurement, it is remarkable that the SMSPP methodology indicates that

[In previous versions] the only information on cooperative procurement was a tick box indicating that "The contracting authority is purchasing on behalf of other contracting authorities". This was intended to mean procurement in one of two cases: "The contract is awarded by a central purchasing body" and "The contract involves joint procurement". This has been made explicit in the [current methodology], where these two options are listed instead of the option on joint procurement. However, as always, there are exceptions to how uniformly this definition has been accepted across the EU. Anecdotally, in Belgium, this field has been interpreted as meaning that the management of the procurement procedure has been outsource[d] (e.g. to a legal company) -which explains the high values of this indicator for Belgium.

In simple terms, what this means is that the data point for Belgium (and any other country?) should have been excluded from analysis. In contrast, the SMSPP presents Belgium as achieving a good performance under this indicator—which, in turn, skews the overall performance of the country (which is, by the way, one of the few achieving positive overall performance… perhaps due to these data issues?).

This should give us some pause before we decide to give any meaning to cross-country comparisons at all. Additionally, as discussed below, we cannot (simply) rely on year-on-year comparisons of the overall performance of any given country.

The SMSPP and time series analysis: not fit for purpose

Below is a comparison of the ‘overall performance’ maps published in the last five iterations of the SMSPP.

Source: own elaboration, based on the European Commission’s Single Market Scoreboard for Public Procurement for the years 2014-2018 (please note that this refers to publication years, whereas the data on which each of the reports is based corresponds to the previous year).

One would be tempted to read these maps as representing a time series and thus as allowing for trend analysis. However, that is not the case, for various reasons. First, the overall performance indicator has been constructed on the basis of different (sub)indicators in different iterations of the SMSPP:

  • the 2014 iteration was based on three indicators: bidder participation; accessibility and efficiency.

  • the 2015 SMSPP included six indicators: single bidder; no calls for bids; publication rate; cooperative procurement; award criteria and decision speed.

  • the 2016 SMSPP also included six indicators. However, compared to 2015, the 2016 SMSPP omitted ‘publication rate’ and instead added an indicator on ‘reporting problems’.

  • the 2017 SMSPP expanded to nine indicators. Compared to 2016, the 2017 SMSPP reintroduced ‘publication rate’ and replaced ‘reporting problems’ with indicators on ‘missing values’, ‘missing calls for bids’ and ‘missing registration numbers’.

  • the 2018 SMSPP, as mentioned above, is based on 12 indicators. Compared to 2017, the 2018 SMSPP has added indicators on ‘SME contractors’, ‘SME bids’ and ‘procedures divided into lots’. It has also deleted the indicator ‘missing values’ and disaggregated the ‘missing registration numbers’ into ‘missing seller registration numbers’ and ‘missing buyer registration numbers’.
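
The changes listed above can be checked mechanically. The following Python sketch simply transcribes the indicator sets from the list and compares each iteration with the next. It assumes that, where an iteration is described by reference to the previous one, all indicators not mentioned as dropped or replaced were carried over unchanged; the variable names are my own.

```python
# Indicator sets transcribed from the list above. Where an iteration is
# described relative to the previous one, unmentioned indicators are
# assumed to carry over unchanged.
indicators = {
    2014: {"bidder participation", "accessibility", "efficiency"},
    2015: {"single bidder", "no calls for bids", "publication rate",
           "cooperative procurement", "award criteria", "decision speed"},
}
indicators[2016] = (indicators[2015] - {"publication rate"}) | {"reporting problems"}
indicators[2017] = (indicators[2016] - {"reporting problems"}) | {
    "publication rate", "missing values", "missing calls for bids",
    "missing registration numbers",
}
indicators[2018] = (
    indicators[2017] - {"missing values", "missing registration numbers"}
) | {
    "SME contractors", "SME bids", "procedures divided into lots",
    "missing seller registration numbers", "missing buyer registration numbers",
}

# Compare each iteration with the one that follows it.
years = sorted(indicators)
for earlier, later in zip(years, years[1:]):
    dropped = sorted(indicators[earlier] - indicators[later])
    added = sorted(indicators[later] - indicators[earlier])
    print(f"{earlier} -> {later}: {len(indicators[later])} indicators; "
          f"dropped {dropped}; added {added}")
```

Under that assumption, the counts match those given in the list (three, six, six, nine and twelve), and every pair of consecutive iterations differs in at least one indicator.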

It is plain that no two consecutive iterations of the SMSPP are based on comparable indicators. Moreover, the way that the overall performance is determined has also changed. The SMSPP for 2014 to 2017 established the overall performance as a ‘deviation from the average’ of sorts, whereby countries were given ‘green’ for overall marks above 90% of the average mark, ‘yellow’ for overall marks between 80 and 90% of the average mark, and ‘red’ for marks below 80% of the average mark. In the 2018 SMSPP, by contrast, ‘green’ indicates a score above 3, ‘yellow’ indicates a score between -3 and 3, and ‘red’ indicates a score below -3. In other words, the colour coding for the maps has changed from a measure of relative performance to a measure of absolute performance, which, in fairness, could be more meaningful.
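
The difference between the two colour-coding approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut-offs are those described above, but the treatment of marks falling exactly on a boundary, the function names and all the marks used are my own and are not taken from the SMSPP documents.

```python
def relative_colour(mark: float, average_mark: float) -> str:
    """2014-2017 approach as described above: colour depends on the mark
    relative to the average mark (assumed positive)."""
    ratio = mark / average_mark
    if ratio > 0.9:
        return "green"
    if ratio >= 0.8:  # boundary handling is an assumption
        return "yellow"
    return "red"


def absolute_colour(score: float) -> str:
    """2018 approach as described above: fixed cut-offs at 3 and -3."""
    if score > 3:
        return "green"
    if score < -3:
        return "red"
    return "yellow"  # boundary handling is an assumption


# Hypothetical country with an unchanged mark of 8.5 in two years,
# while the average mark of all countries moves from 9.0 to 10.0.
same_mark = 8.5
colour_with_lower_average = relative_colour(same_mark, average_mark=9.0)
colour_with_higher_average = relative_colour(same_mark, average_mark=10.0)
print(colour_with_lower_average, colour_with_higher_average)

# Under fixed cut-offs, an unchanged score always maps to the same colour.
print(absolute_colour(4.0), absolute_colour(0.0), absolute_colour(-4.0))
```

Under the relative approach, the hypothetical country moves from ‘green’ to ‘yellow’ without any change in its own mark, simply because the average has risen. Under the absolute approach, its colour can only change if its own score does, which is one more reason why maps produced under the two approaches cannot be read as a single series.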

As a result of these (and, potentially, other) issues, the SMSPP is clearly unable to support trend analysis, either at single market or country level. However, despite the disclaimers in the published documents, there remains a risk that the maps will be read as a time series (to the extent that anyone really engages with the SMSPP).

Overall conclusion

The example of the SMSPP does not augur very well for the adoption of data analytics-based policy-making. This is a case where, despite acknowledging shortcomings in the methodology and the data, the Commission has pressed on, seemingly on the premise that ‘some data (analysis) is better than none’. In my view, this is the wrong approach. To put it plainly, the SMSPP is rather useless, and yet it may create the impression that procurement data is being used to design policy and support its implementation. It would be better for the Commission to stop publishing the SMSPP until the underlying data issues are corrected and the methodology is streamlined. Otherwise, the Commission is simply creating noise around data-based analysis of procurement policy, and this can only erode its reputation as a policy-making body and the guardian of the single market.


US GAO reports on test commercial items program for #publicprocurement


In a recently published report, the US Government Accountability Office (GAO) assessed the status of a test program for the acquisition of commercial items and services--i.e. those that are generally available in the commercial marketplace, in contrast with items developed to meet specific governmental requirements.
 
The report is interesting, and it highlights that US federal agencies are conducting around 2% of their procurement through this program and that, overall, the "test program reduced contracting lead time and administrative burdens and generally did not incur additional risks above those on other federal acquisition efforts for those contracts GAO reviewed." Therefore, there seems to be scope for further use of the commercial items acquisition program.
 
Importantly, however, GAO warns that a significant number of these contracts were "awarded noncompetitively [and that, w]hile these awards were justified and approved in accordance with federal regulations when required, GAO and others have found that noncompetitive contracting poses risks of not getting the best value because these awards lack a direct market mechanism to help establish pricing." Consequently, GAO has recommended that the federal agencies concerned look in more detail into the use of the program and take measures to ensure that thorough market research is conducted before a commercial items contract is awarded noncompetitively.
 
In my view, the emphasis that GAO places on the collection and analysis of data in order to determine the benefits and success of the commercial items program offers valuable insights to procurement regulators in other jurisdictions--and, particularly, in the EU, where Member States should start considering procurement reform in view of the imminent publication of the new Directives in the Official Journal.