Did you use AI to write this tender? What? Just asking! -- Also, how will you use AI to deliver this contract?

The UK’s Cabinet Office has published procurement policy note 2/24 on ‘Improving Transparency of AI use in Procurement’ (the ‘AI PPN’) because ‘AI systems, tools and products are part of a rapidly growing and evolving market, and as such, there may be increased risks associated with their adoption … [and therefore] it is essential to take steps to identify and manage associated risks and opportunities, as part of the Government’s commercial activities’.

The crucial risk the AI PPN seems to be concerned with relates to generative AI ‘hallucinations’, as it includes background information highlighting that:

‘Content created with the support of Large Language Models (LLMs) may include inaccurate or misleading statements; where statements, facts or references appear plausible, but are in fact false. LLMs are trained to predict a “statistically plausible” string of text, however statistical plausibility does not necessarily mean that the statements are factually accurate. As LLMs do not have a contextual understanding of the question they are being asked, or the answer they are proposing, they are unable to identify or correct any errors they make in their response. Care must be taken both in the use of LLMs, and in assessing returns that have used LLMs, in the form of additional due diligence.’

The PPN has the main advantage of trying to tackle the challenge of generative AI in procurement head on. It can help raise awareness in case someone was not yet talking about this and, more seriously, it includes an Annex A that brings together the several different bits of guidance issued by the UK government to date. However, the AI PPN does not elaborate on any of that guidance and is thus as limited as the Guidelines for AI procurement (see here), relatively complicated in that it points to rather different types of guidance ranging from ethics, to legal, to practical considerations, and requires significant knowledge and expertise to be operationalised (see here). Perhaps the best evidence of the complexity of the mushrooming sets of guidance is that the PPN itself includes in Annex A a reference to the January 2024 Guidance to civil servants on use of generative AI, which has been superseded by the Generative AI Framework for HMG, to which it also refers in Annex A. In other words, the AI PPN is not a ‘plug-and-play’ document setting out how to go about dealing with AI hallucinations and other risks in procurement. And given the pace of change in this area, it is also bound to be a PPN that requires multiple revisions and adaptations going forward.

A screenshot showing that the January guidance on generative AI use has been superseded (taken on 26 March 2024 10:20am).

More generally, the AI PPN is bound to be controversial and has already spurred insightful discussion on LinkedIn. I would recommend the posts by Kieran McGaughey and Ian Makgill. I offer some additional thoughts here and look forward to continuing the conversation.

In my view, one of the potential issues arising from the AI PPN is that it aims to cover quite a few different aspects of AI in procurement, while neglecting others. Slightly simplifying, there are three broad areas of AI-procurement interaction. First, there is the issue of buying AI-based solutions or services. Second, there is the issue of tenderers using (generative) AI to write or design their tenders. Third, there is the issue of the use of AI by contracting authorities, eg in relation to qualitative selection/exclusion, or evaluation/award decisions. The AI PPN touches on aspects of each of these areas to varying degrees. However, it is not clear to me that they can be treated together, as they pose significantly different policy issues. I will try to disentangle them here.

Buying and using AI

Although it mainly cross-refers to the Guidelines for AI procurement, the AI PPN includes some content relevant to the procurement and use of AI when it stresses that ‘Commercial teams should take note of existing guidance when purchasing AI services, however they should also be aware that AI and Machine Learning is becoming increasingly prevalent in the delivery of “non-AI” services. Where AI is likely to be used in the delivery of a service, commercial teams may wish to require suppliers to declare this, and provide further details. This will enable commercial teams to consider any additional due diligence or contractual amendments to manage the impact of AI as part of the service delivery.’ This is an adequate and potentially helpful warning. However, as discussed below, the PPN suggests a way to go about it that is in my view wrong and potentially very problematic.

AI-generated tenders

The AI PPN is however mostly concerned with the use of AI for tender generation. It recognises that there ‘are potential benefits to suppliers using AI to develop their bids, enabling them to bid for a greater number of public contracts. It is important to note that suppliers’ use of AI is not prohibited during the commercial process but steps should be taken to understand the risks associated with the use of AI tools in this context, as would be the case if a bid writer has been used by the bidder.’ It indicates some potential steps contracting authorities can take, such as:

  • ‘Asking suppliers to disclose their use of AI in the creation of their tender.’

  • ‘Undertaking appropriate and proportionate due diligence:

    • If suppliers use AI tools to create tender responses, additional due diligence may be required to ensure suppliers have the appropriate capacity and capability to fulfil the requirements of the contract. Such due diligence should be proportionate to any additional specific risk posed by the use of AI, and could include site visits, clarification questions or supplier presentations.

    • Additional due diligence should help to establish the accuracy, robustness and credibility of suppliers’ tenders through the use of clarifications or requesting additional supporting documentation in the same way contracting authorities would approach any uncertainty or ambiguity in tenders.’

  • ‘Potentially allowing more time in the procurement to allow for due diligence and an increase in volumes of responses.’

  • ‘Closer alignment with internal customers and delivery teams to bring greater expertise on the implications and benefits of AI, relative to the subject matter of the contract.’

In my view, there are a few problematic aspects here. While the AI PPN seems to try not to single out the use of generative AI as potentially problematic by equating it to the possible use of (human) bid writers, this is unconvincing. First, because there is (to my knowledge) no guidance whatsoever on assessing whether bid writers have been used, and because the AI PPN itself does not require disclosure of the engagement of bid writers (or give any thought to the fact that third-party bid writers may have used AI without this being known to the hiring tenderer, which would then require an extension of the disclosure of AI use further down the tender generation chain). Second, because the approach taken in the AI PPN seems to point at potential problems with the use of (external, third-party) bid writers, whereas it does not seem to object to the use of (in-house) bid writers, potentially by much larger economic operators, which is presumed not to generate issues. Third, and most importantly, because it shows that perhaps not enough has been done so far to tackle the potential deceit or provision of misleading information in tenders if contracting authorities must now start thinking about how to get expert-based analysis of tenders, or develop fact-checking mechanisms to ensure bids are truthful. You would have thought that, regardless of the origin of a tender, contracting authorities should already be able to check its content to an adequate level of due diligence.

In any case, the biggest issue with the AI PPN is how it suggests contracting authorities should deal with this issue, as discussed below.

AI-based assessments

The AI PPN also suggests that contracting authorities should be ‘Planning for a general increase in activity as suppliers may use AI to streamline or automate their processes and improve their bid writing capability and capacity leading to an increase in clarification questions and tender responses.’ One of the possibilities could be for contracting authorities to ‘fight fire with fire’ and also deploy generative AI (eg to make summaries, to scan for errors, etc). Interestingly, though, the AI PPN does not directly refer to the potential use of (generative) AI by contracting authorities.

While it includes a reference in Annex A to the Generative AI framework for HM Government, that document does not specifically address the use of generative AI to manage procurement processes (and what it says about buying generative AI is redundant given the other guidance in the Annex). In my view, the generative AI framework pushes strongly against the use of AI in procurement when it identifies a series of use cases to avoid (page 18) that include contexts where high accuracy and high explainability are required. If this is the government’s (justified) view, then the AI PPN has been a missed opportunity to say this more clearly and directly.

The broader issue of confidential, classified or proprietary information

Both in relation to the procurement and use of AI, and to the use of AI for tender generation, the AI PPN stresses the following:

  • ‘Putting in place proportionate controls to ensure bidders do not use confidential contracting authority information, or information not already in the public domain as training data for AI systems e.g. using confidential Government tender documents to train AI or Large Language Models to create future tender responses’; and

  • ‘In certain procurements where there are national security concerns in relation to use of AI by suppliers, there may be additional considerations and risk mitigations that are required. In such instances, commercial teams should engage with their Information Assurance and Security colleagues, before launching the procurement, to ensure proportionate risk mitigations are implemented.’

These are issues that can easily exceed the technical capabilities of most contracting authorities. It is very hard to know what data has been used to train a model, and economic operators using ‘off-the-shelf’ generative AI solutions will hardly be in a position to assess this themselves, or to provide any meaningful information to contracting authorities. While there can be contractual constraints on the use of information and data generated under a given contract, it is much more challenging to assess whether information and data have been inappropriately used at a different link of increasingly complex digital supply chains. And, in any case, this is not only an issue for future contracts. Data and information generated under contracts already in place may not be subject to adequate data governance frameworks. It would seem that a more muscular approach to auditing data governance issues may be required, and that this should not be devolved to the procurement function.

How to deal with it? — or where the PPN goes wrong

The biggest weakness in the AI PPN is in how it suggests contracting authorities should deal with the issue of generative AI. In my view, it gets it wrong in two different ways. First, by asking for too much non-scored information where contracting authorities are unlikely to be able to act on it without breaching procurement and good administration principles. Second, by asking for too little: it requests only as non-scored information details that contracting authorities are in fact under a duty to assess and score.

Too much information

The AI PPN includes two potential (alternative) disclosure questions in relation to the use of generative AI in tender writing (referred to below as Q1 and Q2).

I think these questions miss the mark and expose contracting authorities to risks of challenge on grounds of a potential breach of the principle of equal treatment and the duty of good administration. The potential breach of the duty of good administration could be on grounds that the contracting authority is taking irrelevant information into account in the assessment of the relevant tender. The potential breach of equal treatment could come if tenders with some AI-generated elements were subjected to significantly more scrutiny than tenders where no AI was used. Contracting authorities should subject all tenders to the same level of due diligence and scrutiny because, at the bottom of it, there is no reason to ‘take a tenderer at its word’ when no AI is used. That is the entire logic of the exclusion, qualitative selection and evaluation processes.

Crucially, though, what the questions seem to really seek to ascertain is that the tenderer has checked for and confirms the accuracy of the content of the tender and thus makes the content its own and takes responsibility for it. This could be checked generally by asking all tenderers to confirm that the content of their tenders is correct and a true reflection of their capabilities and intended contractual delivery, reminding them that contracting authorities have tools to sanction economic operators that have ‘negligently provided misleading information that may have a material influence on decisions concerning exclusion, selection or award’ (reg.57(8)(i)(ii) PCR2015 and sch.7 13(2)(b) PA2023). And then enforcing them!

Checking the ‘authenticity’ of tenders when in fact contracting authorities are meant to check their truthfulness, accuracy and deliverability would be a false substitution of the relevant duties. It would also potentially undermine the incentives to disclose the use of AI generation (unless contracting authorities find a reliable way of identifying it themselves and start applying the exclusion grounds above)—as thoroughly discussed in the LinkedIn posts referred to above.

Too little information

Conversely, the PPN takes too soft and potentially confusing an approach to the use of AI to deliver the contract. The proposed disclosure question (Q3) is very problematic. It presents as ‘for information only’ a request for information on the use of AI or machine learning in the context of the actual delivery of the contract. This is information that will relate to the technical specifications, the award criteria or the contract performance clauses (or all of them), and there is no meaningful way in which AI could be used to deliver the contract without this having an impact on the assessment and evaluation of the tender. The question is potentially misleading not only because of the indication that the information will not be scored, but also because it suggests that the use of AI in the delivery of a service or product is within the discretion of the tenderers. In my view, this would only be possible if the technical specifications were rather loosely written in performance terms, which would then require a very thorough description and assessment of how that performance is to be achieved. Moreover, the use of AI would probably require a set of organisational arrangements that should also not go unnoticed or unchecked in the procurement process. Furthermore, one of the main challenges may not lie in the use of AI in new contracts (where tenderers are likely to highlight it to stress the advantages, or to justify that their tenders are not abnormally low in comparison with delivery through ‘manual’ solutions), but in relation to pre-existing contracts. A broader policy, recommendation and audit of the use of generative AI for the delivery of existing contracts, and of its treatment as a (permissible?) contract modification, would also seem to be needed.

Final thought

The AI PPN is an interesting development and will help crystallise many discussions that were somehow hovering in the background. However, a significant rethink is required and, in my view, much more detailed guidance is needed in relation to the different dimensions of the interaction between AI and procurement. There are important questions that remain unaddressed; one of the most pressing concerns the balance between general regulation and the use of procurement to regulate AI use. While the UK government remains committed to its ‘pro-innovation’ approach and no general regulation of AI use is put in place, in particular in relation to public sector AI use, procurement will continue to struggle and fail to act as a regulator of the technology.

Thoughts on the AI Safety Summit from a public sector procurement & use of AI perspective

The UK Government hosted an AI Safety Summit on 1-2 November 2023. A summary of the targeted discussions in a set of 8 roundtables has been published for Day 1, as well as a set of Chair’s statements for Day 2, including considerations around safety testing, the state of the science, and a general summary of discussions. There is also, of course, the (flagship?) Bletchley Declaration, and an introduction to the announced AI Safety Institute (UK AISI).

In this post, I collect some of my thoughts on these outputs of the AI Safety Summit from the perspective of public sector procurement and use of AI.

What was said at the AI Safety Summit?

Although the summit was narrowly targeted at the discussion of ‘frontier AI’, understood as particularly advanced AI systems, some of the discussions seem to have involved issues also applicable to less advanced (ie currently in existence) AI systems, and even to non-AI algorithms used by the public sector. As the general summary reflects, ‘There was also substantive discussion of the impact of AI upon wider societal issues, and suggestions that such risks may themselves pose an urgent threat to democracy, human rights, and equality. Participants expressed a range of views as to which risks should be prioritised, noting that addressing frontier risks is not mutually exclusive from addressing existing AI risks and harms.’ Crucially, ‘participants across both days noted a range of current AI risks and harmful impacts, and reiterated the need for them to be tackled with the same energy, cross-disciplinary expertise, and urgency as risks at the frontier.’ Hopefully, then, some of the rather far-fetched discussions of future existential risks can be conducive to taking action on current harms and risks arising from the procurement and use of less advanced systems.

There seemed to be some recognition of the need for more State intervention through regulation, for more regulatory control of standard-setting, and for more attention to be paid to testing and evaluation in the procurement context. For example, the summary of Day 1 discussions indicates that participants agreed that

  • ‘We should invest in basic research, including in governments’ own systems. Public procurement is an opportunity to put into practice how we will evaluate and use technology.’ (Roundtable 4)

  • ‘Company policies are just the baseline and don’t replace the need for governments to set standards and regulate. In particular, standardised benchmarks will be required from trusted external third parties such as the recently announced UK and US AI Safety Institutes.’ (Roundtable 5)

On Day 2, in the context of safety testing, participants agreed that

  • Governments have a responsibility for the overall framework for AI in their countries, including in relation to standard setting. Governments recognise their increasing role for seeing that external evaluations are undertaken for frontier AI models developed within their countries in accordance with their locally applicable legal frameworks, working in collaboration with other governments with aligned interests and relevant capabilities as appropriate, and taking into account, where possible, any established international standards.

  • Governments plan, depending on their circumstances, to invest in public sector capability for testing and other safety research, including advancing the science of evaluating frontier AI models, and to work in partnership with the private sector and other relevant sectors, and other governments as appropriate to this end.

  • Governments will plan to collaborate with one another and promote consistent approaches in this effort, and to share the outcomes of these evaluations, where sharing can be done safely, securely and appropriately, with other countries where the frontier AI model will be deployed.

This could be a basis on which to build an international consensus on the need for more robust and decisive regulation of AI development and testing, as well as a consensus on the sets of considerations and constraints that should be applicable to the procurement and use of AI by the public sector in a way that is compliant with individual (human) rights and social interests. The general summary reflects that ‘Participants welcomed the exchange of ideas and evidence on current and upcoming initiatives, including individual countries’ efforts to utilise AI in public service delivery and elsewhere to improve human wellbeing. They also affirmed the need for the benefits of AI to be made widely available’.

However, some statements seem at first sight contradictory or problematic. While the excerpt above stresses that ‘Governments have a responsibility for the overall framework for AI in their countries, including in relation to standard setting’ (emphasis added), the general summary also stresses that ‘The UK and others recognised the importance of a global digital standards ecosystem which is open, transparent, multi-stakeholder and consensus-based and many standards bodies were noted, including the International Standards Organisation (ISO), International Electrotechnical Commission (IEC), Institute of Electrical and Electronics Engineers (IEEE) and relevant study groups of the International Telecommunication Union (ITU).’ Quite how State responsibility for standard setting fits with industry-led standard setting by such organisations is not only difficult to fathom, but also potentially one of the most problematic issues due to the risk of regulatory tunnelling that delegation of standard setting without a verification or certification mechanism entails.

Moreover, there seemed to be insufficient agreement around crucial issues, which are summarised as ‘a set of more ambitious policies to be returned to in future sessions’, including:

‘1. Multiple participants suggested that existing voluntary commitments would need to be put on a legal or regulatory footing in due course. There was agreement about the need to set common international standards for safety, which should be scientifically measurable.

2. It was suggested that there might be certain circumstances in which governments should apply the principle that models must be proven to be safe before they are deployed, with a presumption that they are otherwise dangerous. This principle could be applied to the current generation of models, or applied when certain capability thresholds were met. This would create certain ‘gates’ that a model had to pass through before it could be deployed.

3. It was suggested that governments should have a role in testing models not just pre- and post-deployment, but earlier in the lifecycle of the model, including early in training runs. There was a discussion about the ability of governments and companies to develop new tools to forecast the capabilities of models before they are trained.

4. The approach to safety should also consider the propensity for accidents and mistakes; governments could set standards relating to how often the machine could be allowed to fail or surprise, measured in an observable and reproducible way.

5. There was a discussion about the need for safety testing not just in the development of models, but in their deployment, since some risks would be contextual. For example, any AI used in critical infrastructure, or equivalent use cases, should have an infallible off-switch.

8. Finally, the participants also discussed the question of equity, and the need to make sure that the broadest spectrum was able to benefit from AI and was shielded from its harms.’

All of these are crucial considerations in relation to the regulation of AI development, (procurement) and use. A lack of consensus around these issues already indicates that there was a generic agreement that some regulation is necessary, but much more limited agreement on what regulation is necessary. This is clearly reflected in what was actually agreed at the summit.

What was agreed at the AI Safety Summit?

Despite all the discussions, little was actually agreed at the AI Safety Summit. The Bletchley Declaration includes a lengthy (but rather uncontroversial?) description of the potential benefits and actual risks of (frontier) AI, some rather generic agreement that ‘something needs to be done’ (eg welcoming ‘the recognition that the protection of human rights, transparency and explainability, fairness, accountability, regulation, safety, appropriate human oversight, ethics, bias mitigation, privacy and data protection needs to be addressed’) and very limited and unspecific commitments.

Indeed, signatories only ‘committed’ to a joint agenda, comprising:

  • ‘identifying AI safety risks of shared concern, building a shared scientific and evidence-based understanding of these risks, and sustaining that understanding as capabilities continue to increase, in the context of a wider global approach to understanding the impact of AI in our societies.

  • building respective risk-based policies across our countries to ensure safety in light of such risks, collaborating as appropriate while recognising our approaches may differ based on national circumstances and applicable legal frameworks. This includes, alongside increased transparency by private actors developing frontier AI capabilities, appropriate evaluation metrics, tools for safety testing, and developing relevant public sector capability and scientific research’ (emphases added).

This does not amount to much that would not happen anyway and, given that one of the UK Government’s objectives for the Summit was to create mechanisms for global collaboration (‘a forward process for international collaboration on frontier AI safety, including how best to support national and international frameworks’), this agreement for each jurisdiction to do things as it sees fit in accordance with its own circumstances, and to collaborate ‘as appropriate’ in view of those circumstances, seems like a very poor ‘win’.

In reality, there seems to be little coming out of the Summit other than a plan to continue the conversations in 2024. Given what had been said in one of the roundtables (num 5) about the need to put adequate safeguards in place (‘this work is urgent, and must be put in place in months, not years’), it looks like the ‘to be continued’ approach won’t do or, at least, cannot be claimed to have made much of a difference.

What did the UK Government promise at the AI Summit?

A more specific development announced on the occasion of the Summit (and overshadowed by the earlier US announcement) is that the UK will create the AI Safety Institute (UK AISI), a ‘state-backed organisation focused on advanced AI safety for the public interest. Its mission is to minimise surprise to the UK and humanity from rapid and unexpected advances in AI. It will work towards this by developing the sociotechnical infrastructure needed to understand the risks of advanced AI and enable its governance.’

Crucially, ‘The Institute will focus on the most advanced current AI capabilities and any future developments, aiming to ensure that the UK and the world are not caught off guard by progress at the frontier of AI in a field that is highly uncertain. It will consider open-source systems as well as those deployed with various forms of access controls. Both AI safety and security are in scope’ (emphasis added). This seems to carry forward the extremely narrow focus on ‘frontier AI’ and catastrophic risks that augured a failure of the Summit. It is also in clear contrast with the much more sensible and repeated assertions during the Summit that other types of AI cause very significant risks, and with the recognition of ‘a range of current AI risks and harmful impacts’ and ‘the need for them to be tackled with the same energy, cross-disciplinary expertise, and urgency as risks at the frontier.’

Also crucially, UK AISI ‘is not a regulator and will not determine government regulation. It will collaborate with existing organisations within government, academia, civil society, and the private sector to avoid duplication, ensuring that activity is both informing and complementing the UK’s regulatory approach to AI as set out in the AI Regulation white paper’.

According to initial plans, UK AISI ‘will initially perform 3 core functions:

  • Develop and conduct evaluations on advanced AI systems, aiming to characterise safety-relevant capabilities, understand the safety and security of systems, and assess their societal impacts

  • Drive foundational AI safety research, including through launching a range of exploratory research projects and convening external researchers

  • Facilitate information exchange, including by establishing – on a voluntary basis and subject to existing privacy and data regulation – clear information-sharing channels between the Institute and other national and international actors, such as policymakers, international partners, private companies, academia, civil society, and the broader public’

It is also stated that ‘We see a key role for government in providing external evaluations independent of commercial pressures and supporting greater standardisation and promotion of best practice in evaluation more broadly.’ However, the extent to which UK AISI will be able to do that will hinge on issues that are not currently clear (or publicly disclosed), such as the membership of UK AISI or its institutional set up (as ‘state-backed organisation’ does not say much about this).

On that very point, it is somewhat problematic that the UK AISI ‘is an evolution of the UK’s Frontier AI Taskforce. The Frontier AI Taskforce was announced by the Prime Minister and Technology Secretary in April 2023’ (ahem, as ‘Foundation Model Taskforce’—so this is the second rebranding of the same initiative in half a year). It is equally problematic that UK AISI ‘will continue the Taskforce’s safety research and evaluations. The other core parts of the Taskforce’s mission will remain in [the Department for Science, Innovation and Technology] as policy functions: identifying new uses for AI in the public sector; and strengthening the UK’s capabilities in AI.’ I find the retention of analysis pertaining to public sector AI use within government problematic and a clear indication of the UK Government’s unwillingness to put meaningful mechanisms in place to monitor the process of public sector digitalisation. UK AISI very much sounds like a research institute with a focus on a very narrow set of AI systems and with a remit that will hardly translate into relevant policymaking in areas in dire need of regulation. Finally, it is also very problematic that funding is not locked in: ‘The Institute will be backed with a continuation of the Taskforce’s 2024 to 2025 funding as an annual amount for the rest of this decade, subject to it demonstrating the continued requirement for that level of public funds.’ In reality, this means that the Institute’s continued existence will depend on the Government’s satisfaction with its work and the direction of travel of its activities and outputs. This is not at all conducive to independence, in my view.

So, all in all, there is very little new in the announcement of the creation of the UK AISI and, while there is a (theoretical) possibility for the Institute to make a positive contribution to regulating AI procurement and use (in the public sector), this seems extremely remote and potentially undermined by the Institute’s institutional set up. This is probably in stark contrast with the US approach the UK is trying to mimic (though more on the US approach in a future entry).

Response to UK Cabinet Office consultation on 'Social Value in Government Contracts'

The UK Cabinet Office is currently consulting on its draft policy on ‘Social Value in Government Contracts’ and will be receiving submissions until 10 June 2019. Below is my contribution to the public consultation, which will probably make more sense if read after the consultation paper. Comments and feedback most welcome.

GC supports exercise of discretion in the assessment of technical compliance in public procurement (T-30/12)

In its Judgment in IDT Biologika v Commission, T-30/12, EU:T:2015:159 (only available in DE and FR and involving public procurement by the EU Institutions), the General Court (GC) has decided on an issue involving the contracting authority's discretion to assess the sufficiency of technical reports and certificates submitted by the tenderer in order to prove conformity of its offer with requirements set out in the technical specifications. This is an important case because it supports the exercise of technical discretion in the assessment of compliance with specifications in public procurement processes and, in my view, consolidates a welcome anti-formalistic development of this area of EU public procurement law.

In the case at hand, there was a tender for the supply of anti-rabies vaccines to a region in Serbia. The technical specifications determined that the vaccines had to meet certain conditions, amongst which it was necessary to demonstrate that the vaccine had been registered by the European Medicines Agency or equivalent agency of an EU Member State, and that its use was also authorised by the Serbian medicines agency prior to its distribution.

Bioveta made an offer to supply anti-rabies vaccines based on a type of virus ("SAD-Bern MSV Bio 10") that differed from the one included in the registration and the authorisation documents it submitted as part of the technical documentation (referring to "SAD-Bern"), which had been obtained for commercialisation in both Serbia and the Czech Republic. 

In view of that discrepancy, the contracting authority required Bioveta to clarify and confirm that, despite the use of a different virus, the vaccine it offered did not require a new registration with a medicines agency, and that the commercialisation under a different name did not breach the initial authorisation to distribute the product in the Serbian market. 

In simple terms, Bioveta explained that the virus had been changed in 1992 and that the "SAD-Bern MSV Bio 10" was the virus used when the product had been authorised for distribution in Serbia. It also submitted a written explanation of the mere commercial orientation of the change of name (implemented to distinguish Bioveta's vaccines from those of competitors that also sold solutions based on the "SAD-Bern" virus), and submitted that it did not require new registration. It also furnished a report by the Czech medicines agency that confirmed that the products were equivalent and the name "SAD-Bern MSV Bio 10" had been used in all registrations and renewals that had taken place since 1992. 

The contracting authority considered that the clarification was sufficient and the contract was eventually awarded to Bioveta. The decision was subsequently challenged by the competing bidder IDT Biologika on several grounds (some of them very technical in veterinary terms). In my view, the interesting ground for challenge rests on the discretion of the contracting authority when it comes to the assessment of technical aspects of a tender for a contract to be awarded on the basis of the lowest price (or in post-2014 terms, to the most cost-effective offer).  

IDT Biologika fundamentally submitted that the explanations and certificates provided by Bioveta had been improperly assessed and taken into consideration by the contracting authority, and that the award decision was flawed due to the exercise of excessive discretion in accepting them--as, in IDT Biologika's view, the contracting authority should have taken a formalistic approach and rejected Bioveta's tender.

In order to resolve this issue, the GC builds on CMB and Christof v Commission, where it was established that "in the context of a public procurement procedure where ... the contract is awarded to the tenderer who has submitted the lowest priced administratively and technically compliant tender, the contracting authority limits its margin of discretion with regard to the award of the contract to the lowest priced tender among the compliant tenders. However, its margin of discretion must remain broad with regard to the evaluation of the conformity of the tenders presented, and in particular the documentation provided in that regard" (T-407/07, EU:T:2011:477, para 116, emphasis added). It then goes on to determine that, in view of the information supplied by Bioveta, it was not unreasonable or manifestly wrong for the contracting authority not to reject the tender.

In my view, this is a significant consolidation of the case law and, under the CMB and Christof v Commission and IDT Biologika v Commission line of case law, contracting authorities and their evaluation teams should be confident in sticking to a possibilistic approach towards the assessment of the tenders--so as to move past strict formalities and accept sufficient technical evidence as to ensure compliance with the technical specifications.

This is certainly the correct approach from the perspective of maximization of competition and the assessment of technical requirements from a functional perspective--and, consequently, the one that best fits the framework set by Art 44 of Directive 2014/24 on test reports, certification and other means of proof of conformity with requirements or criteria set out in the technical specifications, the award criteria or the contract performance conditions (in particular, art 44(2) dir 2014/24 on alternative means of proof).

How precisely must evaluation rules be described in procurement documents? According to the GC, not that precisely

In yet another public procurement case derived from a complaint by the Greek company Evropaïki Dynamiki, the General Court has analysed the issue of the degree of precision required in the description of evaluation methods for contract award purposes in its Judgment of 12 July 2012 in case T-476/07 Evropaïki Dynamiki v Frontex.

Regarding the degree of precision in the publication of the award criteria and the evaluation methods to be used by the contracting authority, the GC has adopted a lenient approach that seems questionable, since it may result in leaving excessive discretion in the hands of evaluation teams. It is worth stressing that the GC in Frontex considers that:

the fact that a precise scale of the calculation of the tenders with regard to that award criterion [multiplication of efficiency by effectiveness] was not given cannot constitute a breach of the tendering specifications consisting in the introduction, by the contracting authority, of a new award criterion. The calculation used to arrive at a well defined score does not constitute an evaluation criterion of the proposed hypothetical IT solution, but rather a consequence of that evaluation (case T-476/07, at para 106, emphasis added).

This seems to me a highly controversial finding, which may run contrary to the case law of the Court of Justice of the EU, particularly in Lianakis (C-532/06 [2008] ECR I-251), where the CJEU clearly indicated that it is settled case law that: "potential tenderers should be aware of all the elements to be taken into account by the contracting authority in identifying the economically most advantageous offer, and their relative importance, when they prepare their tenders" and that "[p]otential tenderers must be in a position to ascertain the existence and scope of those elements when preparing their tenders" (paras 36 and 37, emphasis added). Even further, the CJEU stressed that "tenderers must be placed on an equal footing throughout the procedure, which means that the criteria and conditions governing each contract must be adequately publicised by the contracting authorities" (para 40, emphasis added).
If evaluation methods do not include the scales to be used by evaluation teams when they assess the tenders submitted by bidders, it is hard to see how all transparency requirements will be made operational and how applicants can effectively tailor their offers to the actual (preferred) requirements of the contracting authority or entity. 

Unless there is a good overriding reason to keep the evaluation methodologies and scales secret or undefined in contract notices and documents, it seems clearly desirable that evaluation methods AND scales are published and available to bidders when preparing their tenders. In the end, it is not very useful to know that your tender will be assessed under a criterion of 'efficiency' or 'effectiveness' if there is no indication whatsoever of how such requirements will be operationalised by the evaluation team.

Therefore, I think that the position of the GC in Frontex clashes with the more general case law highlighted by the CJEU in Lianakis, and that Frontex reflects an excessively lenient approach towards unjustified restrictions on the transparency of evaluation tools and procedures in public procurement.

In this regard, it seems desirable that the current revision of the EU Directives further details the obligations of contracting authorities to specify evaluation methods and scales in contract notices (e.g. in article 66 of the proposal for a Directive replacing 2004/18).