Procurement recommenders: a response by the author (García Rodríguez)

It has been refreshing to receive a detailed response from the lead author of one of the papers I recently discussed in the blog (see here). Big thanks to Manuel García Rodríguez for following up and for frank and constructive engagement. His full comments are below. I think they will help round out the discussion on the potential, constraints and data-dependency of procurement recommender systems.

Thank you Prof. Sánchez Graells for your comments; it has been rewarding reading. Below I present my point of view, to continue taking an in-depth look at the topic.

Regarding the recommender's success rate, a major initial consideration is that the recommender is generic. That is, it is not restricted to a type of contract, CPV codes, geographical area, etc. It is a recommender that uses all types of Spanish tenders, from any CPV and over 6 years (see table 3). This greatly influences the success rate because it is the most difficult scenario. An easier scenario would have restricted the search engine to certain geographic areas or CPVs, for example. In addition, 102,000 tenders were used in this study and, presumably, they are not enough for a search engine that learns business behaviour patterns from historical tenders (more tenders could not be used due to poor data quality).

Regarding the comment that ‘the recommender is an effective tool for society because it enables and increases the bidders participation in tenders with less effort and resources’. With this phrase we mean that the Administration can have an assistant to encourage participation (in tenders which are negotiated with or without prior notice) or, even, one with which civil servants can actively search for companies and inform them directly. I do not know whether the public contracting laws of the European countries allow authorities to actively search for and directly inform companies, but it would be the most efficient and reasonable approach. On the other hand, a good recommender (one with a high accuracy rate) can be an analytical tool to evaluate the level of competition attracted by contracting authorities. That is, if the tenders of a contracting authority attract very little competition but the recommender finds many potential participating companies, it means that the contracting authority could make its tenders more attractive to the market.

Regarding the comment that “It is also notable that the information of the Companies Register is itself not (and probably cannot be, period) checked or validated, despite the fact that most of it is simply based on self-declarations.” The information in the Spanish Business Register consists of the annual accounts of the companies, audited by an external entity. I do not know the auditing laws of other countries. Therefore, I think that the reliability of the data used in our article is quite high.

Regarding the first problematic aspect that you indicate: “The first one is that the recommender seems by design incapable of comparing the functional capabilities of companies with very different structural characteristics, unless the parameters for the filtering are given such range that the basket of recommendations approaches four digits”. There will always be the difficulty of comparing companies and defining when they are similar. That analysis should be done by economists; engineers can contribute little. There is also the limitation of business data: the information in Business Registers is usually paywalled and limited to certain fields, as is the case with the Spanish Business Register. For these reasons, we recognise in the article that it is a basic approach and that the filters/rules should be modified in the future: “Creating this profile to search similar companies is a very complex issue, which has been simplified. For this reason, the searching phase (3) has basic filters or rules. Moreover, it is possible to modify or add other filters according to the available company dataset used in the aggregation phase”.

Regarding the second problematic aspect that you indicate: “The second issue is that a recommender such as this one seems quite vulnerable to the risk of perpetuating and exacerbating incumbency advantages, and/or of consolidating geographical market fragmentation (given the importance of eg distance, which cannot generate the expected impact on eg costs in all industries, and can increasingly be entirely irrelevant in the context of digital/remote delivery).” This will not happen in the medium and long term because the recommender will adapt to market conditions. If there are companies that win bids far away, the algorithm will include that new distance range in its search. It will always be based on the historical winning companies (and the rest of the bidders, if we have that information). You cannot ask a machine learning algorithm (the one used in this article) to make predictions that are not based on previous winners and historical market patterns.

I totally agree with your final comment: “It would in my view be preferable to start by designing the recommender system in a way that makes theoretical sense and then make sure that the required data architecture exists or is created.” Unfortunately, I did not find any articles that discuss this topic. Lawyers, economists and engineers must work together to propose solid architectures. In this article we want to convince stakeholders that it is possible to create software tools such as a bidder recommender, and to show the importance of public procurement data and of company data in the Business Registers for their development.

Thank you for your critical review. Different approaches are needed to make progress on the important topic of public procurement.

Procurement recommender systems: how much better before we trust them? -- re García Rodríguez et al (2020)

© jm3 on Flickr.

How great would it be for a public buyer if an algorithm could identify the likely best bidder/s for a contract it sought to award? Pretty great, agreed.

For example, it would allow targeted advertising or engagement of public procurement opportunities to make sure those ‘best suited’ bidders came forward, or to start negotiations where this is allowed. It could also enable oversight bodies, such as competition authorities, to screen for odd (anti)competitive situations where well-placed providers did not bid for the contract, or only did in worse than expected conditions. If the algorithm was flipped, it would also allow potential bidders to assess for which tenders they are particularly well suited (or not).

It is thus not surprising that there are commercial attempts being developed (eg here) and interesting research going on trying to develop such recommender systems—which, at root, work similarly to recommender systems used in e-commerce (Amazon) or digital content platforms (Netflix, Spotify), in the sense that they try to establish which of the potential providers are most likely to satisfy user needs.

An interesting paper

On this issue, on which there has been some research for at least a decade (see here), I found this paper interesting: García Rodríguez et al, ‘Bidders Recommender for Public Procurement Auctions Using Machine Learning: Data Analysis, Algorithm, and Case Study with Tenders from Spain’ (2020) Complexity Art 8858258.

The paper is interesting in the way it builds the recommender system. It follows three steps. First, an algorithm trained on past tenders is used to predict the winning bidder for a new tender, given some specific attributes of the contract to be awarded. Second, the predicted winning bidder is matched with its data in the Companies Register, so that a number of financial, workforce, technical and location attributes are linked to the prediction. Third and finally, the recommender system is used to identify companies similar to the predicted winner. Such identification is based on similarities with the added attributes of the predicted winner, which are subject to some basic filters or rules. In other words, the comparison is carried out at supplier level, not directly in relation to the object of the contract.
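The three-step structure can be sketched as follows. This is only a minimal illustration of the architecture described above; all function names, fields and data structures are my own assumptions, not the authors' actual code.

```python
# Sketch of the three-phase recommender described in the paper.
# All names and fields are illustrative assumptions.

def recommend_bidders(new_tender, winner_model, companies_register, similar):
    # Phase 1: a model trained on past tenders predicts the likely winner
    # from the attributes of the contract to be awarded.
    predicted_winner_id = winner_model.predict(new_tender)

    # Phase 2: enrich the prediction with the predicted winner's financial,
    # workforce, technical and location attributes from the register.
    winner_profile = companies_register[predicted_winner_id]

    # Phase 3: search the register for companies similar to that profile,
    # subject to basic filters/rules. Note the comparison happens at
    # supplier level, not against the object of the contract itself.
    return [
        company_id
        for company_id, profile in companies_register.items()
        if similar(winner_profile, profile)
    ]
```

The `similar` predicate stands in for the paper's manually parameterised filters; the basket returned is the "recommendation" whose size the user can widen or narrow.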

Importantly, such filters to sieve through the comparison need to be given numerical values and that is done manually (i.e. set at rather random thresholds, which in relation to some categories, such as technical specialism, make little intuitive sense). This would in principle allow the user of the recommender system to tailor the parameters of the search for recommended bidders.

In the specific case study developed in the paper, the filters are:

  • Economic resources to finance the project (i.e. operating income, EBIT and EBITDA);

  • Human resources to do the work (i.e. number of employees);

  • Specialised work which the company can do (based on code classification: NACE2, IAE, SIC, and NAICS); and

  • Geographical distance between the company’s location and the tender’s location.

Notably, in the case study, distance ‘is a fundamental parameter. Intuitively, the proximity has business benefits such as lower costs’ (at 8).
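Taken together, the four filters could look something like the sketch below. The band widths, field names and the 300 km cut-off are purely illustrative placeholders (the paper sets its own manually chosen thresholds), and positive financial figures are assumed.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def passes_filters(winner, candidate, tender, max_km=300):
    # Economic resources: candidate within a band around the predicted
    # winner's figures (operating income, EBIT, EBITDA assumed positive).
    for field in ("operating_income", "ebit", "ebitda"):
        if not 0.5 * winner[field] <= candidate[field] <= 2.0 * winner[field]:
            return False
    # Human resources: a comparable number of employees.
    if not 0.5 * winner["employees"] <= candidate["employees"] <= 2.0 * winner["employees"]:
        return False
    # Specialised work: at least one shared activity code
    # (NACE2 / IAE / SIC / NAICS in the paper's case study).
    if not set(winner["activity_codes"]) & set(candidate["activity_codes"]):
        return False
    # Geographical distance between the candidate and the tender's location.
    return haversine_km(*candidate["location"], *tender["location"]) <= max_km
```

Even this toy version makes the criticism below concrete: the thresholds are arbitrary, and a distance cut-off mechanically excludes remote suppliers regardless of whether distance matters for the contract at hand.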

The key accuracy metric for the recommender system is whether it is capable of identifying the actual winner of a contract as the likely winning bidder or, failing that, whether it is capable of including the actual winner within a basket of recommended bidders. Based on the available Spanish data, the performance of the recommender system is rather meagre.
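That metric is, in essence, top-k accuracy: the recommender "succeeds" on a tender if the actual winner appears anywhere in the basket of recommended bidders. A minimal sketch (my own illustration, not the paper's code):

```python
def top_k_accuracy(recommended_baskets, actual_winners):
    # recommended_baskets: one basket (list of company ids) per tender.
    # actual_winners: the company that in fact won each tender.
    hits = sum(
        winner in basket
        for basket, winner in zip(recommended_baskets, actual_winners)
    )
    return hits / len(actual_winners)
```

With a basket size of one this is the single-prediction accuracy; widening the basket can only raise the score, which is why the figures below improve as the basket grows towards four digits while the informational value of each recommendation falls.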

The poor results can be seen in the two scenarios developed in the paper. In scenario 1, the training and test data are split 80:20 and the 20% is selected randomly. In scenario 2, the data is also split 80:20, but the 20% test data is the most recent one. As the paper stresses, ‘the second scenario is more appropriate to test a real engine search’ (at 13), in particular because the use of the recommender will always be ‘for the next tender’ after the last one included in the relevant dataset.
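The only difference between the two scenarios is how the 20% test set is drawn, which a short sketch makes plain (the `date` field and function names are assumed for illustration):

```python
import random

def split_scenario_1(tenders, seed=0):
    # Scenario 1: 80:20 split with the test 20% selected at random.
    shuffled = tenders[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def split_scenario_2(tenders):
    # Scenario 2: 80:20 split where the test 20% is the most recent data,
    # mimicking real use: the recommender is always applied to 'the next
    # tender' after the last one in the training dataset.
    ordered = sorted(tenders, key=lambda t: t["date"])
    cut = int(0.8 * len(ordered))
    return ordered[:cut], ordered[cut:]
```

Scenario 1 lets the model "see the future" (test tenders interleaved with training ones), which is why the chronological scenario 2 is the fairer test and, unsurprisingly, the one with the weaker numbers.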

For that more realistic scenario 2, the recommender has an accuracy of 10.25% in correctly identifying the actual winner, and this only rises to 23.12% if the recommendation includes a basket of five companies. Even for the more detached-from-reality scenario 1, the accuracy of a single prediction is only 17.07%, and this goes up to 31.58% for 5-company recommendations. The most accurate performance with larger baskets of recommended companies only reaches 38.52% in scenario 1, and 30.52% in scenario 2, although the much larger number of recommended companies (approximating 1,000) also massively dilutes the value of the information.

Comments

So, with the available information, the best performance of the recommender system creates about 1 in 10 chances of correctly identifying the most suitable provider, or 1 in 5 chances of having it included in a basket of 5 recommendations. Put the other way, the best performance of the realistic recommender is that it fails to identify the actual winner for a tender 9 out of 10 times, and it still fails 4 out of 5 times when it is given five chances.

I cannot say how this compares with non-automated searches based on looking at relevant company directories, other sources of industry intelligence or even the anecdotal experience of the public buyer, but these levels of accuracy could hardly justify the adoption of the recommender.

In that regard, the optimistic conclusion of the paper (‘the recommender is an effective tool for society because it enables and increases the bidders participation in tenders with less effort and resources’ at 17) is a little surprising.

The discussion of the limitations of the recommender system sheds some more light:

The main limitation of this research is inherent to the design of the recommender’s algorithm because it necessarily assumes that winning companies will behave as they behaved in the past. Companies and the market are living entities which are continuously changing. On the other hand, only the identity of the winning company is known in the Spanish tender dataset, not the rest of the bidders. Moreover, the fields of the company’s dataset are very limited. Therefore, there is little knowledge about the profile of other companies which applied for the tender. Maybe in other countries the rest of the bidders are known. It would be easy to adapt the bidder recommender to this more favourable situation (at 17).

The issue of the difficulty of capturing dynamic behaviour is well put. However, there are more problems (below), and the disclosure of other participants in the tender is not straightforwardly to the benefit of a more accurate recommender system, unless there were disclosure not only of the other bidders but also of the full evaluations of their tenders, which is an unlikely scenario in practice.

There is also the unaddressed issue of whether it makes sense to compare the specific attributes selected in the study; mostly it does not, as the selection is driven by the available data.

What is ultimately clear from the paper is that the data required for the development of a useful recommender is simply not there, either at all or with sufficient quality.

For example, it is notable that due to data quality issues, the database of past tenders shrinks from 612,090 recorded to 110,987 useable tenders, which further shrink to 102,087 due to further quality issues in matching the tender information with the Companies Register.

It is also notable that the information of the Companies Register is itself not (and probably cannot be, period) checked or validated, despite the fact that most of it is simply based on self-declarations. There is also an issue with the lag with which information is included and updated in the Companies Register—e.g. under Spanish law, company accounts for 2021 will only have to be registered over the summer of 2022, which means that a use of the recommender in late 2022 would be relying on information that is already a year old (as the paper itself hints, at 14).

And I also have the inkling that recommender systems such as this one would be problematic in at least two aspects, even if all the necessary data was available.

The first one is that the recommender seems by design incapable of comparing the functional capabilities of companies with very different structural characteristics, unless the parameters for the filtering are given such range that the basket of recommendations approaches four digits. For example, even if two companies were the closest ones in terms of their specialist technical competence (even if captured only by the very coarse and in themselves problematic codes used in the model)—which seems to be the best proxy for identifying suitability to satisfy the functional needs of the public buyer—they could significantly differ in everything else, especially if one of them is a start-up. Whether the recommender would put both in the same basket (of a useful size) is an empirical question, but it seems extremely unlikely.

The second issue is that a recommender such as this one seems quite vulnerable to the risk of perpetuating and exacerbating incumbency advantages, and/or of consolidating geographical market fragmentation (given the importance of eg distance, which cannot generate the expected impact on eg costs in all industries, and can increasingly be entirely irrelevant in the context of digital/remote delivery).

So, all in all, it seems like the development of recommender systems needs to be flipped on its head if data availability is driving design. It would in my view be preferable to start by designing the recommender system in a way that makes theoretical sense and then make sure that the required data architecture exists or is created. Otherwise, the adoption of suboptimal recommender systems would not only likely generate significant issues of technical debt (for a thorough warning, see Sculley et al, ‘Hidden Technical Debt in Machine Learning Systems‘ (2015)), but also risk significantly worsening the quality (and effectiveness) of procurement decision-making. And any poor implementation in ‘real life’ would deal a severe blow to the prospects of sustainable adoption of digital technologies to support procurement decision-making.