T 0161/18 brings to the fore the requirement of disclosing training data in AI cases

F. Hagel (FR)

1. Introduction

On May 12, 2020, the EPO Technical Board of Appeal 3.5.05 issued decision T 0161/18 Äquivalenter Aortendruck/ARC SEIBERSDORF, which relates to a method involving an artificial neural network and deserves great attention. This decision upholds the rejection by the Examining Division for lack of inventive step under Article 56 EPC but, more importantly, adds a rejection for lack of sufficient disclosure of training data under Article 83 EPC.

2. The decision

Claim 1 of the application relates to a method for determining cardiac output from an arterial blood pressure curve measured at the periphery, in which this curve is transformed into the central blood pressure curve with the aid of an artificial neural network whose weighting values are determined by learning, and the cardiac output is calculated from the central blood pressure curve.

We will focus this comment on reason 2 of the decision which holds that the application is rejected for lack of sufficient disclosure under Article 83 EPC. Reason 3 upholds the rejection of the Examining Division under Article 56 EPC on the basis of prior art documents.

In reason 2.2, the Board asserts that the application fails to disclose the input data to be used for training the neural network, or at least a dataset enabling the technical problem to be solved. The Board specifically notes that the application only mentions that the input data must cover a broad spectrum of patients of different ages, sexes, constitutional types, health conditions and the like to avoid specialisation of the network. On the basis of this lack of disclosure of the input data, the Board states that the skilled person cannot carry out the training of the network and concludes in reason 2.4 that the application does not meet the requirement of Article 83 EPC. The rejection applies to the application as a whole, not to specific claims.

The Board’s focus on the requirement of Article 83 EPC must be considered meaningful from a policy perspective in that it avoids tying the disclosure requirement to the patentability conditions of Articles 52-57 EPC, especially the prerequisite of “technical content” established by the jurisprudence of the Boards of Appeal or the inventive step condition of Article 56 EPC, as has been the case in some Board of Appeal decisions^{Hagel, F. Bombshell Decision T 2101/12 (Vasco) questions the technical/non-technical distinction – epi information 2/19}. This is important because a sufficient disclosure enabling the invention to be carried out is a quid pro quo for the award of an exclusive right to an invention meeting the patentability conditions. It thus fulfills a public policy objective, the diffusion of information and knowledge to the public, which is to be distinguished from that of the patentability conditions.

3. Background regarding review of Article 83 EPC at the EPO - General

Decision T 0161/18 appears as a milestone in that it is, to our knowledge, the first Board of Appeal decision to reject an application for lack of disclosure of training data in an AI case. In addition, it is unusual in that the Board has raised the ground of insufficient disclosure under Article 83 EPC of its own motion.

The decision comes against the contrasting background of how the EPO Examining Divisions review compliance with Article 83 EPC. There is a marked difference on this issue between the pharmaceutical & chemical sectors and the other sectors.

In the pharmaceutical & chemical sectors, the claimed invention frequently encompasses a broad family of species for which a desirable effect is to be attained, and because there is no logical reasoning linking the structure of a given species (such as a molecule or a value of a parameter within a range) and the effect, it must be assessed whether it is plausible (or credible) for the desired effect to be achieved across the entire family recited in the claim. The review is based on detailed examples and relies on compliance with Articles 83 and/or 84 EPC.

In other sectors (mechanical/physical/telecom & computer technology), compliance with Article 83 EPC is generally given scant attention by the Examining Divisions. We mentioned this situation back in 2008^{Hagel, F. Quality of patents : a matter of information inputs – epi information 2/2008} and stressed that a sufficient disclosure is a key ingredient of patent quality. Frequently, Article 83 issues, when they occur, are raised by third parties in opposition proceedings or in observations under Article 115 EPC, for which Article 83 EPC issues are explicitly encouraged by Guidelines E-VI, 3^{Guidelines E-VI, 3 states : “Although lack of novelty and/or inventive step are the most common observations, third-party observations may also be directed to clarity (Art. 84), sufficiency of disclosure (Art. 83), patentability (Art. 52(2) and Art. 52(3), Art. 53 or Art. 57) and unallowable amendments (Art. 76(1), Art. 123(2) and Art. 123(3)).” It is of note that the 2019 issue of the case law of the Boards of Appeal contains a narrower, literal interpretation of Article 115 EPC which only refers to patentability conditions of Articles 52-57 EPC.}. A rationale here is that third parties engaged in the field of the invention typically possess highly specific expertise, which allows them to spot insufficient disclosures not easily detected by Examiners. The result, in any event, is that patents may sometimes be granted in spite of a total lack of relevant disclosure on a critical component, which favours speculative patenting.

Previous commentators have consistently emphasised^{Jones, S. Patentability of AI and machine learning at the EPO – Kluwer Patent Blog December 21, 2018 at http://patentblog.kluweriplaw.com/2018/12/21/patentability-of-ai-and-machine-learning-at-the-epo} ^{Read, H. Artificial intelligence and machine learning : sufficiency and plausibility – June 12, 2019 at https://www.appleyardlees.com/artificial-intelligence-and-machine-learning-sufficiency-and-plausibility} ^{AIPPI 2019 Resolution plausibility, Background #2 at https://aippi.org/wp-content/uploads/2020/05/Resolution_Patents_Plausibility_English.pdf.} that the plausibility issue, which is common in the pharmaceutical & chemical sectors as recalled above, also arises in AI cases. This is because the dynamic and unpredictable behaviour and black-box character of an AI tool, once trained by means of training data, raise the question of the reproducibility and reliability of the purported effect.

4. Review of Article 83 EPC at the EPO in AI cases

A statement of the EPO alongside the other IP5 Offices^{IP5 is a forum of the five largest intellectual property offices in the world. The five patent offices are the US Patent and Trademark Office (USPTO), the European Patent Office (EPO), the Japan Patent Office (JPO), the Korean Intellectual Property Office (KIPO), and the National Intellectual Property Administration (CNIPA formerly SIPO) in China.} regarding disclosure requirements in AI cases can be found in the European Patent Office Report from the IP5 expert round table on artificial intelligence, Munich, 31 October 2018. It reads as follows:

9. The requirement of sufficiency of disclosure remains fully applicable in all IP5
jurisdictions and can be met, for example, when the applicant discloses how the
model was trained and provides the data used for training. Elements which can
be expected to be known to a skilled person (e.g. how a computer works) may not
need to be disclosed.

10. The applicant is required to fully disclose the claimed invention. If the inventive
contribution is in the algorithm, the latter must be disclosed. If the contribution lies
in the use of data and the algorithm is not part of the invention, then the algorithm
may not need to be disclosed.

11. All IP5 Offices have strict disclosure requirements, including reproducibility and
repeatability. However, the application of the requirement of sufficiency of disclosure allows for some flexibility.

This statement affirming “strict disclosure requirements” departs strongly from an attitude of benign neglect. However, this is easier said than done. The latest update of the Guidelines (November 2019) sheds no light on the disclosure requirement in AI cases; it only includes in G-II 3.3.1 an addition regarding the presence of “technical character” in an AI invention.

The statement refers to algorithms and training data, but there are numerous areas in AI inventions requiring sufficient disclosure, inter alia the structure of the AI model, the training process and the setting of the model’s coefficients. The disclosure regarding the input data (selection of sources, classification, labelling of data) also raises difficult questions^{Van der Heijden, H. AI inventions and sufficiency of disclosure – when enough is enough NLO – IAM Yearbook 2020 at https://www.iam-media.com/ai-inventions-and-sufficiency-disclosure-when-enough-enough}.

It should be mentioned in this respect that the preliminary communication dated January 24, 2020 issued by the Board in the case of decision T 0161/18 referred in point 4.2 to features other than training data (configuration of the layers of the neural network, activation functions, adjustment of the weighting coefficients of the network). Although the decision did not retain these aspects, they must be kept in mind.

It is also noteworthy that the Resolution 2020 “Inventorship of inventions made using Artificial Intelligence” adopted by the AIPPI World Congress in October 2020 lists in points 4.a to 4.e the following contributions, the author of which is to be considered an inventor: use of an AI algorithm to design a particular type of product or process, design of an AI algorithm, selection of a data source for training an AI algorithm, selection or generation of data or data source for input to a trained AI algorithm, and recognition that an output of an AI algorithm constitutes an invention. Such contributions, being inventive, are by definition beyond the purview of a person skilled in the art and therefore call for a sufficient disclosure.

The future development of the Boards of Appeal jurisprudence is certain to provide guidance as to which disclosure passes muster for compliance with the sufficiency requirement of Article 83 EPC. It can be expected that future updates of the Guidelines will incorporate the Board’s reasoning and hopefully provide additional insights. This will be especially helpful to practitioners who need to satisfy the requirement of Article 83 EPC as to training data without providing public access to the data, such public access being generally both impractical and, for reasons of confidentiality, undesirable to applicants. Such guidance would meet the objective of flexibility set out in the IP5 Offices statement cited above.

In the past, a similar question arose for inventions using software. The source code of the software was sometimes inserted in the application, but it later came to be considered sufficient to disclose the architecture of the software and the sequence of operations in such detail that a programmer could write the source code. In the case of training data for AI, the sufficiency requirement could be considered met if the application discloses methodologies for the selection of data sources and the processing of data which are specifically adapted to enable the skilled person to prepare training data relevant to the objective, following the saying often described as an ancient Chinese proverb: “Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime”.

In the meantime, it is up to the EPO management to enhance the expertise of Examiners in AI technology. As explained above, this is critical for a proper assessment of the disclosure requirement, and it is all the more necessary as AI technology has been moving quickly and is being implemented in a constantly broadening range of technical fields.

5. Training data : GIGO — or “garbage in, garbage out”

The significance of the disclosure requirement of training data for AI emphasised by decision T0161/18 cannot be overstated.

As stated by an AI expert^{ODSC Open Data Source October 24, 2019 Garbage In, Garbage Out : Automated Machine Learning Begins with Quality Data at https://medium.com/@ODSC/garbage-in-garbage-out-automated-machine-learning-begins-with-quality-data-70471cb33748}, "GIGO — or ‘garbage in, garbage out’ — has been the programmer’s mantra since the dawn of computing, with meaning that computers and systems are only as good as the information that is fed into them.

It’s no secret that machine learning methods are highly dependent on the quality of the data they receive as input. If you think of machine learning as a manufacturing process, the higher the quality of the input data, the more likely it is that the final product is of high quality as well. This relationship presents a big challenge to analytics teams when it comes to figuring out the right data for helping to solve business problems. It is necessary for those teams to prepare all datasets to achieve a machine learning process free of errors. This involves setting up quality standards and fixing data issues like missing values or columns with low statistical variance, as well as selecting the right data types, removing duplicate data, and more. Automated machine learning can assist with this.

According to the CrowdFlower survey, data preparation and cleaning take roughly 60% of the time of data scientists and analytics professionals. This does not take into account the time needed to first collect and aggregate the required data for the problem at hand. However, data preparation is critical, as the efficacy of machine learning algorithms directly depends on the quality of the inputs as well as their relevance to the use case.”
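By way of illustration only, the preparation steps named in this quotation (removing duplicate data, dealing with missing values, discarding columns with low statistical variance) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not taken from the application or the decision: the field names `age`, `systolic` and `device`, the function name `prepare_records` and the variance threshold are all hypothetical, every record is assumed to carry the same numeric fields, and incomplete records are simply dropped rather than repaired.

```python
from statistics import pvariance


def prepare_records(rows, min_variance=1e-9):
    """Clean a list of dict records that all share the same numeric fields."""
    # Step 1: remove exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Step 2: drop records with a missing value (None).
    # Assumption: dropping is acceptable; imputing would be an alternative.
    complete = [r for r in unique if all(v is not None for v in r.values())]
    if not complete:
        return []

    # Step 3: discard columns whose variance is at or below the threshold,
    # since a (near-)constant column carries no information for training.
    columns = list(complete[0])
    keep = [
        c for c in columns
        if pvariance([r[c] for r in complete]) > min_variance
    ]
    return [{c: r[c] for c in keep} for r in complete]


# Hypothetical patient records, invented purely for this example.
records = [
    {"age": 54, "systolic": 130.0, "device": 1},
    {"age": 54, "systolic": 130.0, "device": 1},  # exact duplicate
    {"age": 31, "systolic": None, "device": 1},   # missing measurement
    {"age": 67, "systolic": 145.0, "device": 1},
]
cleaned = prepare_records(records)
print(cleaned)
```

With the invented records above, the duplicate and the incomplete record are removed and the constant `device` column is discarded. A description of such steps at the level of methodology, rather than the data themselves, is the kind of information a skilled person would need in order to rebuild a comparable training set.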

Other AI experts tellingly depict training data as “the Achilles’ heel of AI”^{Schmelzer, R. The Achilles’ heel of AI, March 7, 2019 at https://www.forbes.com/sites/cognitiveworld/2019/03/07/the-achilles-heel-of-ai/#273c53927be7. The author provides a detailed analysis of the tasks involved in the preparation of training data.} or “the lifeblood of AI”^{Menendez, C. Data is the lifeblood of AI, but how do you collect it? Infoworld August 8, 2018 at https://www.infoworld.com/article/3296044/data-is-the-lifeblood-of-ai-but-how-do-you-collect-it.html.}.

6. Conclusion

Decision T 0161/18 deserves great attention from practitioners, as the Board has raised of its own motion the requirement of Article 83 EPC for the lack of disclosure of training data in an AI case. Future decisions will no doubt provide guidance as to how to satisfy this requirement, a prominent issue in AI cases given the criticality of training data for the efficient operation of an AI tool. It can also be expected that future revisions of the Guidelines will incorporate insights on this issue.