Implications of the General Data Protection Regulation (GDPR) for Detecting Infringement of Artificial Intelligence (AI) Patents
Why is the ability to detect patent infringement important?
R. Free, L. Pugh (GB)
New AI technology typically comprises new algorithms which express new ways of learning, new ways of representing data, new ways of searching through large search spaces to find solutions and other processes which enable AI technology to act in intelligent ways. Helping clients to protect this type of technology using patents is challenging in a number of ways, one of which is that it is often very difficult to detect infringement of patent claims which contain details of AI algorithms.
A patent claim sets out the scope of the monopoly held by the patent owner and, generally speaking, can be thought of as a list of features, such as a list of things to do in the case of an invention which is a method. A competitor infringes such a method claim when the competitor does all the things in the list. The things in the list are referred to as features. Features of the algorithms relating to their inputs and outputs can sometimes be observed in competitor products, via application programming interfaces, or found in product literature. However, features concerning the types of computation and the types of representation used by the algorithms are much harder to detect in competitor products. Sometimes an educated guess can be made that a competitor product is likely to be using a particular type of algorithm, but being certain of this is often not possible. As a result, the value of the AI algorithm patent may be significantly reduced because the patent cannot be effectively exploited through licensing. Turning to trade secrets as an alternative form of protection is often not possible where, for commercial reasons, the details of the algorithms are made public.
One option for applicants is to try to reduce the number of features of the algorithm in the patent claim which are not easy to detect and instead to include features related to the application domain (i.e. the task the AI is being used for), any observable user inputs, and any observable data or sensor inputs to, and outputs from, the algorithm. However, often the application domain itself is not a technical one and so the applicant is forced into finding a technical problem and solution within the algorithmic detail. Examples of non-technical application domains include online advertising, linguistic processing and presentation of information.
Why is the GDPR potentially relevant for detecting infringement of AI patents?
The GDPR is a new European Union law relating to the processing of personal data and applies across the EU from 25 May 2018.
The principles of the GDPR include that personal data shall be processed lawfully, fairly and in a transparent manner (see GDPR Article 5). The transparency requirement means that a data controller has to disclose various information, and it is possible that the disclosed information is helpful for detecting patent infringement. The disclosure is made without a duty of confidence.
Generally speaking, the definition of personal data in the GDPR is very broad. Personal data is information that relates to an identified or identifiable individual (a so called 'data subject') and may include data identifying a person, such as a name, internet protocol address or telephone number. Where personal data is collected from a data subject, the data controller (being the person that determines the purposes and means of processing of the personal data) is obliged by GDPR Article 13 to provide the data subject with various information at the time the personal data is collected. In certain circumstances, the information to be provided includes "the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject".
Article 22 of the GDPR is about automated decision-making. Article 22 states in paragraph 1 that, "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." A data subject is able to give his or her explicit consent so that the prohibition in paragraph 1 is lifted. Important things to note include that Article 22 uses the word "solely", so that it reads "a decision based solely on automated processing". Also, it says that the decision is one "which produces legal effects concerning him or her or similarly significantly affects him or her".
Given that the GDPR sets out that data controllers must disclose "meaningful information about the logic involved" in certain circumstances, it could be that such disclosures are useful for detecting infringement of algorithm patents.
How should we interpret "meaningful information about the logic involved" in the GDPR?
There are various documents available to help us interpret the wording in the GDPR and these include the Guidelines on Automated Individual Decision-making and Profiling for the Purposes of Regulation 2016/679, last revised and adopted on 6 February 2018, by the Article 29 Data Protection Working Party; and the UK Information Commissioner's Office detailed guidance on automated individual decision-making and profiling which was published on 23 May 2018 (these two documents are referred to herein as the "Guidance Documents"). The Guidance Documents suggest that it is not necessary to disclose the full details of an AI algorithm as a result of the GDPR. However, it is necessary to disclose some details of the AI algorithm as explained in the next section of this document.
Not necessary to disclose the full details of an AI algorithm
The Guidance Documents make it clear that a data controller does not have to disclose the source code of the AI algorithm, does not have to give a complex explanation of the algorithms used, and does not have to disclose the full algorithm. Practically, a lay person is not going to be able to understand a complex explanation or source code anyway. Also, the authors of the Guidance Documents were presumably aware that scientists currently have no good way to explain the predictions computed by deep neural networks.
What does have to be disclosed?
The following list of what has to be disclosed has been compiled by the author from the Guidance Documents. The list uses verbatim wording from the Guidance Documents where possible and contains duplication and overlap, since as many relevant extracts from the Guidance Documents as possible have been included. The circumstances in which disclosure has to be made are discussed later.
- The criteria relied on in reaching the decision
- The rationale behind the decision
- Information which is sufficiently comprehensive for the data subject to understand the reasons for the decision
- Meaningful information about the logic involved
- The likely consequences for individuals
- Why the data controller is using the automated decision-making process and the likely results
- Categories of data that have been or will be used in the profiling or decision-making process
- Why these categories are considered pertinent
- How any profile used in the automated decision-making process is built including any statistics used in the analysis
- Why the profile is relevant to the decision-making process
- How the profile is used for a decision concerning the data subject
- Controllers may wish to consider visualization and interactive techniques to aid algorithmic transparency
- The type of information collected or used in creating a profile or making an automated decision
- Why this information is relevant
- What the likely impact is going to be/how it's likely to affect them
In the Guidance Documents there is an example given about a data controller who uses credit scoring to assess and reject an individual's loan application. The score is computed automatically based on information held by the data controller. The example goes on to state that if the credit score is used to reject an individual's loan application, then the data controller is obliged to explain that the scoring process helps them to make fair and responsible lending decisions (i.e. the rationale behind the decision), and that the data controller should provide details of the main characteristics considered in reaching the decision, the source of the information and its relevance.
In the example it could be that automated statistical rules are used without any AI technology. However, it could also be that a neural network is used to predict the credit score. Thus the example is relevant whether or not machine learning is involved. The example is useful to help understand what information will be disclosed. A discussion of how the disclosed information may help with detecting patent infringement is given later in this document.
What are the conditions which have to apply in order for a data controller to disclose "meaningful information about the logic involved" to the data subject?
At least the following conditions have to apply:
- an automated decision using personal data;
- the decision is solely automated;
- the decision has a legal or similarly significant effect on the data subject; and
- the decision is necessary for entering into, or performance of, a contract between the controller and data subject; the decision is authorized by EU or Member State law; or the decision is based on the individual's explicit consent.
With regards to "solely automated" note that it is not enough to have a cursory human review of the output of the AI algorithm. As explained in the Guidance Documents, to avoid "solely automated" a human needs to "weigh up and interpret the results of an automated decision before applying it to the individual". Also, a process is still "considered solely automated if a human inputs the data to be processed, and then the decision-making is carried out by an automated system".
Going forwards, it is likely that automated decision-making will be increasingly used in situations where the above conditions do apply, although in cases of contract and explicit consent, human intervention still has to be available on request of the data subject. This increase is driven by the business case involved, in particular cost savings. In addition, AI is already more accurate than humans at many tasks, and both the variety of tasks at which AI outperforms humans and the margin of outperformance will grow.
Suppose a new algorithm is created which learns from many recordings of calls to an alcoholic beverage delivery service. The calls are labelled as being from adults or children, and consent has been obtained to use the data. The algorithm is able to learn from the examples and to generalize its learning so that it can predict with accuracy whether a new incoming call is from a person who is old enough to legally buy alcoholic drinks.
Once the algorithm has been trained it is then used as part of a delivery service selling alcoholic drinks. When a customer calls the service the customer is asked to give consent to an automated decision being made as to whether the customer is old enough to purchase alcoholic drinks. If consent is given, the incoming call is then used by the trained algorithm to predict the age of the customer and make the automated decision. In this case the GDPR provisions regarding disclosing "meaningful information about the logic involved" arguably apply, since the decision is solely automated, uses data which identifies a person (his or her voice), has a significant effect on the person (ability to buy a product), and is made with the consent of the user.
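The solely automated decision pipeline described above can be sketched in code. This is a deliberately simplified illustration: the feature names (mean pitch, call duration), the nearest-centroid classifier, and all training values are invented for the purposes of this article, and a real system would learn from acoustic features of labelled call recordings.

```python
# Hypothetical sketch of the alcohol-delivery age decision. All feature
# names, values and the classifier choice are invented for illustration.

def extract_features(call):
    """Map a call record to numeric features (illustrative placeholders)."""
    return [call["mean_pitch_hz"], call["call_duration_s"]]

def train_centroids(labelled_calls):
    """Learn one mean feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for call, label in labelled_calls:
        feats = extract_features(call)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def decide(call, centroids):
    """Solely automated decision: nearest centroid wins, no human review."""
    feats = extract_features(call)
    def dist(c):
        return sum((f - x) ** 2 for f, x in zip(feats, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy training data: in this invented example, children have higher pitch.
training = [
    ({"mean_pitch_hz": 120.0, "call_duration_s": 90.0}, "adult"),
    ({"mean_pitch_hz": 135.0, "call_duration_s": 60.0}, "adult"),
    ({"mean_pitch_hz": 260.0, "call_duration_s": 45.0}, "child"),
    ({"mean_pitch_hz": 280.0, "call_duration_s": 30.0}, "child"),
]
centroids = train_centroids(training)
print(decide({"mean_pitch_hz": 130.0, "call_duration_s": 70.0}, centroids))
# -> adult
```

The point of the sketch is that the decision is produced entirely by the trained model, with no human weighing up the result before it is applied, which is what brings the process within "solely automated" decision-making.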
Suppose there is a patent protecting the technology. The details of the patent claim include that the algorithm learns using features of the calls, including the voice of the caller, the geographical location of the originating phone and the time of day of the call. If a competitor launches a similar service then it is very difficult to tell whether the competing service infringes the patent claim because it is not known whether the same features are used. However, the GDPR requires that the criteria for the decision and the categories of data used are disclosed. Therefore there is a strong argument that infringement of the patent can be detected through disclosure of the "meaningful information about the logic involved" to the data subject.
If we modify this example slightly we can see a situation where the GDPR is less helpful for detecting patent infringement. Suppose the artificial intelligence has learnt using features of the voice recordings computed from fast Fourier transforms of the voice signals and without using any other categories of data except the age of the callers. In this case the GDPR requires that the data controller discloses that the algorithm has reached its decision using features of the voice recordings computed from the voice signals themselves. This is not enough to detect infringement since there are many ways to compute features from an audio signal and the data controller does not have to disclose the particular details of the algorithm according to the Guidance Documents.
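To see why a disclosure at the level of "features computed from the voice signals" is not enough, it helps to see one of the many possible ways such features could be computed. The following sketch derives features from fast Fourier transforms of a signal; the sampling rate, band boundaries and choice of band energies as features are all assumptions made for illustration.

```python
import numpy as np

# Illustrative only: one of many possible ways to compute features from a
# voice signal via a fast Fourier transform. A competitor could use mel
# bands, cepstral coefficients, learned filters, etc., and a GDPR disclosure
# would not reveal which.

def fft_band_energies(signal, n_bands=4):
    """Return the spectral energy in n_bands roughly equal-width bands."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2  # power spectrum
    bands = np.array_split(spectrum, n_bands)    # split into frequency bands
    return np.array([band.sum() for band in bands])

# Synthetic one-second "voice" signal at 8 kHz: a 200 Hz tone plus noise.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 200 * t) + 0.1 * rng.normal(size=sample_rate)

features = fft_band_energies(signal)
print(features.argmax())  # the 200 Hz tone dominates the lowest band -> 0
```

Two implementations could both truthfully be described as "features of the voice recordings computed from the voice signals" while differing in every detail that matters to a patent claim.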
How to find a practical way for data controllers to give "meaningful information about the logic involved"?
The authors of the Guidance Documents had the difficult problem of how to find a practical way for data controllers to give "meaningful information about the logic involved" to members of the public. Providing source code is not useful because often, even to expert programmers, messy source code is difficult to interpret. Providing algorithm design documents would explain the logic involved but is likely to bamboozle the lay member of the public. Even more difficult is the situation where deep neural networks are used, where scientists currently have no easy ways of explaining the computed decisions. However, one approach which is more pragmatic and practical is the counterfactual approach, which has seemingly been followed, at least in part, in the Guidance Documents mentioned above.
The counterfactual approach
Under the counterfactual approach the term "meaningful information about the logic involved" is interpreted as giving information to the data subject to enable him or her to understand what things the data subject needs to change in order to obtain a different outcome of the automated decision-making process. The Guidance Documents go some way towards the counterfactual approach because they speak about giving information to the data subject about the rationale, and about the categories of data used. If the categories of data are known, the data subject can think about how to modify his or her data within those categories in order to obtain a different outcome of the automated decision.
For the purposes of detecting patent infringement, the counterfactual approach is not as useful an approach as disclosing the full algorithm. However, the counterfactual approach is still useful, especially if more than one observation is taken into account. That is, suppose I collect automated decision outcomes made by a competitor service. I collect data about automated decision outcomes over different data subjects, over different times, and over different values of the personal data in the disclosed categories. I potentially collect a very large number of sets of values in this manner and then use them to infer how the automated decision was computed. However, collecting data about multiple automated decision outcomes will be time-consuming and costly. Making the inferences will also be time-consuming and costly.
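The inference strategy described above can be sketched as follows. The competitor's service is treated as a black box; the probing varies the personal data in one disclosed category at a time to locate the decision boundary. The hidden decision rule, the category names and the thresholds are all invented for illustration, and a real investigation would involve many data subjects and network queries rather than a local function call.

```python
# Hypothetical sketch of probing a black-box automated decision service.
# The rule inside competitor_decision, and the categories "income" and
# "age", are invented stand-ins, not taken from any real service.

def competitor_decision(data):
    """Stand-in for the competitor's black-box automated decision."""
    return "approve" if data["income"] >= 30000 and data["age"] >= 21 else "reject"

def probe_threshold(category, values, baseline):
    """Vary one disclosed category, holding the others fixed, and return
    the smallest value at which the decision flips to 'approve'."""
    for v in sorted(values):
        trial = dict(baseline, **{category: v})
        if competitor_decision(trial) == "approve":
            return v
    return None  # never approved over the probed range

baseline = {"income": 100000, "age": 40}  # values known to be approved
print(probe_threshold("income", range(0, 100001, 5000), baseline))  # -> 30000
print(probe_threshold("age", range(0, 100), baseline))              # -> 21
```

Even this toy version shows why the approach is costly in practice: each probed value is a separate interaction with the service, and the number of probes grows with the number of disclosed categories and the resolution required.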
Tips for patent drafting
To make patent claims more valuable we can think about including features which are likely to be disclosed as a result of the GDPR requirements and minimizing the use of features which relate to mathematical or computational detail. Features likely to be disclosed pursuant to the GDPR will be features concerning the categories of data used by the artificial intelligence algorithm, the criteria used, and features about the overall application domain (i.e. the reason for the automated decision-making process).
In conclusion, the GDPR will be useful for detecting patent infringement in cases where patents are directed to AI algorithms for making automated decisions, and where the requirements of the GDPR to disclose "meaningful information about the logic involved" apply. Information about categories of data used by algorithms and about rationale for automated decisions is publicly disclosed in some situations and can be used to help detect patent infringement. Collecting multiple sets of data about observed automated decisions and using them to infer how the automated decision algorithm works in detail is potentially possible, but will be costly and time intensive.
Dr Rachel Free is a UK and European patent attorney with an MSc in Artificial Intelligence and Dr Loretta Pugh is a partner and solicitor specialising in technology and data protection law; they are both at CMS Cameron McKenna Nabarro Olswang LLP in London. See more at cms.law. Please note that this article was first published in the CIPA journal July-August 2018 volume 47, number 7-8.