
1:23-cv-00816

Friendliai Inc v. Hugging Face Inc


I. Executive Summary and Procedural Information

  • Parties & Counsel: Plaintiff Friendliai Inc; Defendant Hugging Face Inc
  • Case Identification: 1:23-cv-00816, D. Del., 07/28/2023
  • Venue Allegations: Venue is alleged to be proper in the District of Delaware because Defendant is a Delaware corporation and conducts business in the state, including offering products and services through its website.
  • Core Dispute: Plaintiff alleges that Defendant's Text Generation Inference server for large language models infringes a patent related to methods for dynamically batching and scheduling user requests to improve efficiency.
  • Technical Context: The technology addresses performance bottlenecks in serving large-scale generative artificial intelligence models, a critical operational challenge in the rapidly growing AI industry.
  • Key Procedural History: The complaint alleges that the patented technology was first described in a July 2022 academic paper co-authored by the inventors, titled "Orca: A Distributed Serving System for Transformer-Based Generative Models." Plaintiff also alleges it provided Defendant with pre-suit notice of infringement on or around July 21, 2023.

Case Timeline

Date Event
2021-12-03 U.S. Patent No. 11,442,775 Priority Date
2022-07-01 "Orca" academic paper describing the technology published
2022-09-13 U.S. Patent No. 11,442,775 Issued
2023-02-01 Accused Product (Text Generation Inference) launched (approx.)
2023-07-21 Plaintiff allegedly sent pre-suit notice of infringement to Defendant (approx.)
2023-07-28 Complaint Filed

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 11,442,775 - Dynamic Batching for Inference System for Transformer-Based Generation Tasks

  • Patent Identification: U.S. Patent No. 11,442,775, "Dynamic Batching for Inference System for Transformer-Based Generation Tasks," issued September 13, 2022 (the "'775 Patent").

The Invention Explained

  • Problem Addressed: The patent's background section describes the inefficiency of conventional batching methods for processing requests in transformer-based AI models (e.g., large language models) (Compl. ¶12). These methods often require all requests in a batch to have the same data length, which can lead to wasted computational resources and increased latency when requests vary in complexity (Compl. ¶12). Under this "static batching," a short request that finishes early must wait for the longest request in the batch to complete, and newly arriving requests must wait for the entire current batch to finish before being processed ('775 Patent, col. 1:45-53; Compl. ¶12).
  • The Patented Solution: The invention is a method for "iteration-level dynamic batching" that allows an inference system to modify a batch of requests while it is being processed ('775 Patent, abstract; col. 2:45-52). On an iteration-by-iteration basis, the system can add new requests to the batch or remove requests that have been completed ('775 Patent, col. 2:62-65). This allows for faster responses to users and prevents the processing hardware from being under-utilized, thereby increasing throughput and reducing latency ('775 Patent, col. 3:3-8). The process is illustrated in Figures 5A-5D of the patent, which show new requests being added to an execution engine's queue and completed requests being removed over successive iterations.
  • Technical Importance: This approach provides a method to optimize the use of specialized AI hardware (like GPUs) when serving generative models, a critical factor in making such services commercially viable and responsive at scale (Compl. ¶14).
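For orientation, the iteration-level scheme described above can be sketched in a few lines of Python. This is a minimal conceptual sketch only; the names (generate_token, serve, target_len) are illustrative and do not come from the patent, the Orca paper, or any accused product.

```python
from collections import deque

def generate_token(request):
    """Stand-in for one decoder forward pass on a single request."""
    request["generated"] += 1
    return request["generated"] >= request["target_len"]  # True when finished

def serve(incoming, max_batch_size=4):
    """Iteration-level dynamic batching: after every token-generation
    iteration, completed requests leave the batch and queued requests
    join it, rather than waiting for the whole batch to drain."""
    queue = deque(incoming)
    batch, results = [], []
    while queue or batch:
        # Admit new requests whenever the engine has spare capacity.
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        # One iteration: every request in the batch generates one token.
        still_running = []
        for req in batch:
            finished = generate_token(req)
            (results if finished else still_running).append(req)
        batch = still_running
    return results

reqs = [{"id": i, "generated": 0, "target_len": n} for i, n in enumerate([2, 5, 3])]
done = serve(reqs)
print(sorted(r["id"] for r in done))  # all three requests complete
```

Under static batching, by contrast, the two-token request would idle until the five-token request finished; here it exits the batch after its second iteration, freeing a slot immediately.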

Key Claims at a Glance

  • The complaint asserts infringement of at least Claim 10 (Compl. ¶¶42, 46).
  • Essential elements of independent Claim 10 include:
    • Receiving one or more requests for execution by a serving system that includes a scheduler and one or more execution engines.
    • Scheduling a first batch of requests for execution on an engine.
    • Generating a first set of output tokens by applying a transformer model to the first batch.
    • Receiving a new request from a client device.
    • Scheduling a second batch of requests that additionally includes the new request, where this scheduling is done responsive to determining that the execution engine has memory available.
    • A condition where the length of the input tokens for the new request is different from the length of an input for at least one other request in the batch.
    • Generating a second set of output tokens by applying the transformer model to this new, second batch.
  • The complaint does not explicitly reserve the right to assert dependent claims, but it alleges infringement of "one or more claims" of the '775 Patent (Compl. ¶41).
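The ordered steps of Claim 10 can be made concrete with a short, hypothetical Python walkthrough. The Scheduler class, run_model function, and token-count memory model below are assumptions for illustration only; they reflect neither the patent's disclosed embodiments nor TGI's actual design.

```python
class Scheduler:
    """Hypothetical scheduler that gates admission on a memory budget."""
    def __init__(self, memory_budget):
        self.memory_budget = memory_budget

    def has_memory_for(self, batch):
        # Illustrative accounting: total input tokens must fit the budget.
        return sum(len(r["input_tokens"]) for r in batch) <= self.memory_budget

def run_model(batch):
    """Stand-in for applying the transformer model to a batch."""
    return [f"out-{r['id']}" for r in batch]

scheduler = Scheduler(memory_budget=16)

# Steps 1-3: schedule a first batch and generate a first set of output tokens.
first_batch = [{"id": 0, "input_tokens": [1, 2, 3]}]
first_outputs = run_model(first_batch)

# Step 4: a new request arrives with a *different* input length (5 vs. 3).
new_request = {"id": 1, "input_tokens": [4, 5, 6, 7, 8]}

# Steps 5-7: schedule a second batch that additionally includes the new
# request, responsive to determining that memory is available, then
# generate a second set of output tokens.
candidate = first_batch + [new_request]
if scheduler.has_memory_for(candidate):
    second_batch = candidate
    second_outputs = run_model(second_batch)
```

The differing input lengths in the second batch (three tokens versus five) correspond to the claim's requirement that the new request's input-token sequence differ in length from at least one other request in the batch.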

III. The Accused Instrumentality

Product Identification

  • The complaint identifies Defendant's "Text Generation Inference" ("TGI") as the primary accused functionality (Compl. ¶26). It also names services that allegedly incorporate TGI, including Spaces, Inference Endpoints, Enterprise Hub, HuggingChat, and OpenAssistant (Compl. ¶41).

Functionality and Market Context

  • TGI is described as an inference server for Large Language Models ("LLMs") (Compl. ¶26). The complaint alleges that an "important feature" of TGI is "continuous batching of incoming requests," which it also refers to as "dynamic batching" (Compl. ¶¶28, 42). This feature is alleged to involve "regularly running queries in the same forward step of the LLM (a 'batch') and also removing them when they are finished" (Compl. ¶29).
  • The complaint alleges this functionality enables "increased total throughput" and allows for an "optimal balance between 'exploiting the hardware and perceived latency'" compared to prior art static batching systems (Compl. ¶30). A screenshot in the complaint from Defendant's documentation highlights TGI's ability to enable "high throughput" and "low latency" (Compl. ¶31, Ex. 11). Another screenshot lists "Continuous batching of incoming requests for increased total throughput" as a key feature (Compl. ¶42, Ex. 10).
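Because the infringement theory turns on whether admission of a new request is gated on available memory, it is worth seeing what such a memory-gated check might look like in the abstract. The sketch below is a hedged illustration under invented assumptions (a flat token budget standing in for KV-cache capacity); it is not TGI's implementation, whose actual scheduling logic is precisely the evidentiary question the complaint raises.

```python
def can_admit(new_request_len, batch, memory_budget):
    """Hypothetical memory check: admit the new request only if the
    engine's remaining token budget can hold its input tokens. This
    mirrors the claimed 'responsive to determining that the execution
    engine has memory available' condition; the accounting here is
    illustrative only."""
    used = sum(r["prompt_len"] + r["generated"] for r in batch)
    return used + new_request_len <= memory_budget

# Two in-flight requests consume 120 and 45 tokens of budget (165 total).
batch = [{"prompt_len": 100, "generated": 20}, {"prompt_len": 40, "generated": 5}]
print(can_admit(50, batch, memory_budget=256))   # 165 + 50 = 215 <= 256 -> True
print(can_admit(200, batch, memory_budget=256))  # 165 + 200 = 365 > 256 -> False
```

A system could instead admit requests based on a fixed batch-size cap or queue depth rather than a memory determination; which of these TGI actually does is the kind of distinction the claim language makes material.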

IV. Analysis of Infringement Allegations

The complaint references an infringement chart (Exhibit 2), which is provided with the filings and summarized below.

'775 Patent Infringement Allegations

The chart pairs each element of independent Claim 10 with the allegedly infringing functionality; each entry cites the complaint and the corresponding '775 Patent support.

  • Claim Element: "A non-transitory computer-readable storage medium storing computer program instructions executable to perform operations for dynamically executing batches of requests on one or more execution engines running a machine-learning transformer model, the operations comprising:"
    Alleged Infringing Functionality: Defendant's TGI is alleged to be software that performs operations for dynamically executing batches of requests on execution engines running transformer models. (Compl. ¶42; '775 Patent, col. 28:55-58)
  • Claim Element: "receiving, by a serving system, one or more requests for execution, the serving system including a scheduler and one or more execution engines each coupled to access a machine-learning transformer model including at least a set of decoders;"
    Alleged Infringing Functionality: The TGI system allegedly receives requests for execution and includes a scheduler and execution engines that access transformer models with decoders. (Compl. ¶42; '775 Patent, col. 3:9-12)
  • Claim Element: "scheduling, by the scheduler, a batch of requests including the one or more requests for execution on an execution engine;"
    Alleged Infringing Functionality: The TGI scheduler allegedly schedules an initial batch of requests for execution. (Compl. ¶42; '775 Patent, col. 3:12-14)
  • Claim Element: "generating, by the execution engine, a first set of output tokens by applying the transformer model to a first set of inputs for the batch of requests, wherein applying the transformer model comprises applying at least one batch operation to one or more input tensors associated with the batch of requests;"
    Alleged Infringing Functionality: The TGI execution engine allegedly applies the transformer model to the initial batch to generate a first set of output tokens using a batch operation. (Compl. ¶42; '775 Patent, col. 3:14-20)
  • Claim Element: "receiving, by a request processor, a new request from a client device, the new request including a sequence of input tokens;"
    Alleged Infringing Functionality: The TGI system allegedly receives a new request containing input tokens while the first batch is processing. (Compl. ¶42; '775 Patent, col. 3:21-22)
  • Claim Element: "scheduling, by the scheduler, a second batch of requests additionally including the new request for execution on the execution engine, the second batch of requests scheduled responsive to determining that the execution engine has memory available to execute the second batch of requests..."
    Alleged Infringing Functionality: The TGI scheduler allegedly merges the new request into the currently processing batch, thereby scheduling a new, second batch for execution, and does so when memory is available. (Compl. ¶42; '775 Patent, col. 3:22-28)
  • Claim Element: "...wherein in a second set of inputs for the second batch of requests, a length of the sequence of input tokens for the new request is different from a length of an input for at least one request other than the new request; and"
    Alleged Infringing Functionality: TGI's continuous batching feature is alleged to allow for new requests of different lengths to be processed in the same batch as existing requests. (Compl. ¶42; '775 Patent, col. 2:4-7)
  • Claim Element: "generating, by the execution engine, a second set of output tokens by applying the transformer model to the second set of inputs for the second batch."
    Alleged Infringing Functionality: The TGI execution engine allegedly generates a second set of output tokens by applying the model to the newly constituted second batch. (Compl. ¶42; '775 Patent, col. 3:28-31)

Identified Points of Contention

  • Scope Questions: The analysis may turn on the definition of "scheduling... a second batch." A question for the court could be whether dynamically adding a new request to an in-flight set of requests constitutes scheduling a "second batch," or if it is merely a modification of the first batch. The patent's description of iteration-level updates may support the plaintiff's view, while a defendant might argue for a more formal distinction between a "first" and "second" scheduled batch.
  • Technical Questions: A central evidentiary question will be whether TGI's scheduling is "responsive to determining that the execution engine has memory available" as required by the claim. The complaint alleges this occurs (Compl. ¶42, Ex. 2), but the factual basis for this specific mechanism in the accused TGI product will require technical evidence beyond the marketing materials provided.

V. Key Claim Terms for Construction

The Term: "scheduling, by the scheduler, a second batch of requests additionally including the new request"

  • Context and Importance: This term is the core of the dynamic aspect of the invention. Its construction will determine whether adding a request to an in-progress workload constitutes infringement. Practitioners may focus on this term because the distinction between "modifying a batch" and "scheduling a second batch" could be a central non-infringement argument.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The patent abstract describes the invention as allowing the system to "dynamically modify a batch" by "adding new incoming requests" ('775 Patent, abstract). This language suggests that the act of adding a request creates a new batch definition, aligning with the "second batch" language.
    • Evidence for a Narrower Interpretation: The patent describes the process as updating a batch "between iterations" ('775 Patent, col. 2:60-62), and the flowchart in Figure 7 shows two distinct "Schedule a batch" steps (712 and 718). This could support an argument that a "second batch" requires a new, discrete scheduling event, rather than a continuous modification.

The Term: "responsive to determining that the execution engine has memory available"

  • Context and Importance: This limitation defines the condition that triggers the scheduling of the new request. Proving infringement requires showing that TGI performs this specific check.
  • Intrinsic Evidence for Interpretation:
    • Evidence for a Broader Interpretation: The patent explains that dynamic batching "allows the inference system to add new requests to a batch of requests if the execution engine is being under-utilized," which implies a general awareness of available capacity ('775 Patent, col. 3:6-8).
    • Evidence for a Narrower Interpretation: The detailed description specifies that the scheduler "monitors the cache memory" and updates the batch "Responsive to determining that execution engine... has cache memory available" ('775 Patent, col. 23:26-29). This language suggests an active, specific step of checking memory status before scheduling.

VI. Other Allegations

Indirect Infringement

  • The complaint alleges that Defendant induces infringement by providing TGI and related documentation, instructions, and code to customers, with the knowledge and intent that their use will infringe the '775 Patent (Compl. ¶45).

Willful Infringement

  • The complaint alleges that Defendant's infringement has been willful since at least the launch date of TGI and no later than July 21, 2023, when Plaintiff allegedly sent a notification letter (Compl. ¶49). The willfulness claim is further supported by the allegation that Defendant was aware of and copied the technology from the inventors' "Orca paper" (Compl. ¶35).

VII. Analyst's Conclusion: Key Questions for the Case

  • A core issue will be one of functional specificity: Does Defendant's "continuous batching" feature in TGI operate in a manner that meets the specific, ordered steps of Claim 10? In particular, what technical evidence will show that scheduling a new request is performed "responsive to determining that the execution engine has memory available," as opposed to being based on other scheduling logic?
  • A key legal question will be one of claim scope: Can the phrase "scheduling... a second batch" be construed to read on a system that dynamically adds new requests to a single, continuously modified processing queue? The resolution of this construction will be pivotal to the infringement analysis.
  • A significant question regarding damages and willfulness will be one of knowledge and intent: What evidence exists to support the allegation that Defendant was aware of the inventors' "Orca paper" and copied the described technology when developing TGI? The answer could substantially impact a potential willfulness finding.