1:24-cv-01279

VB Assets LLC v. Soundhound Ai Inc

I. Executive Summary and Procedural Information

Parties & Counsel:
- Plaintiff: VB Assets, LLC (Delaware)
- Defendant: SoundHound AI, Inc. (Delaware)
- Plaintiff's Counsel: Farnan LLP
Case Identification: 1:24-cv-01279, D. Del., 03/06/2026
Venue Allegations: Venue is alleged to be proper in the District of Delaware based on Defendant SoundHound AI, Inc.'s residence in the district as a Delaware corporation.
Core Dispute: Plaintiff alleges that Defendant's voice artificial intelligence platforms and enterprise solutions infringe eight U.S. patents related to conversational voice user interfaces, voice-based advertising, and voice commerce.
Technical Context: The technology at issue resides in the field of conversational artificial intelligence and natural language understanding, a commercially significant market for in-vehicle systems, restaurant services, and consumer applications.
Key Procedural History: The complaint alleges that Plaintiff provided Defendant with notice of infringement of its patent portfolio on November 13, 2024, prior to filing the original complaint on November 21, 2024. The complaint also notes that during the prosecution of one patent-in-suit, the applicant distinguished the invention over prior art by emphasizing its ability to select a product from a database based on a "single user input," a point which may be relevant to claim construction.

Case Timeline

Date	Event
2006-10-16	Earliest Priority Date ('681, '626, '699, '249 Patents)
2007-02-08	Earliest Priority Date ('536, '097, '176 Patents)
2010-10-19	Issue Date (U.S. 7,818,176)
2011-12-06	Issue Date (U.S. 8,073,681)
2012-01-09	VoiceBox and Toyota strategic relationship announced
2014-09-16	Earliest Priority Date ('385 Patent)
2014-11-11	Issue Date (U.S. 8,886,536)
2016-02-23	Issue Date (U.S. 9,269,097)
2019-05-21	Issue Date (U.S. 10,297,249)
2020-08-25	Issue Date (U.S. 10,755,699)
2021-08-10	Issue Date (U.S. 11,087,385)
2022-01-11	Issue Date (U.S. 11,222,626)
2024-11-13	Plaintiff sends notice of infringement to Defendant
2024-11-21	Plaintiff files original complaint
2025-01-07	SoundHound AI announces in-vehicle voice commerce demo at CES 2025
2025-01-30	Plaintiff files First Amended Complaint
2026-03-06	Plaintiff files Second Amended Complaint

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 8,073,681 - "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE"

Patent Identification: U.S. Patent No. 8,073,681, "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE," issued December 6, 2011 (the "'681 Patent").

The Invention Explained

Problem Addressed: The patent addresses the limitations of prior "Command and Control" voice user interfaces, which forced users to memorize rigid speech prompts and navigate complex verbal menus, inhibiting widespread adoption of speech-recognition systems (Compl. ¶¶12, 32-33).
The Patented Solution: The invention proposes a "cooperative conversational" model where a "conversational speech engine" interprets natural, free-form human speech (Compl. ¶34; '681 Patent, col. 2:11-20). This engine accumulates "short-term" knowledge from the current conversation and "long-term" knowledge from past conversations to understand the user's intent, disambiguate words with multiple meanings, and generate an adaptive response (Compl. ¶25; '681 Patent, abstract). Figure 1 of the patent illustrates an architecture where an input is processed by a speech recognizer and a conversational language processor that communicates with databases to form an output ('681 Patent, Fig. 1).
Technical Importance: This approach allowed users to interact with voice systems more naturally, as if in a human-to-human conversation, thereby improving the functionality and user experience of such systems (Compl. ¶36).

Key Claims at a Glance

The complaint asserts at least independent claim 25 (Compl. ¶25).
The essential elements of claim 25 include:
- A voice input device configured to receive an utterance with one or more words having different meanings in different contexts.
- A conversational speech engine with one or more processors configured to:
  - accumulate short-term shared knowledge about the current conversation;
  - accumulate long-term shared knowledge about the user from past conversations;
  - identify a context for the utterance from the short-term and long-term knowledge;
  - establish an intended meaning for the utterance within that context to disambiguate the user's intent; and
  - generate a grammatically or syntactically adapted response based on the intended meaning.
The complaint states it anticipates identifying additional asserted claims (Compl. ¶89).
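For orientation, the knowledge-accumulation elements summarized above can be sketched in a few lines of Python. This is a hypothetical illustration only: the class `SharedKnowledge`, the `MEANINGS` lexicon, and the rule that short-term knowledge takes priority over long-term knowledge are assumptions made for the example, not language from the '681 Patent or a description of any accused product. Response generation is omitted.

```python
from collections import Counter
from typing import Dict, List


class SharedKnowledge:
    """Hypothetical sketch; class and method names are not from the patent."""

    def __init__(self) -> None:
        self.short_term: List[str] = []      # contexts seen in the current conversation
        self.long_term: Counter = Counter()  # context counts from past conversations

    def note(self, context: str) -> None:
        self.short_term.append(context)

    def end_conversation(self) -> None:
        # Fold the finished conversation into long-term knowledge.
        self.long_term.update(self.short_term)
        self.short_term.clear()

    def identify_context(self, candidates: List[str]) -> str:
        # Assumed priority rule: prefer the most recent context of the
        # current conversation...
        for context in reversed(self.short_term):
            if context in candidates:
                return context
        # ...otherwise fall back to the user's most frequent past context.
        return max(candidates, key=lambda c: self.long_term[c])


# Assumed toy lexicon: one ambiguous word with one meaning per context.
MEANINGS: Dict[str, Dict[str, str]] = {
    "bass": {"music": "bass guitar", "fishing": "bass (fish)"},
}


def intended_meaning(word: str, knowledge: SharedKnowledge) -> str:
    senses = MEANINGS[word]
    return senses[knowledge.identify_context(list(senses))]


knowledge = SharedKnowledge()
knowledge.note("fishing")
knowledge.note("fishing")
knowledge.end_conversation()            # a past conversation about fishing
from_history = intended_meaning("bass", knowledge)

knowledge.note("music")                 # the current conversation turns to music
from_current = intended_meaning("bass", knowledge)
```

In this toy run the same word resolves to the fish when only long-term knowledge is available and to the instrument once the current conversation supplies a music context.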

U.S. Patent No. 11,222,626 - "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE"

Patent Identification: U.S. Patent No. 11,222,626, "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE," issued January 11, 2022 (the "'626 Patent").

The Invention Explained

Problem Addressed: Like the '681 Patent, this patent addresses the rigidity of prior "Command and Control" voice systems that failed to provide a seamless conversational experience (Compl. ¶¶32-33).
The Patented Solution: The invention describes a system for facilitating natural language responses that uses a "context stack" generated from prior utterances (Compl. ¶27). The system tracks contexts from a series of utterances, generates a stack of these contexts in reverse chronological order, and then compares a new utterance against the contexts in the stack to determine the correct interpretation ('626 Patent, abstract; '626 Patent, col. 4:15-33). This allows the system to maintain conversational state and correctly interpret user inputs that depend on prior turns in the conversation (Compl. ¶27).
Technical Importance: The use of a structured context stack provides a specific method for improving how a voice system handles multi-turn dialogues and resolves ambiguity based on recent conversational history (Compl. ¶¶27, 35).

Key Claims at a Glance

The complaint asserts at least independent claim 10 (Compl. ¶27).
The essential elements of claim 10 include:
- One or more physical processors configured to:
  - track a series of contexts from a series of natural language utterances in a conversation;
  - generate a context stack based on the tracked contexts, with the contexts listed in reverse chronological order;
  - receive a third natural language utterance;
  - determine whether the third utterance corresponds to a context in the stack by comparing it to the contexts in their stacked order; and
  - interpret the third utterance using the corresponding context if a match is found.
The complaint states it anticipates identifying additional asserted claims (Compl. ¶95).
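The claim 10 elements summarized above describe a recognizable data structure, and a minimal Python sketch may help readers picture it. Everything here is an assumption made for illustration: the `ContextStack` class, the `KEYWORDS` vocabulary, and the keyword-overlap test stand in for whatever natural language model a real system would use, and nothing below is drawn from the '626 Patent's specification or from the Accused Products.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContextStack:
    """Hypothetical illustration only; not the patent's or any vendor's code."""

    # The most recent context sits at index 0 (reverse chronological order).
    contexts: List[str] = field(default_factory=list)

    def track(self, context: str) -> None:
        self.contexts.insert(0, context)

    def match(self, utterance: str, fits: Callable[[str, str], bool]) -> Optional[str]:
        # Compare the utterance to each context in stacked order, newest first.
        for context in self.contexts:
            if fits(utterance, context):
                return context
        return None


# Assumed toy vocabulary standing in for a real language-understanding model.
KEYWORDS = {
    "music": {"play", "song", "louder"},
    "navigation": {"route", "traffic", "there"},
}


def keyword_fit(utterance: str, context: str) -> bool:
    words = set(utterance.lower().replace("?", "").split())
    return bool(words & KEYWORDS.get(context, set()))


stack = ContextStack()
stack.track("navigation")   # first utterance, e.g. "find a route to the airport"
stack.track("music")        # second utterance, e.g. "play some jazz"

# Third utterance: checked against "music" first, then "navigation".
matched = stack.match("how is traffic there?", keyword_fit)
```

Here the third utterance does not fit the newest context and falls through to the older one, which is then used for interpretation.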

U.S. Patent No. 10,755,699 - "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE"

Patent Identification: U.S. Patent No. 10,755,699, "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE," issued August 25, 2020 (the "'699 Patent").
Technology Synopsis: The '699 Patent claims a system for generating natural language responses that are adapted based on a user's manner of speaking. The system identifies the manner in which an utterance was spoken by using both short-term knowledge from the current conversation and long-term knowledge from prior conversations, and generates a response based on this identified manner (Compl. ¶29).
Asserted Claims: At least independent claim 12 is asserted (Compl. ¶29).
Accused Features: The complaint alleges that the Accused Products' ability to generate adapted responses infringes the '699 Patent (Compl. ¶¶65, 101).

U.S. Patent No. 10,297,249 - "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE"

Patent Identification: U.S. Patent No. 10,297,249, "SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE," issued May 21, 2019 (the "'249 Patent").
Technology Synopsis: The '249 Patent claims a system that facilitates natural language responses using short-term knowledge generated from prior multi-modal device interactions. The system receives and compares voice inputs from multiple input devices, filters sound, and generates short-term knowledge based on both voice and non-voice inputs (e.g., user interface state) to determine a context and interpretation for an utterance (Compl. ¶31).
Asserted Claims: At least independent claim 16 is asserted (Compl. ¶31).
Accused Features: The Accused Products are alleged to infringe by providing voice AI on a variety of devices that support multi-modal interactions (Compl. ¶¶64, 107).

U.S. Patent No. 8,886,536 - "SYSTEM AND METHOD FOR DELIVERING TARGETED ADVERTISEMENTS AND TRACKING ADVERTISEMENT INTERACTIONS IN VOICE RECOGNITION CONTEXTS"

Patent Identification: U.S. Patent No. 8,886,536, "SYSTEM AND METHOD FOR DELIVERING TARGETED ADVERTISEMENTS AND TRACKING ADVERTISEMENT INTERACTIONS IN VOICE RECOGNITION CONTEXTS," issued November 11, 2014 (the "'536 Patent").
Technology Synopsis: The '536 Patent describes a system for delivering promotional content within a voice interface. It processes a user utterance by determining "domain information" and using multiple "domain agents" to obtain different interpretations, ultimately determining the final interpretation and selecting promotional content based upon it (Compl. ¶39). This approach is presented as an improvement over difficult-to-use, menu-driven voice systems (Compl. ¶44).
Asserted Claims: At least independent claim 32 is asserted (Compl. ¶39).
Accused Features: The Accused Products allegedly infringe through their voice commerce ecosystem, which provides responses and potentially promotional content based on interpreting user requests (Compl. ¶¶65, 72, 113).

U.S. Patent No. 9,269,097 - "SYSTEM AND METHOD FOR DELIVERING TARGETED ADVERTISEMENTS AND/OR PROVIDING NATURAL LANGUAGE PROCESSING BASED ON ADVERTISEMENTS"

Patent Identification: U.S. Patent No. 9,269,097, "SYSTEM AND METHOD FOR DELIVERING TARGETED ADVERTISEMENTS AND/OR PROVIDING NATURAL LANGUAGE PROCESSING BASED ON ADVERTISEMENTS," issued February 23, 2016 (the "'097 Patent").
Technology Synopsis: The '097 Patent claims a system for natural language processing based on advertisements. After presenting an advertisement, the system receives a user utterance and interprets it based on the advertisement, including determining if a pronoun in the utterance refers to the advertised product or service (Compl. ¶41).
Asserted Claims: At least independent claim 23 is asserted (Compl. ¶41).
Accused Features: The Accused Products' voice commerce ecosystem, which may present advertisements and process follow-up user commands, is alleged to infringe (Compl. ¶¶65, 72, 119).

U.S. Patent No. 7,818,176 - "SYSTEM AND METHOD FOR SELECTING AND PRESENTING ADVERTISEMENTS BASED ON NATURAL LANGUAGE PROCESSING OF VOICE-BASED INPUT"

Patent Identification: U.S. Patent No. 7,818,176, "SYSTEM AND METHOD FOR SELECTING AND PRESENTING ADVERTISEMENTS BASED ON NATURAL LANGUAGE PROCESSING OF VOICE-BASED INPUT," issued October 19, 2010 (the "'176 Patent").
Technology Synopsis: This patent describes a system for presenting advertisements in response to natural language utterances. A speech recognition engine recognizes words or phrases, and a "conversational language processor" interprets them to establish a context. An advertisement is then selected and presented within that established context (Compl. ¶43).
Asserted Claims: At least independent claim 27 is asserted (Compl. ¶43).
Accused Features: The complaint alleges infringement by the Accused Products' voice commerce features, which are capable of interpreting user requests and presenting responses, including potential advertisements (Compl. ¶¶65, 125).

U.S. Patent No. 11,087,385 - "VOICE COMMERCE"

Patent Identification: U.S. Patent No. 11,087,385, "VOICE COMMERCE," issued August 10, 2021 (the "'385 Patent").
Technology Synopsis: The '385 Patent claims a system for voice commerce that receives a "single first user input" (a natural language utterance), searches a database, and selects a product or service to be purchased "without further user input other than the single first user input" (Compl. ¶51). The system then completes the purchase after receiving a second user input for confirmation (Compl. ¶51). This is positioned as an improvement over online shopping systems requiring users to manually browse websites and fill out forms (Compl. ¶53).
Asserted Claims: At least independent claim 16 is asserted (Compl. ¶51).
Accused Features: The complaint specifically targets SoundHound's "in-car voice commerce ecosystem," including its food ordering system, as infringing technology (Compl. ¶¶65, 72, 131).
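The two-input flow described in the '385 Patent synopsis (select on a single first input, complete the purchase on a confirming second input) can be pictured with a short Python sketch. The catalog contents, function names, and the simple word-matching search are hypothetical and chosen only for illustration; they are not taken from the patent or from SoundHound's food-ordering system.

```python
from typing import Dict, Optional

# Hypothetical product database; names and prices are invented for illustration.
CATALOG: Dict[str, float] = {"cheeseburger": 4.50, "fries": 2.25, "cola": 1.75}


def select_product(first_input: str) -> Optional[str]:
    # Search the database using only the single first utterance;
    # no follow-up question is asked before a product is selected.
    words = first_input.lower().split()
    for name in CATALOG:
        if name in words:
            return name
    return None


def complete_purchase(product: Optional[str], second_input: str) -> str:
    # The purchase is completed only after a separate confirming input.
    if product is None:
        return "no product selected"
    if second_input.strip().lower() in {"yes", "confirm"}:
        return f"purchased {product} for ${CATALOG[product]:.2f}"
    return "purchase cancelled"


selected = select_product("order me a cheeseburger")   # single first user input
receipt = complete_purchase(selected, "yes")           # second, confirming input
```

A system that instead asked clarifying questions between the first utterance and the selection would follow a different control flow, which is the distinction the prosecution-history point in the complaint appears to turn on.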

III. The Accused Instrumentality

Product Identification

The accused instrumentalities are the "SoundHound Voice AI Systems," which include the SoundHound Houndify platform, the Voice AI platform, the SoundHound Chat AI app, and enterprise solutions for Automotive, Hospitality, and Restaurants (Compl. ¶65).

Functionality and Market Context

The Accused Products provide voice recognition and natural language understanding technology to customers such as Hyundai, White Castle, and Vizio (Compl. ¶¶63, 76). The technology is delivered as on-device software ("Edge"), a cloud service ("Cloud-Only"), or a hybrid ("Edge+Cloud") architecture (Compl. ¶66). A screenshot from Defendant's website illustrates the Edge+Cloud connectivity solution, showing a user query being processed through both on-device and cloud components to formulate a response (Compl. ¶66, p. 24).
The complaint alleges these systems are implemented in a variety of "Voice-Enabled Devices," including automotive infotainment systems, drive-thru kiosks, TVs, and mobile devices (Compl. ¶¶64, 73). A press release screenshot describes SoundHound's demonstration of an "in-vehicle voice commerce platform" for ordering takeout food directly from a car's infotainment system (Compl. ¶72, p. 26). The complaint positions these products as central to SoundHound's business and marketing efforts (Compl. ¶¶63, 71, 78).

IV. Analysis of Infringement Allegations

The complaint alleges that the Accused Products directly and indirectly infringe the asserted claims but does not include the evidentiary claim charts referenced as exhibits (Compl. ¶89). The following analysis summarizes the infringement allegations based on the complaint's narrative descriptions of the asserted patents and the accused technology.

'681 Patent Infringement Allegations

The complaint alleges that the SoundHound Voice AI Systems infringe claim 25 of the '681 Patent by providing a cooperative conversational voice user interface (Compl. ¶¶25, 89). The theory of infringement suggests that the Accused Products meet each element of the claimed system. It is alleged that the systems receive user utterances containing ambiguous words and use a "conversational speech engine" to disambiguate them (Compl. ¶¶25, 34). This engine allegedly accumulates short-term knowledge by maintaining the context of a current conversation and long-term knowledge through user-specific data to identify a context, determine the user's intent, and generate an adapted response (Compl. ¶¶25, 35). The complaint points to SoundHound's marketing of conversational interfaces for automotive and restaurant customers as evidence of this functionality (Compl. ¶¶63, 73). A screenshot of a White Castle drive-thru kiosk prompts a user, "Say your order. I'm listening," which is presented as an example of an infringing voice input device (Compl. ¶73, p. 27).

'626 Patent Infringement Allegations

The infringement theory for the '626 Patent centers on the allegation that the Accused Products facilitate natural language responses using a "context stack" as claimed in claim 10 (Compl. ¶¶27, 95). The complaint alleges that during a conversation, the SoundHound systems track a series of contexts from user utterances (Compl. ¶27). It is alleged that the systems generate a "context stack" from these contexts in reverse chronological order and use this ordered stack to interpret subsequent user utterances (Compl. ¶27). This functionality is alleged to be part of the core natural language understanding technology offered in the Accused Products, which are designed to handle multi-turn conversations (Compl. ¶¶35, 65). The complaint's description of a hybrid Edge+Cloud system processing a complex, multi-part user query suggests a mechanism for tracking conversational state that Plaintiff may argue maps to the claimed context stack (Compl. ¶66, p. 24).

Identified Points of Contention:

Scope Questions:
- A primary point of contention for the '681 and '699 Patents may be whether Defendant's AI architecture constitutes a "conversational speech engine" as that term is used in the patents, or whether it represents a technologically distinct approach.
- For the '626 Patent, a key issue may be whether the mechanism by which the Accused Products maintain conversational context meets the specific claim limitation of a "context stack" generated in "reverse chronological order."
- For the '385 Patent, a central question will likely be whether the accused voice commerce systems "select, without further user input other than the single first user input, a product or service... to be purchased," a limitation the applicant emphasized during prosecution (Compl. ¶59).
Technical Questions:
- An evidentiary question will be what proof the Plaintiff can obtain through discovery to demonstrate that the internal operations of the Accused Products practice the specific steps of accumulating and using short- and long-term knowledge ('681 Patent), tracking contexts in a reverse-chronological stack ('626 Patent), and analyzing a user's "manner of speaking" ('699 Patent). The complaint's reliance on marketing materials suggests these technical details are not publicly available (Compl. ¶¶63-73).

V. Key Claim Terms for Construction

'681 Patent (Claim 25)

The Term: "conversational speech engine"
Context and Importance: This term appears to define the core of the claimed system. Its construction will be critical to determining whether the architecture of the Accused Products, which may be distributed across device and cloud (Compl. ¶66), falls within the scope of the claims. Practitioners may focus on whether the term implies a monolithic structure or can be read more broadly on a distributed system.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification states the engine "could be implemented locally on a user device or remotely on a server," which may support a broader, non-location-specific construction (Compl. ¶34). It is also described functionally as including a "conversational language processor and/or a context determination process" ('681 Patent, col. 8:1-14).
- Evidence for a Narrower Interpretation: The patent's figures, such as Figure 1, depict the "Conversational Language Processor" and "Voice Search Engine" as components within a single logical block 115, which could support a narrower interpretation tied to that specific architecture ('681 Patent, Fig. 1).

'626 Patent (Claim 10)

The Term: "generate a context stack... in reverse chronological order"
Context and Importance: This limitation defines a specific data structure and its ordering. The infringement analysis will likely turn on whether the accused system's method for managing conversational history is structurally equivalent to this "last-in, first-out" style of stack.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes the goal of the context determination process as an attempt to "fit a current utterance into recent contexts," which could support a functional interpretation where any method that prioritizes recent context over older context meets the limitation ('626 Patent, col. 4:21-23).
- Evidence for a Narrower Interpretation: The claim language recites a specific structure ("stack") and order ("reverse chronological"). A defendant may argue this requires a literal LIFO data structure, and any other method of weighting or prioritizing contexts, even if it favors recent ones, would not meet this limitation. The patent does not appear to provide an explicit definition beyond the plain meaning of the words.
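The practical difference between the two readings can be shown with a small Python sketch. All values are invented for illustration: the `history` list, the per-context `scores`, the `threshold`, and the `decay` factor are assumptions, and neither function is drawn from the '626 Patent or from any accused system. The first function models a literal newest-first walk of a stack; the second models a recency-weighted scoring scheme that favors recent contexts without being a stack.

```python
from typing import Dict, List, Optional

# Hypothetical conversation history (oldest first) and made-up fit scores
# for a new utterance against each tracked context.
history: List[str] = ["navigation", "music", "weather"]
scores: Dict[str, float] = {"navigation": 0.9, "music": 0.5, "weather": 0.0}


def lifo_first_match(history: List[str], scores: Dict[str, float],
                     threshold: float = 0.4) -> Optional[str]:
    # Literal stack reading: walk newest to oldest and stop at the
    # first context that fits well enough.
    for context in reversed(history):
        if scores[context] >= threshold:
            return context
    return None


def recency_weighted(history: List[str], scores: Dict[str, float],
                     decay: float = 0.8) -> Optional[str]:
    # Functional reading: every context is scored, and newer contexts
    # receive a larger weight, but there is no ordered walk or early stop.
    best, best_score = None, 0.0
    for age, context in enumerate(reversed(history)):
        weighted = scores[context] * (decay ** age)
        if weighted > best_score:
            best, best_score = context, weighted
    return best


stack_choice = lifo_first_match(history, scores)
weighted_choice = recency_weighted(history, scores)
```

With these assumed numbers the two approaches pick different contexts ("music" for the stack walk, "navigation" for the weighted scheme), which illustrates why the choice between a structural and a functional construction could affect the infringement analysis.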

VI. Other Allegations

Indirect Infringement

The complaint alleges both induced and contributory infringement for all eight patents (Compl. ¶¶90-91, 96-97, 102-103, 108-109, 114-115, 120-121, 126-127, 132-133). The allegations are based on SoundHound designing, marketing, and supplying its Voice AI Systems to customers with instructions, demonstrations, and support, allegedly knowing and intending for those customers to use the systems in an infringing manner (Compl. ¶¶78-85). The complaint further alleges the Accused Products are a material part of the inventions and are not staple articles suitable for substantial non-infringing use (Compl. ¶86).

Willful Infringement

The complaint alleges that SoundHound had knowledge of the patents at least as of a November 13, 2024 notice letter, as well as from the filing of the original complaint and subsequent amended complaints (Compl. ¶87). This alleged knowledge forms the basis for a claim of ongoing willful infringement.

VII. Analyst's Conclusion: Key Questions for the Case

A core issue will be one of definitional scope and equivalence: Can the term "context stack... in reverse chronological order" ('626 Patent) be construed to cover the methods used by SoundHound's modern AI to manage conversational state, or is there a fundamental mismatch in the claimed data structure versus the accused implementation? Similarly, does SoundHound's distributed architecture constitute the claimed "conversational speech engine" ('681 Patent)?
A second central question is one of functional operation: For the '385 "voice commerce" patent, does the accused food-ordering system actually select a product based on a "single first user input," as the claim requires, or does it involve a multi-step dialogue that falls outside the claim scope as distinguished during prosecution?
A key evidentiary question will be one of technical proof: The complaint relies heavily on marketing materials to allege infringement. A decisive factor will be whether discovery reveals that the internal workings of the SoundHound Voice AI Systems in fact perform the specific, granular steps recited in the asserted claims, such as analyzing a user's "manner of speaking" ('699 Patent) or processing multi-modal inputs to generate short-term knowledge ('249 Patent).