DCT

1:24-cv-00839

VB Assets LLC v. Amazon.com Services LLC

Complaint Intelligence

I. Executive Summary and Procedural Information

Parties & Counsel:
- Plaintiff: VB Assets, LLC (Delaware)
- Defendant: Amazon.com Services, LLC (Delaware)
- Plaintiff's Counsel: Smith, Katzenstein & Jenkins LLP
Case Identification: 1:24-cv-00839, D. Del., 07/18/2024
Venue Allegations: Venue is alleged to be proper as Defendant is incorporated in Delaware and therefore resides in the district. The complaint also alleges Defendant derives substantial revenue from sales of its accused Alexa Products in Delaware.
Core Dispute: Plaintiff alleges that Defendant's Alexa-branded natural language systems infringe five U.S. patents related to cooperative conversational interfaces, voice-enabled commerce, and targeted voice advertising.
Technical Context: The technology at issue pertains to natural language understanding (NLU) systems that use short-term and long-term conversational context to enable more fluid, human-like voice interactions and to facilitate voice-driven commercial activities.
Key Procedural History: The complaint states that this lawsuit follows a November 2023 jury verdict in a prior case where Defendant was found to have willfully infringed four related patents owned by Plaintiff. That verdict resulted in an ongoing royalty award which, at the time of the trial, equated to $46.7 million. This complaint asserts five additional patents from the same portfolio against Defendant's ongoing activities.

Case Timeline

Date	Event
2001-01-01	VoiceBox founded
2005-10-18	U.S. Patent Nos. 10,297,249 & 10,755,699 Priority Date
2007-02-06	U.S. Patent No. 11,080,758 Priority Date
2009-11-10	U.S. Patent No. 9,502,025 Priority Date
2011-10-07	VoiceBox teleconference with Amazon
2011-10-19	Meeting at Amazon's offices
2011-10-26	Meeting at VoiceBox's office
2014-09-16	U.S. Patent No. 11,087,385 Priority Date
2014-11-01	Amazon launches Alexa and Echo products
2016-11-22	U.S. Patent No. 9,502,025 Issue Date
2017-02-02	Meeting where Amazon was allegedly informed of the '249 patent application
2018-01-04	'025 Patent cited during prosecution of an Amazon-assigned patent
2019-05-21	U.S. Patent No. 10,297,249 Issue Date
2019-07-29	First VoiceBox lawsuit filed against Amazon
2020-08-25	U.S. Patent No. 10,755,699 Issue Date
2021-08-03	U.S. Patent No. 11,080,758 Issue Date
2021-08-10	U.S. Patent No. 11,087,385 Issue Date
2023-11-08	Jury finds Amazon willfully infringed in first lawsuit
2024-05-01	VoiceBox sends notice letter to Amazon regarding patents-in-suit
2024-07-18	Complaint Filing Date

II. Technology and Patent(s)-in-Suit Analysis

U.S. Patent No. 10,297,249 - System and Method for a Cooperative Conversational Voice User Interface

The Invention Explained

Problem Addressed: The patent's background, as summarized in the complaint, identifies the rigidity of prior "Command and Control" voice user interfaces, which required users to memorize exact words and phrases and navigate restrictive menus, creating an unnatural and time-consuming experience (Compl. ¶¶47-48).
The Patented Solution: The invention claims a method for a more natural voice interface that uses a "conversational speech engine" to process user utterances (Compl. ¶49). The system generates "short-term knowledge" by integrating information from both voice inputs and "multi-modal" non-voice inputs (e.g., touchscreen interactions) within a single conversation (Compl. ¶42; Compl. ¶52). This combined knowledge is used to determine the user's context and generate an appropriate response ('249 Patent, abstract). The patent also describes a method for improving accuracy by receiving the same utterance via two different input devices, comparing the inputs, and filtering sound based on that comparison ('249 Patent, col. 4:43-52).
Technical Importance: This technology represented a move away from simple command recognition toward context-aware conversational systems, improving the functionality and user experience of voice interfaces (Compl. ¶51).

Key Claims at a Glance

The complaint asserts independent claim 1 and dependent claims 2-10 and 12-15 (Compl. ¶97).
Independent claim 1 is a method claim with the following essential elements:
- Receiving a first natural language utterance via a first voice input device and a second voice input device during a conversation.
- Comparing the first and second voice inputs.
- Filtering sound from both voice inputs based on the comparison.
- Obtaining a user interface state from at least one non-voice input associated with the first voice input.
- Generating "short-term knowledge" based on at least the first voice input and the first non-voice input.
- Determining a "first context" based on the short-term knowledge.
- Determining an interpretation of the utterance based on the context.
- Generating a response based on the interpretation.
The complaint reserves the right to assert other claims, including dependent claims (Compl. ¶97).
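For readers who find it easier to follow the claim 1 elements as a data flow, the steps listed above can be sketched as a toy pipeline. This is an illustrative assumption only: it is not the implementation described in the '249 Patent, not a description of how the accused Alexa Products work, and not a proposed claim construction. Every name (`ShortTermKnowledge`, `compare_inputs`, `filter_sound`, `respond`), the sample-by-sample comparison, and the tolerance value are hypothetical, and speech recognition is omitted by passing the utterance in as text.

```python
from dataclasses import dataclass, field


@dataclass
class ShortTermKnowledge:
    """Hypothetical per-conversation store (not a structure from the patent)."""
    utterances: list = field(default_factory=list)
    ui_state: dict = field(default_factory=dict)


def compare_inputs(first, second):
    """Compare two captures of the same utterance, sample by sample."""
    return [abs(a - b) for a, b in zip(first, second)]


def filter_sound(first, second, diffs, tolerance=0.2):
    """Toy filter: zero out, in BOTH captures, samples where they disagree."""
    keep = [d <= tolerance for d in diffs]
    filtered_first = [a if k else 0.0 for a, k in zip(first, keep)]
    filtered_second = [b if k else 0.0 for b, k in zip(second, keep)]
    return filtered_first, filtered_second


def respond(text, first, second, ui_state):
    # Receive two voice inputs, compare them, and filter sound from both.
    diffs = compare_inputs(first, second)
    filter_sound(first, second, diffs)  # would feed recognition; omitted here

    # Generate short-term knowledge from the voice input AND the non-voice input.
    knowledge = ShortTermKnowledge()
    knowledge.utterances.append(text)
    knowledge.ui_state.update(ui_state)

    # Determine a context from the short-term knowledge.
    context = "video" if "selected_video" in knowledge.ui_state else "general"

    # Interpret the utterance within that context, then generate a response.
    if context == "video" and text.lower().startswith("play this"):
        return f"Playing {knowledge.ui_state['selected_video']}"
    return "Sorry, I did not understand."
```

In this sketch, `respond("Play this", [0.1, 0.5, 0.9], [0.1, 0.5, 0.2], {"selected_video": "Trailer A"})` resolves "this" only because the non-voice selection was folded into the same knowledge object as the utterance; with an empty UI state the same utterance cannot be interpreted.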

U.S. Patent No. 10,755,699 - System and Method for a Cooperative Conversational Voice User Interface

The Invention Explained

Problem Addressed: The patent addresses the same core problem as the '249 Patent: the unnatural and restrictive nature of prior "Command and Control" voice systems (Compl. ¶¶47-48).
The Patented Solution: The invention describes a system that generates responses adapted to a user's "manner of speaking" (Compl. ¶45). It does this by accumulating and using two types of information: "short-term knowledge" from the current conversation and "long-term knowledge" from prior conversations ('699 Patent, abstract; Compl. ¶50). Based on this combined knowledge, the system identifies the "manner" in which the utterance was spoken (such as its tone, pace, or inflection) and generates a response based on both the words and that identified manner ('699 Patent, col. 6:46-67; Compl. ¶57).
Technical Importance: This approach allows a voice system to personalize responses based not just on what a user says, but how they say it, thereby improving the natural flow and accuracy of the interaction (Compl. ¶45; Compl. ¶56).

Key Claims at a Glance

The complaint asserts independent claims 1 and 11 and dependent claims 3-8 (Compl. ¶105).
Independent claim 1 is a method claim with the following essential elements:
- Receiving a user input comprising a natural language utterance.
- Recognizing words or phrases from the utterance.
- Identifying a context for the utterance based on the recognized words.
- Determining an interpretation of the utterance based on the context.
- Accumulating "short-term knowledge" from utterances within a single conversation.
- Accumulating "long-term knowledge" from utterances prior to the current conversation period.
- Identifying a "manner in which the natural language utterance was spoken" based on both the short-term and long-term knowledge.
- Generating a response based on the interpretation and the identified manner.
The complaint asserts dependent claims 3-8 (Compl. ¶105).
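The claim 1 elements listed above can likewise be pictured as a small pipeline in which "manner" is derived from short-term and long-term knowledge together. The sketch below is a hedged illustration under stated assumptions, not the '699 Patent's method and not a description of the accused system: it assumes speaking pace (words per second) as the only "manner" feature, and the `Knowledge` class, the 1.25x and 0.75x thresholds, the keyword-based context, and the fixed `clock` value are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Knowledge:
    """Hypothetical store of observed speaking paces (words per second)."""
    short_term: list = field(default_factory=list)  # current conversation
    long_term: list = field(default_factory=list)   # earlier conversations


def identify_manner(knowledge):
    """Classify pace by comparing the current session to the user's history."""
    if not knowledge.short_term or not knowledge.long_term:
        return "typical"
    baseline = mean(knowledge.long_term)
    current = mean(knowledge.short_term)
    if current > 1.25 * baseline:
        return "hurried"
    if current < 0.75 * baseline:
        return "relaxed"
    return "typical"


def handle_utterance(text, seconds, knowledge, clock="3 PM"):
    # Recognize words (speech recognition is omitted; text is given).
    words = text.lower().split()
    # Identify a context from the words, then interpret within it.
    context = "clock" if "time" in words else "general"
    interpretation = "tell_time" if context == "clock" else "unknown"
    # Accumulate short-term knowledge for this conversation.
    knowledge.short_term.append(len(words) / seconds)
    # Identify the manner from short-term AND long-term knowledge.
    manner = identify_manner(knowledge)
    # Generate a response from the interpretation and the manner.
    if interpretation != "tell_time":
        return "Sorry, I did not understand."
    if manner == "hurried":
        return f"{clock}."
    return f"Sure, the time right now is {clock}."
```

Here the same question produces a terse answer only when the current pace is well above the speaker's historical baseline, which is one way to picture a response that depends on both knowledge stores rather than on either alone.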

U.S. Patent No. 11,087,385 - Voice Commerce

Technology Synopsis: The patent addresses difficulties in online shopping, particularly on mobile devices, which require extensive searching and form-filling (Compl. ¶¶62-63). The claimed solution is a voice commerce system that can identify a product for purchase from a "single first user input" without further user input, receive a second user input for confirmation, and then complete the purchase transaction automatically (Compl. ¶60; Compl. ¶67).
Asserted Claims: Independent claims 1 and 31, and dependent claims 2-5 and 11-15 (Compl. ¶113).
Accused Features: The complaint alleges that Amazon's Alexa Products embody a voice commerce system that allows users to initiate and complete purchases via voice commands (Compl. ¶113).

U.S. Patent No. 11,080,758 - System and Method for Delivering Targeted Advertisements and/or Providing Natural Language Processing Based on Advertisements

Technology Synopsis: This patent seeks to solve the problem of prior voice interfaces lacking a mechanism for dialogue or for providing relevant commercial information to users (Compl. ¶¶78-79). The invention is a system that processes a natural language utterance to determine a context, selects a "purchase opportunity" based on that context, tracks the user's interaction with that opportunity (including any resulting transaction), and uses this interaction data to build or update a user profile for selecting subsequent purchase opportunities (Compl. ¶75).
Asserted Claims: Independent claims 1 and 18, and dependent claims 2-7, 9-12, and 19-22 (Compl. ¶121).
Accused Features: The complaint alleges that Amazon's Alexa Products use natural language utterances to select and deliver targeted advertisements and purchase opportunities to users (Compl. ¶121).

U.S. Patent No. 9,502,025 - System and Method for Providing a Natural Language Content Dedication Service

Technology Synopsis: The patent addresses the limitations of siloed, "Command and Control" voice systems that prevent users from easily interacting with content across different devices (Compl. ¶¶90-91). The solution is a "content dedication service" where a user can, with a first utterance, identify content to dedicate to a recipient. The system then receives a second utterance to be associated with the dedication and sends information to the recipient enabling access to both the content and the second utterance, which is provided as a textual annotation in the content's metadata (Compl. ¶88).
Asserted Claims: Independent claim 8 and dependent claims 14-19 (Compl. ¶129).
Accused Features: The complaint alleges that Amazon's Alexa Products provide a service for dedicating content to other users via natural language commands (Compl. ¶129).

III. The Accused Instrumentality

Product Identification

The accused instrumentalities are collectively referred to as "Alexa Products" (Compl. ¶11, fn. 2). These include the Alexa voice assistant software, associated hardware such as Echo smart speakers and smart displays, Alexa mobile applications, Fire TV devices, Amazon smart glasses, and third-party devices that integrate the Alexa platform (Compl. ¶11, fn. 2).

Functionality and Market Context

The complaint alleges that the Alexa platform is a natural language system designed for "improved conversation capabilities based on personalization and context" (Compl. ¶32). This functionality is alleged to rely on carrying over context from previous user interactions to inform current responses (Compl. ¶32). An Amazon promotional document included in the complaint describes the next generation of Alexa as being able to "deliver unique experiences based on the preferences you've shared" and to "carr[y] over relevant context throughout conversations" (Compl. p. 14). The complaint alleges that Amazon developed this technology after meetings in 2011 where Plaintiff's predecessor, VoiceBox, disclosed its own patented NLU technology (Compl. ¶¶4-9, 18).

IV. Analysis of Infringement Allegations

U.S. Patent No. 10,297,249 - Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
receive, during a first conversation, a first voice input via a first input device, the first voice input comprising a first natural language utterance;	An Alexa device, such as an Echo Show, receives a voice command from a user.	¶97; Ex. F, p. 3	col. 2:25-27
receive a second voice input comprising the first natural language utterance via a second input device;	A second, proximally located Alexa device, such as a Fire TV, also receives the same voice command.	¶97; Ex. F, p. 6	col. 4:43-49
compare the first voice input with the second voice input;	The Alexa system's Echo Spatial Perception (ESP) feature compares inputs from multiple devices to determine which is closest to the user.	¶97; Ex. F, p. 7	col. 4:49-52
filter sound from the first voice input and the second voice input based on the comparison;	Based on the ESP comparison, the Alexa system filters the sound by selecting one device to respond and instructing others to ignore the input. An audio history log shows other devices logged the audio as "not intended for this device."	¶97; Ex. F, p. 9	col. 4:49-52
obtain, during the first conversation, a user interface state related to one or more non-voice inputs associated with the first voice input...	Following a voice search, a user employs a non-voice input (e.g., a Fire TV remote) to select a video from a list of recommendations.	¶97; Ex. F, p. 11	col. 3:56-61
generate the short-term knowledge based on at least the first voice input and the first non-voice input;	Alexa's "Context memory" generates and maintains short-term knowledge, tracking user utterances and other interaction results (like non-voice selections) throughout a session.	¶97; Ex. F, p. 13	col. 5:6-12
determine, based on the short-term knowledge, a first context for the first natural language utterance;	Based on the short-term knowledge of videos displayed on-screen from a prior search, Alexa determines the context for a subsequent command like "Play this."	¶97; Ex. F, p. 16	col. 4:11-13
determine, based on the first context, an interpretation of the first natural language utterance; and	Within the established context, Alexa interprets the utterance "play this" to mean the user wants to play the specific video selected via the non-voice input.	¶97; Ex. F, p. 19	col. 3:8-10
generate, based on the interpretation...a first response to the first natural language utterance.	Alexa generates a response by playing the selected video and providing immediate verbal feedback.	¶97; Ex. F, p. 20	col. 2:40-42

Identified Points of Contention

Scope Questions: A potential point of dispute may be the "filtering sound" element. The court may need to determine whether Amazon's ESP feature, which selects a single device to respond and causes others to disregard the command, constitutes "filter[ing] sound from the first voice input and the second voice input" as required by the claim, or if it is merely a device-selection mechanism that discards one input entirely rather than filtering from both.
Technical Questions: The analysis may turn on the precise mechanism by which "short-term knowledge" is generated. The question will be whether the complaint provides sufficient evidence that Alexa's "Context memory" is "generated based on at least the first voice input and the first non-voice input," suggesting an integration of the two modalities, rather than merely storing data from both independently within the same session.

U.S. Patent No. 10,755,699 - Infringement Allegations

Claim Element (from Independent Claim 1)	Alleged Infringing Functionality	Complaint Citation	Patent Citation
receive a user input comprising a natural language utterance;	An Alexa-enabled device receives a user's spoken command.	¶105; Ex. G, p. 5	col. 2:50-52
recognize one or more words or phrases from the natural language utterance;	An Automatic Speech Recognition (ASR) engine processes the utterance to identify words.	¶105; Ex. G, p. 5	col. 3:35-39
identify a context...based on the one or more words or phrases...;	Alexa uses its "context embedding service" to determine the context of the utterance based on the recognized words.	¶105; Ex. G, p. 6	col. 4:11-13
determine an interpretation of the natural language utterance based on the identified context;	Based on the context, Alexa interprets the user's intent, as shown in a graphic depicting an adaptive conversational flow.	¶105; Ex. G, p. 9	col. 3:8-10
accumulate short-term knowledge...related to a single conversation...;	The system accumulates short-term knowledge within a single session, as illustrated by a "Review Voice History" screenshot showing a sequence of related queries about a song.	¶105; Ex. G, p. 11	col. 5:5-24
accumulate long-term knowledge...received prior to the predetermined time period;	The system uses Alexa Profiles and Voice ID, which are based on a user's accumulated history of utterances and listening preferences over time.	¶105; Ex. G, p. 16	col. 5:25-41
identify a manner in which the natural language utterance was spoken based on the short-term knowledge and the long-term knowledge; and	Alexa allegedly identifies the speaker via Voice ID (long-term knowledge) and can also identify whether the utterance was spoken in a whisper (short-term manner).	¶105; Ex. G, p. 18	col. 6:46-52
generate a response...based on the interpretation and the identified manner...	The system generates a response that is impacted by the identified user (from Voice ID) or by responding in a whisper if the user whispered.	¶105; Ex. G, p. 23	col. 6:53-67

Identified Points of Contention

Scope Questions: The definition of "manner in which the natural language utterance was spoken" will be a central issue. The court will have to decide if this term, which is further defined in a dependent claim to include prosodic features like "tone, pace, [or] inflection," can be construed to cover the binary state of speaker identity (via Voice ID) or whisper detection, as the complaint alleges.
Technical Questions: A key factual question will be whether Alexa's system actually uses both short-term and long-term knowledge in combination to "identify a manner" of speaking. The analysis will require evidence of a causal link showing that data from the current session (short-term) and historical user data (long-term) are jointly processed to determine a feature like tone, pace, or even speaker identity.

V. Key Claim Terms for Construction

U.S. Patent No. 10,297,249

The Term: "short-term knowledge"
Context and Importance: This term is fundamental to the patent's claimed novelty. The scope of infringement will depend on whether Amazon's session-based "Context memory" falls within its definition, particularly the requirement that it be "generated based on" both voice and non-voice inputs.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The specification describes short-term knowledge as accumulating "during a single conversation" and as potentially including the "current user interface state" ('249 Patent, col. 5:6-12), which could support a broad reading covering any data retained within a single user session.
- Evidence for a Narrower Interpretation: Claim 1 requires the knowledge to be "generated based on at least the first voice input and the first non-voice input." This "and" conjunction may support a narrower construction requiring a direct, combinatorial generation process, rather than the mere co-existence of voice and non-voice data within a session's memory.

U.S. Patent No. 10,755,699

The Term: "manner in which the natural language utterance was spoken"
Context and Importance: This term distinguishes the invention from prior art. Infringement hinges on whether Alexa's functions, such as speaker identification via Voice ID or whisper detection, constitute identifying the "manner" of speech. Practitioners may focus on this term because the complaint's infringement theory equates it with identifying the speaker, a potentially contentious interpretation.
Intrinsic Evidence for Interpretation:
- Evidence for a Broader Interpretation: The term itself is not explicitly defined in the claim and could plausibly be read to include any attribute of the speech signal beyond its textual content, including the identity of its source.
- Evidence for a Narrower Interpretation: Dependent claim 3 of the patent clarifies that the "manner" includes "at least one of tone, pace, timing, inflection, word use, and/or jargon." This may be used to argue that the scope of the independent claim is limited to these or similar prosodic and stylistic features, and does not extend to a binary state like speaker identity.

VI. Other Allegations

Indirect Infringement: The complaint alleges both induced and contributory infringement for all five patents-in-suit. The basis for inducement is that Amazon allegedly designs Alexa to operate in an infringing manner and instructs users on how to use these infringing features through its website and virtual assistant (Compl. ¶¶98, 106, 114, 122, 130). Contributory infringement is alleged on the basis that Alexa products are especially made or adapted for infringement and are not staple articles of commerce with substantial non-infringing uses (Compl. ¶¶99, 107, 115, 123, 131).
Willful Infringement: Willfulness is alleged for all five patents. For the '249, '385, '758, and '025 patents, the complaint alleges pre-suit knowledge based on direct communications and events, including a February 2017 meeting for the '249 patent application, a May 2024 notice letter for all patents, and the citation of the '025 patent during the prosecution of an Amazon patent in 2018 (Compl. ¶¶98, 100, 114, 116, 122, 124, 130, 132). The willfulness claim is further supported by the allegation that this conduct continues after a jury in a prior case found Amazon's infringement of related patents to be willful (Compl. ¶31).

VII. Analyst's Conclusion: Key Questions for the Case

A core issue will be one of definitional scope: can the term "manner in which the...utterance was spoken" ('699 patent), which is exemplified by prosodic features like tone and pace, be construed to encompass the identification of a speaker via a voice biometric system or the detection of a whisper mode? The outcome of this claim construction will significantly impact the infringement analysis for a key asserted patent.
A key evidentiary question will be one of technical causality: does the accused Alexa system perform the specific combinatorial steps required by the claims, such as generating "short-term knowledge" from both voice and non-voice inputs ('249 patent), or using both short-term and long-term knowledge to identify the "manner" of speech ('699 patent)? The case may turn on whether Plaintiff can prove this specific data integration occurs, as opposed to the mere parallel existence of different data types in system memory.
A central theme of the case will be the impact of prior litigation: how will the prior jury verdict of willful infringement on four related patents influence the proceedings for these five additional patents from the same portfolio? This history raises significant questions regarding the strength of the willfulness allegations and may shape the overall legal and settlement strategy.