Eyewear including diarization

US12136433B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12136433-B2
Application number: US-202016885606-A
Country: US
Kind code: B2
Filing date: May 28, 2020
Priority date: May 28, 2020
Publication date: Nov 5, 2024
Grant date: Nov 5, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An eyewear device that performs diarization by segmenting spoken language into different speakers and remembering each speaker over the course of a session. The speech of each speaker is translated to text and the text of each speaker is displayed on an eyewear display. The text of each user has a different attribute such that the eyewear user can distinguish the text of different speakers. Examples of the text attribute can be a text color, font, and font size. The text is displayed on the eyewear display such that it does not substantially obstruct the user's vision.
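The per-speaker text attribute described in the abstract can be illustrated with a minimal sketch. It assumes diarization has already produced (speaker label, text) pairs; the `TextStyle` and `SpeakerStyler` names, the palette values, and the speaker labels are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextStyle:
    """Display attributes for one speaker's text (illustrative fields)."""
    color: str
    font: str
    size_pt: int


# Hypothetical palette; the patent only names color, font, and font size
# as example attributes and does not specify values.
PALETTE = [
    TextStyle("yellow", "sans", 14),
    TextStyle("cyan", "serif", 14),
    TextStyle("magenta", "mono", 14),
]


class SpeakerStyler:
    """Remembers which style was given to each speaker during a session."""

    def __init__(self, palette=PALETTE):
        self._palette = list(palette)
        self._styles = {}

    def style_for(self, speaker_id):
        # First time a speaker is seen, hand out the next palette entry.
        # Styles repeat once there are more speakers than palette entries.
        if speaker_id not in self._styles:
            index = len(self._styles) % len(self._palette)
            self._styles[speaker_id] = self._palette[index]
        return self._styles[speaker_id]


def render(segments, styler):
    """Attach a style to each (speaker_id, text) segment."""
    return [(spk, text, styler.style_for(spk)) for spk, text in segments]


if __name__ == "__main__":
    styler = SpeakerStyler()
    demo = [("spk0", "Hello."), ("spk1", "Hi there."), ("spk0", "How are you?")]
    for spk, text, style in render(demo, styler):
        print(spk, style.color, text)
```

Keeping the speaker-to-style mapping in one object mirrors the "remembering each speaker over the course of a session" language: a speaker who returns later in the session gets the same style as before.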

First claim

Opening claim text (preview).

What is claimed is:

1. Eyewear, comprising: a frame; a display supported by the frame; a microphone coupled to the frame; and a camera configured to generate an image including an object; and an electronic processor configured to: receive speech from a plurality of human speakers via the microphone; identify the plurality of human speakers; perform diarization on the received speech to segment spoken language into different speakers; display text associated with each speaker on the display; display a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; process pitch and intonation of the received speech; establish a color for received speech based on the pitch and intonation; display the text in the established color based on the pitch and intonation; adjust font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determine the object in the image; and generate speech indicative of the object responsive to a speech command.

2. The eyewear of claim 1, wherein the processor is configured to use a convolutional neural network (CNN) to perform the diarization.

3. The eyewear of claim 2, wherein the text of each speaker has a unique color.

4. The eyewear of claim 2, wherein the text of each speaker has a unique font size.

5. The eyewear of claim 2, wherein the text of each speaker has a unique font style.

6. A method for use with eyewear, the eyewear having a frame, a display supported by the frame, a microphone coupled to the frame, a camera configured to generate an image including an object, and an electronic processor, the processor: receiving speech from a plurality of human speakers via the microphone; identifying the plurality of human speakers; performing diarization on the received speech to segment spoken language into different speakers; displaying text associated with each speaker on the display; and displaying a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; processing pitch and intonation of the received speech; establishing a color for received speech based on the pitch and intonation; displaying the text in the established color based on the pitch and intonation; adjusting font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determining the object in the image; and generating speech indicative of the object responsive to a speech command.

7. The method of claim 6, wherein the processor uses a convolutional neural network (CNN) to perform the diarization.

8. The method of claim 7, wherein the text of each speaker has a unique color.

9. The method of claim 7, wherein the text of each speaker has a unique font size.

10. The method of claim 7, wherein the text of each speaker has a unique font style.

11. A non-transitory computer readable medium storing program code which, when executed by a processor of eyewear having a frame, a display supported by the frame, a microphone coupled to the frame, a camera configured to generate an image including an object, is operative to cause the processor to perform the steps of: receiving speech from a plurality of human speakers via the microphone; identifying the plurality of human speakers; performing diarization on the received speech to segment spoken language into different speakers; displaying text associated with each speaker on the display; and displaying a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; processing pitch and intonation of the received speech; establishing a color for received speech based on the text, pitch; and intonation; displaying the text in the established color based on the text, pitch, and intonation; adjusting font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determining the object in the image; and generating speech indicative of the object responsive to a speech command.

12. The non-transitory computer readable medium as specified in claim 11, wherein the program code, when executed, is operative to cause the processor to use a convolutional neural network (CNN) to perform the diarization.

13. The non-transitory computer readable medium as specified in claim 12, wherein the text of each speaker has a unique color.

14. The non-transitory computer readable medium as specified in claim 12, wherein the text of each speaker has a unique font size or a font type.
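The font-size and color limitations in the independent claims can be read as a simple decision rule. The sketch below is one possible interpretation, not the patented implementation: the threshold values, step size, color names, and the pitch-to-color mapping are all assumptions chosen for illustration, since the claims only require that two decibel thresholds exist and that color depends on pitch and intonation.

```python
def adjust_font_size(base_pt, level_db, first_threshold_db=70.0,
                     second_threshold_db=40.0, step_pt=4):
    """Grow text for loud speech, shrink it for quiet speech.

    Threshold and step values are hypothetical; the claims do not give numbers.
    """
    if level_db > first_threshold_db:
        return base_pt + step_pt
    if level_db < second_threshold_db:
        # Never shrink below 1 pt so the text stays renderable.
        return max(1, base_pt - step_pt)
    return base_pt


def color_for_speech(mean_pitch_hz, pitch_slope_hz_per_s):
    """Pick a text color from pitch and intonation.

    Intonation is approximated here by the pitch slope over the utterance.
    The mapping itself is an assumption made for this sketch.
    """
    if pitch_slope_hz_per_s > 20.0:
        return "yellow"   # rising intonation, e.g. a question
    if mean_pitch_hz > 200.0:
        return "cyan"     # higher-pitched speech
    return "white"        # default


if __name__ == "__main__":
    for db in (80.0, 55.0, 30.0):
        print(db, "dB ->", adjust_font_size(14, db), "pt")
    print(color_for_speech(180.0, 35.0))
```

Using two separate thresholds leaves a middle band in which the font size is unchanged, so ordinary conversational volume does not cause the text to resize constantly.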

Classifications

  • Artificial neural networks; Connectionist approaches · CPC title

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

  • Speaker identification or verification techniques · CPC title

  • Aspects of interface with display user · CPC title

  • with means for controlling the display position {(see provisionally G09G5/42)} · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12136433B2 cover?
An eyewear device that performs diarization by segmenting spoken language into different speakers and remembering each speaker over the course of a session. The speech of each speaker is translated to text and the text of each speaker is displayed on an eyewear display. The text of each user has a different attribute such that the eyewear user can distinguish the text of different speakers. Exa…
Who is the assignee on this patent?
Geddes Jonathan, Pounds Jennica, Pruden Ryan, and 3 more
What technology area does this patent fall under?
Primary CPC classification G10L21/0272. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 5, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).