System and method for enabling real-time captioning for the hearing impaired via augmented reality
US-10878819-B1 · Dec 29, 2020 · US
US12136433B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12136433-B2 |
| Application number | US-202016885606-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 28, 2020 |
| Priority date | May 28, 2020 |
| Publication date | Nov 5, 2024 |
| Grant date | Nov 5, 2024 |
Official abstract text for this publication.
An eyewear device that performs diarization by segmenting spoken language into different speakers and remembering each speaker over the course of a session. The speech of each speaker is translated to text and the text of each speaker is displayed on an eyewear display. The text of each user has a different attribute such that the eyewear user can distinguish the text of different speakers. Examples of the text attribute can be a text color, font, and font size. The text is displayed on the eyewear display such that it does not substantially obstruct the user's vision.
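As a minimal sketch of the idea in the abstract (remembering each speaker over a session and giving each speaker's text a distinguishing attribute), the Python below maps diarized segments to a per-speaker color. The class name `SpeakerStyles`, the function `caption_lines`, the palette, and the speaker labels are all hypothetical and are not taken from the patent. The diarization step itself is assumed to have already produced `(speaker_id, text)` pairs.

```python
from itertools import cycle

# Hypothetical palette. The abstract lists color, font, and font size only as
# examples of a distinguishing text attribute; color is used here for brevity.
PALETTE = ("white", "yellow", "cyan", "magenta")


class SpeakerStyles:
    """Remembers one text color per speaker for the length of a session."""

    def __init__(self, palette=PALETTE):
        # cycle() reuses colors once the palette is exhausted, so colors are
        # only distinct for up to len(palette) speakers in this sketch.
        self._palette = cycle(palette)
        self._colors = {}

    def color_for(self, speaker_id):
        # A speaker seen earlier in the session keeps the same color.
        if speaker_id not in self._colors:
            self._colors[speaker_id] = next(self._palette)
        return self._colors[speaker_id]


def caption_lines(segments, styles):
    """Pair each diarized (speaker_id, text) segment with its display color."""
    return [(styles.color_for(speaker), text) for speaker, text in segments]


if __name__ == "__main__":
    # Assumed output of an upstream diarization and speech-to-text step.
    segments = [("A", "hi"), ("B", "hello"), ("A", "bye")]
    for color, text in caption_lines(segments, SpeakerStyles()):
        print(f"[{color}] {text}")
```

Keeping the mapping in one session-scoped object is what lets a returning speaker's text reappear in the same color, which is the "remembering" behavior the abstract describes.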
Opening claim text (preview).
What is claimed is:

1. Eyewear, comprising: a frame; a display supported by the frame; a microphone coupled to the frame; and a camera configured to generate an image including an object; and an electronic processor configured to: receive speech from a plurality of human speakers via the microphone; identify the plurality of human speakers; perform diarization on the received speech to segment spoken language into different speakers; display text associated with each speaker on the display; display a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; process pitch and intonation of the received speech; establish a color for received speech based on the pitch and intonation; display the text in the established color based on the pitch and intonation; adjust font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determine the object in the image; and generate speech indicative of the object responsive to a speech command.
2. The eyewear of claim 1, wherein the processor is configured to use a convolutional neural network (CNN) to perform the diarization.
3. The eyewear of claim 2, wherein the text of each speaker has a unique color.
4. The eyewear of claim 2, wherein the text of each speaker has a unique font size.
5. The eyewear of claim 2, wherein the text of each speaker has a unique font style.
6. A method for use with eyewear, the eyewear having a frame, a display supported by the frame, a microphone coupled to the frame, a camera configured to generate an image including an object, and an electronic processor, the processor: receiving speech from a plurality of human speakers via the microphone; identifying the plurality of human speakers; performing diarization on the received speech to segment spoken language into different speakers; displaying text associated with each speaker on the display; and displaying a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; processing pitch and intonation of the received speech; establishing a color for received speech based on the pitch and intonation; displaying the text in the established color based on the pitch and intonation; adjusting font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determining the object in the image; and generating speech indicative of the object responsive to a speech command.
7. The method of claim 6, wherein the processor uses a convolutional neural network (CNN) to perform the diarization.
8. The method of claim 7, wherein the text of each speaker has a unique color.
9. The method of claim 7, wherein the text of each speaker has a unique font size.
10. The method of claim 7, wherein the text of each speaker has a unique font style.
11. A non-transitory computer readable medium storing program code which, when executed by a processor of eyewear having a frame, a display supported by the frame, a microphone coupled to the frame, a camera configured to generate an image including an object, is operative to cause the processor to perform the steps of: receiving speech from a plurality of human speakers via the microphone; identifying the plurality of human speakers; performing diarization on the received speech to segment spoken language into different speakers; displaying text associated with each speaker on the display; and displaying a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker; processing pitch and intonation of the received speech; establishing a color for received speech based on the text, pitch; and intonation; displaying the text in the established color based on the text, pitch, and intonation; adjusting font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold; determining the object in the image; and generating speech indicative of the object responsive to a speech command.
12. The non-transitory computer readable medium as specified in claim 11, wherein the program code, when executed, is operative to cause the processor to use a convolutional neural network (CNN) to perform the diarization.
13. The non-transitory computer readable medium as specified in claim 12, wherein the text of each speaker has a unique color.
14. The non-transitory computer readable medium as specified in claim 12, wherein the text of each speaker has a unique font size or a font type.
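The font-size limitation recited in the independent claims (increase above a first decibel threshold, decrease below a second) can be illustrated with a small sketch. Everything numeric here is an assumption made for illustration: the claims do not state threshold values, step sizes, or size limits, and the function name `adjust_font_size` and its parameters are hypothetical.

```python
def adjust_font_size(current_pt, level_db, loud_db=70.0, quiet_db=40.0,
                     step_pt=2, min_pt=10, max_pt=36):
    """Return a new font size for caption text given a speech level in dB.

    loud_db stands in for the claimed "first threshold" and quiet_db for the
    "second threshold". All default values are illustrative assumptions.
    """
    if level_db > loud_db:
        # Louder than the first threshold: grow the text, capped at max_pt.
        return min(current_pt + step_pt, max_pt)
    if level_db < quiet_db:
        # Quieter than the second threshold: shrink the text, floored at min_pt.
        return max(current_pt - step_pt, min_pt)
    # Between the two thresholds the size is left unchanged.
    return current_pt


if __name__ == "__main__":
    for level in (80.0, 55.0, 30.0):
        print(level, adjust_font_size(18, level))
```

Clamping to `min_pt` and `max_pt` is a design choice of this sketch, not something the claims require. It simply keeps sustained loud or quiet speech from pushing the text to an unreadable size.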
Artificial neural networks; Connectionist approaches · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Speaker identification or verification techniques · CPC title
Aspects of interface with display user · CPC title
with means for controlling the display position {(see provisionally G09G5/42)} · CPC title