Response to sounds in an environment based on correlated audio and user events

US12307012B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12307012-B2
Application number: US-202218047494-A
Country: US
Kind code: B2
Filing date: Oct 18, 2022
Priority date: Oct 27, 2021
Publication date: May 20, 2025
Grant date: May 20, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed systems and method correlates user behaviors with audio processing to achieve more accurate conclusions about sounds in a user's environment. These conclusions may, in turn, be used to adjust the way a device, such as AR glasses, operate or respond to the sounds. For example, audio events determined from processing speech can be correlated with behavior events determined by sensing a user to improve a speech-to-text transcript of the speech by separating, or otherwise altering, the text in the transcript by speaker.
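
As a rough illustration of the idea in the abstract, the sketch below pairs audio events with behavior events that occur close together in time. The `Event` class, the event labels, the timestamps, and the 0.5-second window are all assumptions made for illustration; the abstract does not say how events are represented or matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical label, e.g. "speaker_change" or "head_turn"
    time_s: float   # seconds since the start of capture


def pair_events(audio_events, behavior_events, window_s=0.5):
    """Return (audio, behavior) pairs whose timestamps differ by at most window_s."""
    pairs = []
    for a in audio_events:
        for b in behavior_events:
            if abs(a.time_s - b.time_s) <= window_s:
                pairs.append((a, b))
    return pairs


# Illustrative data only: two audio events and two sensed behavior events.
audio = [Event("speaker_change", 2.0), Event("speaker_change", 7.5)]
behavior = [Event("head_turn", 2.3), Event("gaze_shift", 12.0)]

pairs = pair_events(audio, behavior)
for a, b in pairs:
    print(f"{a.kind} at {a.time_s}s matches {b.kind} at {b.time_s}s")
```

A fixed time window is the simplest possible matching rule; the claims instead describe detecting the correlation with a machine learning model.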

First claim

Opening claim text (preview).

The invention claimed is:
1. A method comprising: capturing audio of a conversation; analyzing the audio to detect a speaker change event with a first confidence level; analyzing a feature of a user to detect a behavior event; detecting a correlation between the speaker change event and the behavior event using a machine learning model; adjusting the first confidence level of the speaker change event to a second confidence level based on the correlation; and generating a change in a speech-to-text transcript of the conversation based on the second confidence level of the speaker change event.
2. The method according to claim 1, wherein capturing the audio is by a microphone array of AR glasses worn by the user.
3. The method according to claim 1, wherein the speaker change event corresponds to when a first person in the conversation has stopped speaking and a second person in the conversation has started speaking.
4. The method according to claim 1, wherein analyzing the audio includes speaker localization, based on a sensitivity of a microphone array.
5. The method according to claim 1, further comprising sensing the user to measure the feature of the user using an inertial measurement unit of AR glasses worn by the user.
6. The method according to claim 1, further comprising sensing the user to measure the feature of the user using an eye tracker of AR glasses worn by the user.
7. The method according to claim 1, further comprising sensing the user to measure the feature of the user using a galvanic skin response sensor of a device worn on a wrist of the user.
8. The method according to claim 1, further comprising sensing the user to measure the feature of the user using a photoplethysmography sensor of a device worn on a wrist of the user.
9. The method according to claim 1, wherein the feature of the user is included in a plurality of features of the user, the plurality of features including a position of a head of the user.
10. The method according to claim 9, wherein the plurality of features of the user include a gaze of the user.
11. The method according to claim 10, wherein the behavior event is a change in attention of the user.
12. The method according to claim 1, wherein the feature of the user is included in a plurality of features of the user, the plurality of features including a pupil size of one eye, or both eyes, of the user.
13. The method according to claim 12, wherein the behavior event is a change in cognitive load of the user.
14. The method according to claim 1, wherein the feature of the user is included in a plurality of features of the user the plurality of features including a skin conductance of the user.
15. The method according to claim 14, wherein the plurality of features of the user include a heart rate of the user.
16. The method according to claim 15, wherein the behavior event a surprise of the user.
17. The method according to claim 1, wherein the machine learning model is a neural network.
18. The method according to claim 1, wherein the machine learning model is a state vector machine (SVM) or a random decision forest.
19. The method according to claim 3, wherein the change includes: inserting a line break, changing a color, or adding tag at the speaker change event in the speech-to-text transcript to separate speakers of the conversation.
20. An augmented reality device comprising: a microphone array configured to capture audio of a conversation; an inertial measurement unit configured to measure a position of a head of a user; an eye tracker configured to measure a gaze of an eye of the user; a heads-up display configured to display a speech-to-text transcript of the conversation to the user; and a processor configured by software instructions to: analyze the audio to detect a speaker change event with a first confidence level; analyze the position of the head of the user and the gaze of the eye of the user to detect a behavior event; detect a correlation between the speaker change event and the behavior event using a machine learning model; adjust the first confidence level of the speaker change event to a second confidence level based on the correlation; and generate a change in the speech-to-text transcript based on the second confidence level of the speaker change event.
21. The augmented reality device according to claim 20, wherein the change includes: inserting a line break, changing a color, or adding tag at the speaker change event in the speech-to-text transcript to separate speakers of the conversation.
22. The augmented reality device according to claim 20, wherein the processor is further configured by the software instructions to: adjust a beam forming of the microphone array of the augmented reality device based on the second confidence level of the speaker change event.
23. The augmented reality device according to claim 20, wherein the processor is further configured by the software instructions to: adjust a noise cancellation of the augmented reality device based on the second confidence level of the speaker change event.
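
The last two steps of claim 1 (adjusting a first confidence level to a second one, then changing the transcript) might look something like the following minimal sketch. Everything here is an assumption for illustration: the function names, the log-odds update, the `weight` and `threshold` values, and the hand-picked `correlation` score, which in the claims would come from a machine learning model that this sketch does not implement.

```python
import math


def adjust_confidence(first_conf, correlation, weight=2.0):
    """Shift a confidence in (0, 1) up or down in log-odds space.

    correlation is an assumed score in [-1, 1]; positive values mean a
    behavior event supports the detected speaker change.
    """
    eps = 1e-6
    p = min(max(first_conf, eps), 1 - eps)  # keep the logit finite
    logit = math.log(p / (1 - p)) + weight * correlation
    return 1 / (1 + math.exp(-logit))


def render_transcript(words, change_events, threshold=0.5):
    """Join words, starting a new line at each accepted speaker change.

    change_events maps a word index to the adjusted (second) confidence.
    """
    lines, current = [], []
    for i, word in enumerate(words):
        if current and change_events.get(i, 0.0) >= threshold:
            lines.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


# Illustrative values only: a weak audio detection at word index 3.
words = ["how", "are", "you", "fine", "thanks"]
first = 0.4
second = adjust_confidence(first, correlation=0.8)

print(round(second, 3))
print(render_transcript(words, {3: second}))
```

With these example numbers the audio-only confidence falls below the threshold, so no line break would be inserted; the correlated behavior event raises it above the threshold and the transcript is split by speaker. A line break is only one of the changes the claims list; a color change or a tag would slot into `render_transcript` the same way.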

Assignees

Google LLC

Classifications

  • Management of the audio stream, e.g. setting of volume, audio stream path · CPC title

  • Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title

  • Head tracking input arrangements · CPC title

  • Eyeglass type (eyeglass details G02C) · CPC title

  • Head mounted · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12307012B2 cover?
The disclosed systems and method correlates user behaviors with audio processing to achieve more accurate conclusions about sounds in a user's environment. These conclusions may, in turn, be used to adjust the way a device, such as AR glasses, operate or respond to the sounds. For example, audio events determined from processing speech can be correlated with behavior events determined by sensin…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/015. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 20, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).