Speech processing optimizations based on microphone array

US12387727B1 · US · B1

Patent metadata
Field: Value
Publication number: US-12387727-B1
Application number: US-202418439412-A
Country: US
Kind code: B1
Filing date: Feb 12, 2024
Priority date: Mar 21, 2018
Publication date: Aug 12, 2025
Grant date: Aug 12, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
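
As an illustration of the idea in the abstract, the following minimal sketch shows one way microphone configuration data could be turned into a vector and supplied to an acoustic model alongside audio features, and how the same data could be used to select a model. Everything named here (`MicArrayConfig`, its fields, `build_model_input`, `select_acoustic_model`, and the model labels) is a hypothetical assumption for illustration; the patent does not specify this encoding or these interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class MicArrayConfig:
    """Hypothetical description of a device's microphone array."""
    num_mics: int
    spacing_mm: float
    is_circular: bool

    def to_vector(self) -> List[float]:
        # Assumed encoding: one float per field. A real system might
        # normalize these values or use a learned embedding instead.
        return [float(self.num_mics), self.spacing_mm,
                1.0 if self.is_circular else 0.0]


def build_model_input(audio_features: Sequence[float],
                      config: MicArrayConfig) -> List[float]:
    """Append the configuration vector to one frame of audio features."""
    return list(audio_features) + config.to_vector()


def select_acoustic_model(config: MicArrayConfig,
                          models_by_mic_count: Dict[int, str],
                          default_model: str) -> str:
    """Pick a model keyed by microphone count, falling back to a default."""
    return models_by_mic_count.get(config.num_mics, default_model)


if __name__ == "__main__":
    cfg = MicArrayConfig(num_mics=7, spacing_mm=35.0, is_circular=True)
    print(build_model_input([0.1, 0.2], cfg))
    print(select_acoustic_model(cfg, {7: "seven_mic_model"}, "generic"))
```

Concatenation is only one plausible way to combine the two inputs; it is used here because it keeps the sketch short.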

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    receiving first input data from a first device having a first device location of a building;
    determining first contextual data indicating one or more first physical objects disposed in association with the first device location;
    inputting, into a first model, the first input data and the first contextual data;
    generating, using the first model and based at least in part on the first input data and the first contextual data, first output data;
    determining that the first device is in a second device location of the building, wherein the second device location is associated with second contextual data indicating one or more second physical objects that differ from the one or more first physical objects;
    generating, using the first model and based at least in part on second input data and the second contextual data, second output data that differs at least in part from the first output data, the second output data indicating an action to be performed by the first device responsive to a user query, wherein the action includes controlling a playback parameter of an media output of the first device associated with the user query; and
    sending, based at least in part on generating the second output data, a command to the first device that causes the first device to perform the action.

2. The method of claim 1, wherein generating the first output data comprises generating text data based at least in part on the first input data and the first contextual data.

3. The method of claim 1, wherein the first contextual data indicates a distance between the first device and a second device of the building.

4. The method of claim 3, wherein generating the first output data using the first model comprises generating the first output data using the first model with the distance as an input to the first model.

5. The method of claim 1, wherein the first input data indicates the first device location with respect to a source of the first contextual data.

6. The method of claim 1, further comprising: identifying additional input data; generating third contextual data based at least in part on the additional input data; and wherein generating the second output data comprises generating the second output data with the third contextual data as input to the first model.

7. The method of claim 1, further comprising: determining an environmental context associated with at least a portion of the first input data; and wherein generating the first output data comprises generating the first output data with the environmental context as input to the first model.

8. The method of claim 1, further comprising: determining that a distance between the first device and a second device in the building satisfies a threshold distance; and wherein generating the first output data comprises generating the first output data based at least in part on the distance satisfying the threshold distance.

9. The method of claim 1, further comprising determining characteristics of the second device location, wherein the second contextual data is based at least in part on the characteristics of the second device location.

10. The method of claim 1, wherein the playback parameter includes one or more of mute, volume control, or play.

11. A system comprising:
    one or more processors; and
    non-transitory computer-readable media storing instructions that, when executed by on the one or more processors, cause the one or more processors to perform operations comprising:
    receiving first input data from a first device having a first device location of a building;
    determining first contextual data indicating one or more first physical objects disposed in association with the first device location;
    inputting, into a first model, the first input data and the first contextual data;
    generating, using the first model and based at least in part on the first input data and the first contextual data, first output data;
    determining that the first device is in a second device location of the building, wherein the second device location is associated with second contextual data indicating one or more second physical objects that differ from the one or more first physical objects;
    generating, using the first model and based at least in part on second input data and the second contextual data, second output data that differs at least in part from the first output data, the second output data indicating an action to be performed by the first device responsive to a user query, wherein the action includes instructions controlling a playback parameter of a media output of the first device that is associated with the user query; and
    sending, based at least in part on generating the second output data, a command to the first device that causes the first device to perform the action.

12. The system of claim 11, wherein generating the first output data comprises generating text data based at least in part on the first input data and the first contextual data.

13. The system of claim 11, wherein the first contextual data indicates a distance between the first device and a second device of the building.

14. The system of claim 13, wherein generating the first output data using the first model comprises generating the first output data using the first model with the distance as an input to the first model.

15. The system of claim 11, wherein the first input data indicates the first device location with respect to a source of the first contextual data.

16. The system of claim 11, the operations further comprising: identifying additional input data; generating third contextual data based at least in part on the additional input data; and wherein generating the second output data comprises generating the second output data with the third contextual data as input to the first model.

17. The system of claim 11, the operations further comprising: determining an environmental context associated with at least a portion of the first input data; and wherein generating the first output data comprises generating the first output data with the environmental context as input to the first model.

18. The system of claim 11, the operations further comprising: determining that a distance between the first device and a second device in the building satisfies a threshold distance; and wherein generating the first output data comprises generating the first output data based at least in part on the distance satisfying the threshold distance.

19. The system of claim 11, the operations further comprising determining characteristics of the second device location, wherein the second contextual data is based at least in part on the characteristics of the second device location.

20. The system of claim 11, wherein the playback parameter includes one or more of mute, volume control, or play.
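
The flow recited in claim 1 can be pictured with a toy sketch: the same model receives input data plus contextual data about nearby physical objects, and produces a different playback action when the device moves to a location with different objects. The locations, object names, the rule inside `first_model`, and the command format below are all hypothetical assumptions chosen for illustration; the claims do not prescribe any of them. Only the playback parameter names come from claim 10.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Playback parameters named in claim 10.
PLAYBACK_PARAMETERS = ("mute", "volume control", "play")

# Hypothetical set of objects assumed to raise background noise.
NOISY_OBJECTS: FrozenSet[str] = frozenset({"range hood", "dishwasher"})


@dataclass(frozen=True)
class ContextualData:
    """Physical objects associated with a device location."""
    physical_objects: FrozenSet[str]


# Hypothetical mapping from device location to contextual data.
LOCATION_CONTEXT: Dict[str, ContextualData] = {
    "kitchen": ContextualData(frozenset({"refrigerator", "range hood"})),
    "bedroom": ContextualData(frozenset({"bed", "curtains"})),
}


def first_model(input_data: str, context: ContextualData) -> Dict[str, str]:
    """Toy stand-in for the claimed model: a fixed rule, not a trained model."""
    if "mute" in input_data:
        parameter = "mute"
    elif "play" in input_data:
        parameter = "play"
    else:
        parameter = "volume control"
    # Assumed rule: choose a higher level when noisy objects are nearby.
    level = "high" if context.physical_objects & NOISY_OBJECTS else "low"
    return {"parameter": parameter, "level": level}


def handle_query(location: str, input_data: str,
                 device_id: str = "device-1") -> Dict[str, str]:
    """Look up contextual data for the location and build a device command."""
    output = first_model(input_data, LOCATION_CONTEXT[location])
    return {"device": device_id, **output}


if __name__ == "__main__":
    print(handle_query("kitchen", "play music"))
    print(handle_query("bedroom", "play music"))
```

The point of the sketch is only that one model, given the same query, can yield different output data when the contextual data changes with the device location.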

Assignees

Inventors

Classifications

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)

  • Noise filtering

  • Word spotting

  • Execution procedure of a spoken command

  • Microphone arrays; Beamforming

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12387727B1 cover?
Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 12, 2025 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).