Speech recognition method and apparatus
US-2018174580-A1 · Jun 21, 2018 · US
US12387727B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12387727-B1 |
| Application number | US-202418439412-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 12, 2024 |
| Priority date | Mar 21, 2018 |
| Publication date | Aug 12, 2025 |
| Grant date | Aug 12, 2025 |
Official abstract text for this publication.
Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
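As an illustration of the abstract's idea of supplying microphone configuration data to an acoustic model alongside the audio data, the following minimal sketch appends a configuration vector to each audio feature frame. The `MicArrayConfig` class, its fields, and the `append_config` helper are hypothetical names chosen for this example; the abstract does not specify how the configuration is encoded or how the model consumes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MicArrayConfig:
    """Hypothetical description of a microphone array (fields are assumptions)."""
    num_mics: int
    spacing_mm: float
    is_circular: bool

    def to_vector(self) -> List[float]:
        # Encode the configuration as a fixed-length numeric vector.
        return [float(self.num_mics), self.spacing_mm, 1.0 if self.is_circular else 0.0]


def append_config(frames: List[List[float]], config: MicArrayConfig) -> List[List[float]]:
    """Concatenate the configuration vector onto every audio feature frame."""
    cfg = config.to_vector()
    return [frame + cfg for frame in frames]


# Two toy feature frames standing in for real audio features.
frames = [[0.1, 0.2], [0.3, 0.4]]
config = MicArrayConfig(num_mics=7, spacing_mm=35.0, is_circular=True)
model_input = append_config(frames, config)
```

Each row of `model_input` could then be passed to an acoustic model that outputs phoneme data. Concatenation is only one possible way to combine the two inputs and is assumed here for simplicity.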
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving first input data from a first device having a first device location of a building; determining first contextual data indicating one or more first physical objects disposed in association with the first device location; inputting, into a first model, the first input data and the first contextual data; generating, using the first model and based at least in part on the first input data and the first contextual data, first output data; determining that the first device is in a second device location of the building, wherein the second device location is associated with second contextual data indicating one or more second physical objects that differ from the one or more first physical objects; generating, using the first model and based at least in part on second input data and the second contextual data, second output data that differs at least in part from the first output data, the second output data indicating an action to be performed by the first device responsive to a user query, wherein the action includes controlling a playback parameter of an media output of the first device associated with the user query; and sending, based at least in part on generating the second output data, a command to the first device that causes the first device to perform the action.

2. The method of claim 1, wherein generating the first output data comprises generating text data based at least in part on the first input data and the first contextual data.

3. The method of claim 1, wherein the first contextual data indicates a distance between the first device and a second device of the building.

4. The method of claim 3, wherein generating the first output data using the first model comprises generating the first output data using the first model with the distance as an input to the first model.

5. The method of claim 1, wherein the first input data indicates the first device location with respect to a source of the first contextual data.

6. The method of claim 1, further comprising: identifying additional input data; generating third contextual data based at least in part on the additional input data; and wherein generating the second output data comprises generating the second output data with the third contextual data as input to the first model.

7. The method of claim 1, further comprising: determining an environmental context associated with at least a portion of the first input data; and wherein generating the first output data comprises generating the first output data with the environmental context as input to the first model.

8. The method of claim 1, further comprising: determining that a distance between the first device and a second device in the building satisfies a threshold distance; and wherein generating the first output data comprises generating the first output data based at least in part on the distance satisfying the threshold distance.

9. The method of claim 1, further comprising determining characteristics of the second device location, wherein the second contextual data is based at least in part on the characteristics of the second device location.

10. The method of claim 1, wherein the playback parameter includes one or more of mute, volume control, or play.

11. A system comprising: one or more processors; and non-transitory computer-readable media storing instructions that, when executed by on the one or more processors, cause the one or more processors to perform operations comprising: receiving first input data from a first device having a first device location of a building; determining first contextual data indicating one or more first physical objects disposed in association with the first device location; inputting, into a first model, the first input data and the first contextual data; generating, using the first model and based at least in part on the first input data and the first contextual data, first output data; determining that the first device is in a second device location of the building, wherein the second device location is associated with second contextual data indicating one or more second physical objects that differ from the one or more first physical objects; generating, using the first model and based at least in part on second input data and the second contextual data, second output data that differs at least in part from the first output data, the second output data indicating an action to be performed by the first device responsive to a user query, wherein the action includes instructions controlling a playback parameter of a media output of the first device that is associated with the user query; and sending, based at least in part on generating the second output data, a command to the first device that causes the first device to perform the action.

12. The system of claim 11, wherein generating the first output data comprises generating text data based at least in part on the first input data and the first contextual data.

13. The system of claim 11, wherein the first contextual data indicates a distance between the first device and a second device of the building.

14. The system of claim 13, wherein generating the first output data using the first model comprises generating the first output data using the first model with the distance as an input to the first model.

15. The system of claim 11, wherein the first input data indicates the first device location with respect to a source of the first contextual data.

16. The system of claim 11, the operations further comprising: identifying additional input data; generating third contextual data based at least in part on the additional input data; and wherein generating the second output data comprises generating the second output data with the third contextual data as input to the first model.

17. The system of claim 11, the operations further comprising: determining an environmental context associated with at least a portion of the first input data; and wherein generating the first output data comprises generating the first output data with the environmental context as input to the first model.

18. The system of claim 11, the operations further comprising: determining that a distance between the first device and a second device in the building satisfies a threshold distance; and wherein generating the first output data comprises generating the first output data based at least in part on the distance satisfying the threshold distance.

19. The system of claim 11, the operations further comprising determining characteristics of the second device location, wherein the second contextual data is based at least in part on the characteristics of the second device location.

20. The system of claim 11, wherein the playback parameter includes one or more of mute, volume control, or play.
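The flow recited in claims 1 and 10 (the same model receiving different contextual data at two device locations and producing different outputs, followed by a command to the device) can be pictured with a small sketch. Everything here is an assumption made for illustration: the location names, the object lists, the `NOISY_OBJECTS` set, and the rule-based `toy_model`, which is only a stand-in for the claimed "first model" and not the patented technique.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Context:
    """Contextual data: physical objects associated with a device location."""
    objects: FrozenSet[str]


# Hypothetical contextual data for two locations in the same building.
LOCATION_CONTEXT: Dict[str, Context] = {
    "kitchen": Context(frozenset({"dishwasher", "range hood"})),
    "bedroom": Context(frozenset({"bed", "curtains"})),
}

# Assumption: objects that tend to raise the background noise level.
NOISY_OBJECTS = frozenset({"dishwasher", "range hood"})


def toy_model(query: str, context: Context) -> Dict[str, object]:
    """Rule-based stand-in for the model: picks a playback volume from context."""
    if "play" not in query.lower():
        return {"action": "none"}
    # Louder playback when noisy objects are present at the location.
    level = 8 if context.objects & NOISY_OBJECTS else 4
    return {"action": "volume control", "level": level}


def build_command(device_id: str, output: Dict[str, object]) -> Dict[str, object]:
    """Wrap the model output in a command addressed to the device."""
    return {"device": device_id, **output}


# Same query and same model, different contextual data per location.
first_output = toy_model("play some music", LOCATION_CONTEXT["kitchen"])
second_output = toy_model("play some music", LOCATION_CONTEXT["bedroom"])
command = build_command("device-1", second_output)
```

In this sketch the second output differs from the first only because the contextual data changed when the device moved, which mirrors the structure of the independent claims without implying anything about how an actual implementation is built.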
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Noise filtering · CPC title
Word spotting · CPC title
Execution procedure of a spoken command · CPC title
Microphone arrays; Beamforming · CPC title