System and method for cognitive multilingual speech training and recognition
US-2019303797-A1 · Oct 3, 2019 · US
US10643637B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10643637-B2 |
| Application number | US-201816029491-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 6, 2018 |
| Priority date | Jul 6, 2018 |
| Publication date | May 5, 2020 |
| Grant date | May 5, 2020 |
Official abstract text for this publication.
A method for identifying at least one characteristic of a sound-producing object includes storing, in a memory, audio data acquired from an auditory environment via at least one microphone; receiving an input indicating a user request to identify a characteristic of a sound-producing object included in the auditory environment; determining, via a processor and based on a portion of the audio data acquired from the auditory environment prior to the user request, the characteristic of the sound-producing object; and causing information corresponding to the characteristic of the sound-producing object to be output via at least one output device.
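The abstract relies on audio that was captured before the user asked for an identification, which implies some form of continuous short-term retention. A minimal sketch of that idea follows, assuming timestamped chunks and a fixed retention window. The class name `RollingAudioBuffer`, its methods, and the one-sample-per-second data are hypothetical illustrations, and a `deque` stands in here for the circular buffer mentioned in the claims; none of this is taken from the patent's description.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the most recent `max_age_s` seconds of audio chunks (hypothetical helper)."""

    def __init__(self, max_age_s: float):
        self.max_age_s = max_age_s
        self._chunks = deque()  # (timestamp_s, samples) pairs, oldest first

    def store(self, timestamp_s: float, samples: list) -> None:
        """Append a chunk, then drop chunks older than the retention window."""
        self._chunks.append((timestamp_s, samples))
        while self._chunks and timestamp_s - self._chunks[0][0] > self.max_age_s:
            self._chunks.popleft()

    def before(self, request_time_s: float, window_s: float) -> list:
        """Return samples captured in the `window_s` seconds before the request."""
        start = request_time_s - window_s
        out = []
        for t, samples in self._chunks:
            if start <= t < request_time_s:
                out.extend(samples)
        return out


# Toy data: one "sample" per second, labelled with its own timestamp.
buf = RollingAudioBuffer(max_age_s=10.0)
for second in range(15):
    buf.store(float(second), [second])

# A request arriving at t = 15 s looks back over the preceding 3 s.
recent = buf.before(request_time_s=15.0, window_s=3.0)
```

In this sketch the request handler never waits for new audio; it only reads what the buffer already holds, which is the retroactive behaviour the abstract describes.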
Opening claim text (preview).
What is claimed is:
1. A method for identifying a sound-producing object, the method comprising: storing, in a memory, audio data acquired from an auditory environment via at least one microphone; receiving an input indicating a user request to identify the sound-producing object included in the auditory environment; generating, via a processor and based on a first portion of the audio data acquired from the auditory environment prior to the user request, a first determination that identifies the sound-producing object, wherein the first determination has a first confidence measure; causing, when the first confidence measure is greater than a confidence threshold, information corresponding to a characteristic of the sound-producing object to be output via at least one output device; and generating, via the processor when the first confidence measure is not greater than the confidence threshold, one or more additional determinations, based on additional portions of the audio data longer than the first portion, that identify the sound-producing object until a second determination has a second confidence measure that is greater than the confidence threshold.
2. The method of claim 1, further comprising discarding, from the memory, audio data acquired from the auditory environment that is older than a threshold duration of time.
3. The method of claim 2, wherein the memory comprises a circular buffer.
4. The method of claim 3, wherein storing, in the memory, audio data acquired from the auditory environment comprises storing the first portion of the audio data in a first portion of the circular buffer and a second portion of the audio data in a second portion of the circular buffer.
5. The method of claim 4, further comprising: determining that the first confidence measure is less than the confidence threshold; and generating, via the processor and based on both the first portion of the audio data and the second portion of the audio data, a second determination that identifies the sound-producing object, wherein the second determination has a second confidence measure; and causing, when the second confidence measure is greater than the confidence threshold, information corresponding to the characteristic of the sound-producing object to be output via the at least one output device.
6. The method of claim 5, wherein the first portion of the audio data corresponds to a first time interval that occurs closer to receiving the input indicating the user request than a second time interval that corresponds to the second portion of the audio data.
7. The method of claim 1, wherein the input indicating the user request comprises one of: a physical input to a touch-based mechanism, a verbal input, a user gesture, or additional information received from an additional sensor.
8. The method of claim 7, wherein the verbal input includes a keyword or key phrase.
9. The method of claim 1, further comprising retrieving the audio data from a computing device separate from a computing device that receives the input.
10. The method of claim 1, further comprising retrieving the audio data from a memory included in a computing device that receives the input.
11. One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors to identify a sound-producing object by performing the steps of: storing, in a rotating buffer, audio data acquired from an auditory environment via at least one microphone; receiving an input indicating a user request to the sound-producing object included in the auditory environment; generating, via the one or more processors and based on a first portion of the audio data acquired from the auditory environment prior to the user request, a first determination that identifies the sound-producing object, wherein the first determination has a first confidence measure; causing, when the first confidence measure is greater than a confidence threshold, information corresponding to a characteristic of the sound-producing object to be output via at least one output device; and generating, when the first confidence measure is not greater than the confidence threshold, one or more additional determinations, based on additional portions of the audio data longer than the first portion, that identify the sound-producing object until a second determination has a second confidence measure that is greater than the confidence threshold.
12. The one or more non-transitory computer-readable storage media of claim 11, wherein the input indicating the user request comprises one of: a physical input to a touch-based mechanism, a verbal input, a user gesture, or additional information received from an additional sensor.
13. The one or more non-transitory computer-readable storage media of claim 12, wherein the verbal input includes a keyword or key phrase.
14. The one or more non-transitory computer-readable storage media of claim 11, further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform the step of retrieving the audio data from a computing device separate from a computing device that receives the input.
15. The one or more non-transitory computer-readable storage media of claim 11, further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform the step of retrieving the audio data from a memory included in a computing device that receives the input.
16. The one or more non-transitory computer-readable storage media of claim 11, wherein: storing the audio data comprises: storing the first portion of the audio data in a first portion of the rotating buffer, and storing a second portion of the audio data in a second portion of the rotating buffer, and the first determination is based on the first portion of the audio data.
17. A system, comprising: a microphone that acquires audio data in an auditory environment of a user; a memory; and a processor, coupled to the microphone and the memory, that: receives, from the microphone, audio data of a sound event that has occurred in the auditory environment stores, in the memory, the audio data of the sound event; after receiving the audio data of the sound event, receives an input indicating a user request to identify a sound-producing object associated with the sound event; generates, based on a first portion of the audio data of the sound event, a first determination that identifies the sound-producing object, wherein the first determination has a first confidence measure; causes, when the first confidence measure is greater than a confidence threshold, information corresponding to an identity of the sound-producing object to be output via at least one output device; and generates, when the first confidence measure is not greater than the confidence threshold, one or more additional determinations, based on additional portions of the audio data longer than the first portion, that identify the sound-producing object until a second determination has a second confidence measure that is greater than the confidence threshold.
18. The system of claim 17, wherein the memory comprises a rotating buffer.
19. The system of claim 17, wherein the at least one output device comprises a loudspeaker included in a headphone-based assembly.
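The independent claims share one control flow: classify a first portion of the buffered audio, output the result if its confidence exceeds a threshold, and otherwise retry on longer portions. The sketch below illustrates that loop under stated assumptions. The function `identify_with_escalation`, its parameters, the toy classifier, and the "dog bark" label are all hypothetical; the claims do not specify how portions are lengthened or what happens when the stored audio runs out, so the fixed `step` and the `None` fallback are choices made only for this example.

```python
from typing import Callable, Optional, Sequence, Tuple

# A classifier maps an audio portion to (label, confidence in [0, 1]).
Classifier = Callable[[Sequence[float]], Tuple[str, float]]


def identify_with_escalation(
    audio: Sequence[float],
    classify: Classifier,
    threshold: float,
    first_len: int,
    step: int,
) -> Optional[Tuple[str, float, int]]:
    """Classify progressively longer portions of `audio` until one is confident.

    `audio` is ordered oldest to newest and ends at the user request, so the
    first portion is the one closest in time to the request (compare claim 6).
    Returns (label, confidence, samples_used), or None if the whole buffer was
    used without exceeding the threshold (an assumption of this sketch).
    """
    length = min(first_len, len(audio))
    while length > 0:
        label, confidence = classify(audio[-length:])
        if confidence > threshold:
            return label, confidence, length
        if length == len(audio):
            break  # nothing older left to add
        length = min(length + step, len(audio))
    return None


def toy_classifier(portion: Sequence[float]) -> Tuple[str, float]:
    """Stand-in model: confidence simply grows with the amount of audio seen."""
    return "dog bark", min(1.0, len(portion) / 8)


buffered = [0.0] * 10  # ten placeholder samples captured before the request
result = identify_with_escalation(
    buffered, toy_classifier, threshold=0.7, first_len=2, step=2
)
```

With these toy values the portions of 2 and 4 samples fall below the threshold and the 6-sample portion is the first to exceed it, so `result` records that six samples were needed.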
using biometrical features, e.g. fingerprint, retina-scan (cryptographic mechanisms or cryptographic arrangements for entity authentication using biological data H04L9/3231) · CPC title
using neural networks · CPC title
Interactive procedures; Man-machine interfaces · CPC title
Query by example, e.g. query by humming · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title