User specified keyword spotting using long short term memory neural network feature extractor
US-2016180838-A1 · Jun 23, 2016 · US
US11355103B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11355103-B2 |
| Application number | US-202016775149-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 28, 2020 |
| Priority date | Jan 28, 2019 |
| Publication date | Jun 7, 2022 |
| Grant date | Jun 7, 2022 |
Official abstract text for this publication.
Embodiments described herein provide for a computer that detects one or more keywords of interest using acoustic features, to detect or query commonalities across multiple fraud calls. Embodiments described herein may implement unsupervised keyword spotting (UKWS) or unsupervised word discovery (UWD) in order to identify commonalities across a set of calls, where both UKWS and UWD employ Gaussian Mixture Models (GMM) and one or more dynamic time-warping algorithms. A user may indicate a training exemplar or occurrence of call-specific information, referred to herein as “a named entity,” such as a person's name, an account number, account balance, or order number. The computer may perform a redaction process that computationally nullifies the import of the named entity in the modeling processes described herein.
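As an illustration of the posterior-probability and redaction steps the abstract mentions, the following is a minimal sketch, not the patented implementation. It assumes a diagonal-covariance GMM whose parameters are already known, and it assumes that "nullifying" a named entity means zeroing the posterior vectors of the frames that fall inside the user-marked span. The function names `gmm_posteriorgram` and `redact` are hypothetical.

```python
import numpy as np

def gmm_posteriorgram(frames, weights, means, variances):
    """Return per-frame posterior probabilities over GMM components.

    frames: (T, D) acoustic features; weights: (K,); means, variances: (K, D).
    Diagonal covariances are an assumption made for brevity.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_pdf                       # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def redact(posteriorgram, start_frame, end_frame):
    """Zero the posterior vectors of frames in [start_frame, end_frame).

    Assumption: zeroing is one possible way to nullify a named entity.
    """
    redacted = posteriorgram.copy()
    redacted[start_frame:end_frame] = 0.0
    return redacted

# Toy example: six 2-D frames scored against a two-component model.
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 2))
weights = np.array([0.5, 0.5])
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
variances = np.ones((2, 2))
post = gmm_posteriorgram(frames, weights, means, variances)
clean = redact(post, 2, 4)   # frames 2 and 3 hold the named entity
```

In this sketch, each row of `post` sums to one, and the redacted rows carry no information into any later frame-to-frame comparison.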
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: generating, by a computer, a plurality of audio frames from a plurality of audio signals; clustering, by the computer, one or more features of each audio frame according to a modeling algorithm, thereby generating one or more models for each frame; extracting, by the computer, posterior probabilities for each of the one or more features of each audio frame extracted from the audio frames according to the one or more models; receiving, by the computer, from a client computer a keyword indicator for a keyword to query in the audio signals, the keyword comprising one or more words; receiving, by the computer, from the client computer a named entity indicator for a named entity to be redacted from the query, wherein the computer nullifies the posterior probability of each frame containing the named entity; calculating, by the computer, for each audio frame containing the keyword, a first similarity score and a second similarity score, the first similarity score and the second similarity score of an audio frame calculated using a model selected for the respective frame based on the posterior probability of the audio frame; storing, by the computer, into a queue, a subset of audio frames having a second similarity score comparatively higher than a corresponding first similarity score, the subset containing a review-threshold amount of audio frames; and generating, by the computer, a list of audio segments of the audio signals matching the keyword, the list of audio segments containing at least one of the audio frames in the subset.
2. The method of claim 1, wherein the first similarity score is a lower-bound dynamic time warping score calculated by the computer using a lower-bound dynamic time-warping algorithm.
3. The method of claim 1, wherein the second similarity score is a segmental dynamic time warping score calculated by the computer using a segmental dynamic time-warping algorithm.
4. The method of claim 1, wherein the keyword indicator received from the client computer includes one or more timestamps indicating to the computer when instances of the keyword occur in at least one audio signal.
5. The method of claim 1, wherein the named entity indicator received from the client computer includes one or more timestamps indicating to the computer when instances of the named entity occur in at least one audio signal.
6. The method of claim 1, further comprising receiving, by the computer, from the client computer a review-threshold indicator indicating the review-threshold amount of audio frames in the subset.
7. The method of claim 1, further comprising transmitting, by the computer, to the client computer the list of audio segments matching the keyword, the list of audio segments containing the review-threshold amount of audio segments.
8. The method of claim 1, further comprising identifying, by the computer, for each segment in the list, one or more timestamps indicating when instances of the keyword occur in the segment, wherein the list transmitted to the client computer includes each timestamp associated with the one or more segments of the list.
9. The method of claim 1, further comprising generating, by the computer, one or more segments for each of the audio signals, wherein each segment comprises at least one frame.
10. The method of claim 9, wherein the one or more segments of each audio signal are generated according to a voice-activated detection module configured to detect a segment.
11. A computer-implemented method comprising: segmenting, by a computer, a first audio signal into a first set of one or more audio segments, and a second audio signal into a second set of one or more audio segments, at least one audio signal including a named entity associated with a named entity indicator; generating, by the computer, sets of one or more paths for each audio segment in the first set of audio segments, and sets of one or more paths for each audio segment in the second set of audio segments; calculating, by the computer, based on lower-bound dynamic time-warping algorithm, a similarity score for each path of each audio segment of the first set of audio segments, and for each path of each audio segment of the second set of audio segments, wherein the computer nullifies the similarity score of each path containing the named entity according to the named entity indicator; and identifying, by the computer, at least one similar acoustic region between the first set of audio segments and the second set of audio segments, based upon comparing the similarity scores of each path of each segment of the first set of audio segments against the similarity scores of each path of each segment of the second set of audio segments.
12. The method of claim 11, wherein each path is a fixed-length portion of an audio segment.
13. The method of claim 11, further comprising: clustering, by the computer, one or more features of each path of each segment in a similar acoustic region according to a modeling algorithm, thereby generating one or more models for each path; and extracting, by the computer, posterior probabilities for each of the one or more features of extracted from the audio paths according to the one or more models, wherein the similarity score for each respective path is calculated using a model selected for the respective path based on the posterior probability of the respective path.
14. The method of claim 13, further comprising receiving, by the computer, from a client computer a named entity indicator indicating to the computer an instance of the named entity in at least one audio signal, wherein the computer nullifies the posterior probability of each path containing the named entity for clustering.
15. The method of claim 11, wherein comparing the similarity scores further comprises: selecting, by the computer, from the second set of audio segments a first test segment at a first time index and defined by a first time window; comparing, by the computer, the similarity scores for the paths of the first test segment against the similarity scores for the paths of at least one query segment of the first set of audio segments, according to the first time window and the first time index; selecting, by the computer, from the second set of audio segments a second test segment at a second time index and defined by a second time window; and comparing, by the computer, the similarity scores for the paths of the second test segment against the similarity scores for the paths of the at least one query segment, according to the second time window and the second time index.
16. The method of claim 11, wherein identifying a similar acoustic region further comprises: identifying, by the computer, a first-level match between a query segment of the first set of audio segments and a test segment of the second set of audio segments, based on determining that a minimum distance value between the similarity scores for the paths of the query segment and the similarity scores for the paths of the test segment satisfies a first-level threshold.
17. The method of claim 16, further comprising identifying, by the computer, a second-level match between the query segment of the first set of audio segments and the test segment of the second set of audio segments, based on determining that a number of first-level matches satisfies a second-level threshold.
18. The method of claim 17, further comprising: identifying, by the computer, one or more pairwise matches in the first set of audio s
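Claims 1 to 3 describe scoring candidates with a cheap lower-bound dynamic time-warping (DTW) score and a more expensive DTW score, and keeping a review-threshold number of results in a queue. The sketch below shows one common way such a two-stage search can work; it is an assumption-laden illustration, not the claimed method. It uses plain DTW rather than the segmental variant named in claim 3, a deliberately simple lower bound (each query frame's cheapest match), a negative-log inner-product frame distance, and scores expressed as costs where lower is better. All function names are hypothetical.

```python
import heapq
import numpy as np

def frame_distances(query, test, eps=1e-10):
    """Pairwise -log inner-product distance between posterior vectors."""
    return -np.log(np.clip(query @ test.T, eps, 1.0))

def dtw_cost(dist):
    """Exact DTW alignment cost over a precomputed distance matrix."""
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + step
    return acc[n, m]

def lower_bound(dist):
    """Cheap bound: every query frame must align to at least one test frame."""
    return dist.min(axis=1).sum()

def top_matches(query, candidates, review_threshold):
    """Return the review_threshold lowest-cost (cost, index) pairs."""
    dists = [frame_distances(query, c) for c in candidates]
    bounds = [lower_bound(d) for d in dists]
    order = sorted(range(len(candidates)), key=lambda k: bounds[k])
    best = []  # max-heap via negated cost; holds the current best matches
    for k in order:
        if len(best) == review_threshold and bounds[k] >= -best[0][0]:
            break  # no remaining candidate can beat the worst kept match
        heapq.heappush(best, (-dtw_cost(dists[k]), k))
        if len(best) > review_threshold:
            heapq.heappop(best)  # drop the highest-cost entry
    return sorted((-neg, k) for neg, k in best)
```

Because the bound never exceeds the exact cost when frame distances are non-negative, visiting candidates in order of increasing bound lets the loop stop early without missing a better match.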
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Probabilistic grammars, e.g. word n-grams · CPC title
Execution procedure of a spoken command · CPC title
for retrieval · CPC title
Word spotting · CPC title