Detecting synthetic sounds in call audio
US-2023136241-A1 · May 4, 2023 · US
US12413667B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12413667-B2 |
| Application number | US-202418644169-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 24, 2024 |
| Priority date | Nov 3, 2021 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Abstract
In some implementations, a system may capture audio from a call between a calling device and a called device. The system may filter the captured audio to generate a background audio layer. The system may generate an audio footprint that is a representation of sound in the background audio layer. The system may determine that the audio footprint includes a triggering sound footprint based on one or more audio characteristics of the audio footprint. The system may detect synthetic sound based on the audio footprint and after determining that the audio footprint includes the triggering sound footprint, wherein the synthetic sound is indicative of a sound recording. The system may transmit a notification to one or more devices associated with the call based on detecting the synthetic sound.
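The abstract's first steps (isolating a background layer, summarizing it as an audio footprint, and spotting a triggering sound) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The patent does not say how the voice layer is removed, so the sketch assumes a separate voice-layer estimate is already available and simply subtracts it. The footprint is reduced to per-frame RMS energy. The trigger rule (a frame at least `ratio` times the median frame energy) is a hypothetical stand-in for the unspecified "audio characteristics". `FRAME`, `background_layer`, `audio_footprint`, and `triggering_frames` are illustrative names.

```python
import math
from statistics import median

# Hypothetical frame size: 20 ms at 8 kHz, a common narrowband telephony rate.
FRAME = 160


def background_layer(mixed, voice_estimate):
    """Assumed stand-in for the filtering step: subtract an externally
    supplied voice-layer estimate, sample by sample, leaving the background."""
    if len(mixed) != len(voice_estimate):
        raise ValueError("mixed audio and voice estimate must be the same length")
    return [m - v for m, v in zip(mixed, voice_estimate)]


def audio_footprint(layer, frame=FRAME):
    """Coarse footprint: RMS energy of each complete, non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in layer[i:i + frame]) / frame)
        for i in range(0, len(layer) - frame + 1, frame)
    ]


def triggering_frames(footprint, ratio=4.0):
    """Hypothetical trigger rule: flag frames whose energy is at least `ratio`
    times the median frame energy (e.g. a bark or siren over quiet background)."""
    if not footprint:
        return []
    floor = median(footprint) or 1e-12  # avoid a zero threshold on pure silence
    return [i for i, energy in enumerate(footprint) if energy >= ratio * floor]


if __name__ == "__main__":
    # Synthetic call: a 440 Hz "voice" over a faint constant background
    # that contains one loud burst in frame 5.
    n = 10 * FRAME
    voice = [0.3 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(n)]
    background = [0.01] * n
    background[5 * FRAME:6 * FRAME] = [0.5] * FRAME
    mixed = [v + b for v, b in zip(voice, background)]

    fp = audio_footprint(background_layer(mixed, voice))
    print(triggering_frames(fp))  # expected: [5]
```

Subtraction only recovers the background exactly here because the voice estimate is exact; a real system would need an actual source-separation or voice-filtering step in its place.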
Claims (preview)
What is claimed is:
1. A method comprising: generating, by a system, an audio footprint that is associated with a background audio layer generated based on filtering audio from a call to remove a voice audio layer; detecting, by the system, synthetic sound based on detecting, based on comparing a portion of the audio footprint to a plurality of stored audio footprints based on a plurality of priority indicators associated with the plurality of stored audio footprints, that the portion of the audio footprint sufficiently matches a stored audio footprint, of the plurality of stored audio footprints; and performing, by the system, one or more actions based on detecting the synthetic sound.
2. The method of claim 1, wherein generating the audio footprint that is associated with the background audio layer generated based on filtering the audio from the call to remove the voice audio layer comprises: detecting audio signals corresponding to more than one voice originating from a calling device; determining that at least one audio signal of the audio signals corresponds to a voice other than a user's voice; and including the at least one audio signal in the audio footprint.
3. The method of claim 1, wherein: a location associated with a first portion of the audio footprint corresponds to a portion of the audio footprint containing a triggering sound footprint, and a location associated with a second portion of the audio footprint corresponds to a remaining portion of the audio footprint.
4. The method of claim 1, further comprising: detecting the synthetic sound based on determining that the audio footprint satisfies a condition that indicates a likelihood of fraud associated with the call.
5. The method of claim 1, further comprising: detecting that the audio footprint includes a triggering sound footprint; determining a category associated with the triggering sound footprint; and comparing, based on determining the category, the portion of the audio footprint to stored audio footprints, of the plurality of stored audio footprints, that are associated with the category.
6. The method of claim 1, further comprising: detecting the synthetic sound based on one or more rules, wherein the one or more rules define what conditions constitute an inconsistency between a first portion of the audio footprint and a second portion of the audio footprint.
7. The method of claim 1, wherein the plurality of stored audio footprints originate from the call.
8. A system, comprising: one or more memories; and one or more processors, coupled to the one or more memories, configured to: generate an audio footprint that is associated with a background audio layer generated based on filtering audio from a call to remove a voice audio layer; detect synthetic sound based on detecting, based on comparing a portion of the audio footprint to a plurality of stored audio footprints based on a plurality of priority indicators associated with the plurality of stored audio footprints, that the portion of the audio footprint sufficiently matches a stored audio footprint, of the plurality of stored audio footprints; and perform one or more actions based on detecting the synthetic sound.
9. The system of claim 8, wherein the one or more processors, to generate the audio footprint that is associated with the background audio layer generated based on filtering the audio from the call to remove the voice audio layer, are configured to: detect audio signals corresponding to more than one voice originating from a calling device; determine that at least one audio signal of the audio signals corresponds to a voice other than a user's voice; and include the at least one audio signal in the audio footprint.
10. The system of claim 8, wherein the one or more processors are further configured to: detect the synthetic sound based on determining that the audio footprint includes a triggering sound footprint.
11. The system of claim 8, wherein the one or more processors are further configured to: detect the synthetic sound based on determining that the audio footprint satisfies a condition that indicates a likelihood of fraud associated with the call.
12. The system of claim 8, wherein the one or more processors are further configured to: determine that the audio footprint includes a triggering sound footprint; determine a category associated with the triggering sound footprint; and compare, based on determining the category, the portion of the audio footprint to the plurality of stored audio footprints.
13. The system of claim 8, wherein the one or more processors are further configured to: detect the synthetic sound based on one or more rules, wherein the one or more rules define what conditions constitute an inconsistency between a first portion of the audio footprint and a second portion of the audio footprint.
14. The system of claim 8, wherein the plurality of stored audio footprints originate from the call.
15. A non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a system, cause the system to: filter captured audio from a call to generate a background audio layer; generate an audio footprint that is associated with the background audio layer; detect synthetic sound based on detecting, based on comparing a portion of the audio footprint to a plurality of stored audio footprints based on a plurality of priority indicators associated with the plurality of stored audio footprints, that the portion of the audio footprint sufficiently matches a stored audio footprint, of the plurality of stored audio footprints; and perform one or more actions based on detecting the synthetic sound.
16. The non-transitory computer-readable medium of claim 15, wherein the one or more instructions, that cause the system to filter the captured audio from the call to generate the background audio layer, cause the system to: detect audio signals corresponding to more than one voice originating from a calling device; determine that at least one audio signal of the audio signals corresponds to a voice other than a user's voice; and include the at least one audio signal in the audio footprint.
17. The non-transitory computer-readable medium of claim 15, wherein the one or more instructions further cause the system to: detect the synthetic sound based on determining that the audio footprint includes a triggering sound footprint.
18. The non-transitory computer-readable medium of claim 15, wherein the one or more instructions further cause the system to: detect the synthetic sound based on determining that the audio footprint satisfies a condition that indicates a likelihood of fraud associated with the call.
19. The non-transitory computer-readable medium of claim 15, wherein the one or more instructions further cause the system to: determine that the audio footprint includes a triggering sound footprint; determine a category associated with the triggering sound footprint; and compare, based on determining the category, stored audio footprints, of the plurality of stored audio footprints, that are associated with the category.
20. The non-transitory computer-readable medium of claim 15, wherein the one or more instructions further cause the system to: detect the synthetic sound based on one or more rules, wherein the one or more rules define what conditions constitute an inconsisten…
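The core comparison in claim 1 checks a portion of the footprint against stored footprints in an order set by priority indicators and stops at one that "sufficiently matches". Together with the category filter of claim 5, it might be sketched as follows. Everything concrete here is an assumption: the claims do not define the footprint format, how priority indicators are encoded, or what counts as a sufficient match. The sketch therefore treats footprints as equal-length lists of frame energies, compares lower `priority` numbers first, and accepts a mean absolute difference within `tolerance`. `StoredFootprint`, `mean_abs_diff`, and `find_match` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFootprint:
    """Hypothetical record for a reference footprint."""
    name: str
    category: str          # e.g. "siren", "animal", "crowd"
    priority: int          # assumption: lower number = compared earlier
    values: list[float]    # per-frame energies on the same scale as the live footprint


def mean_abs_diff(a: list[float], b: list[float]) -> float:
    """Average per-frame difference between two equal-length footprints."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_match(
    portion: list[float],
    stored: list[StoredFootprint],
    category: Optional[str] = None,
    tolerance: float = 0.05,
) -> Optional[StoredFootprint]:
    """Return the first stored footprint, in priority order, that
    'sufficiently matches' `portion`, or None when nothing matches."""
    if not portion:
        return None
    candidates = [
        s for s in stored
        if len(s.values) == len(portion)
        and (category is None or s.category == category)  # claim 5 style filter
    ]
    for ref in sorted(candidates, key=lambda s: s.priority):
        if mean_abs_diff(portion, ref.values) <= tolerance:
            return ref
    return None


if __name__ == "__main__":
    library = [
        StoredFootprint("bark_clip", "animal", 1, [0.5, 0.5, 0.01]),
        StoredFootprint("siren_loop", "siren", 2, [0.1, 0.5, 0.1]),
    ]
    hit = find_match([0.1, 0.5, 0.1], library)
    if hit is not None:
        # Placeholder for the claimed "one or more actions", e.g. a notification.
        print(f"synthetic sound suspected: matches {hit.name}")
```

Returning the first match in priority order, rather than the closest match overall, is one possible reading of comparing "based on a plurality of priority indicators"; it lets the patterns an operator ranks highest end the search early. Under claim 7, `stored` could instead hold footprints captured earlier in the same call, and one plausible use is flagging a near-exact repeat of a background sound as a looped recording.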
CPC titles:
for comparison or discrimination
using speech recognition
Procedures used during a speech recognition process, e.g. man-machine dialogue
Management of recordings
using properties of sound source