Artificial intelligence apparatus and method for providing visual information
US-2021337274-A1 · Oct 28, 2021 · US
US12373486B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12373486-B2 |
| Application number | US-202217703564-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 24, 2022 |
| Priority date | Sep 22, 2021 |
| Publication date | Jul 29, 2025 |
| Grant date | Jul 29, 2025 |
Official abstract text for this publication.
A method includes obtaining a query content. The query content includes segment information representing a to-be-recognized audio. The method further includes selecting the preset quantity of candidate audios corresponding to the query content from a preset library. Each candidate audio includes a candidate audio segment matched with the segment information. The method further includes inputting the candidate audio segment into a trained detection model so as to obtain target segment information including the segment information and a target audio where the target segment information is located.
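The candidate-selection step can be sketched roughly as follows, using the ranking and longest-consecutive-match refinements that claim 2 describes. This is an illustrative sketch only, not the patented implementation. It assumes morphemes can be modeled as whitespace-separated tokens and uses a toy overlap score for similarity. The names `similarity`, `longest_consecutive_match`, and `select_candidates`, and the in-memory `library` dictionary, are hypothetical.

```python
from typing import Dict, List, Tuple


def similarity(query: List[str], text: List[str]) -> float:
    """Toy similarity (assumption): fraction of query tokens found in the text."""
    if not query:
        return 0.0
    vocab = set(text)
    return sum(1 for q in query if q in vocab) / len(query)


def longest_consecutive_match(query: List[str], text: List[str]) -> Tuple[int, int]:
    """Return (start, length) in `text` of the longest run of tokens that
    also appears consecutively in `query` (longest common substring)."""
    best_start, best_len = 0, 0
    # prev[j + 1] = length of the common run ending at query[j] and the previous text token
    prev = [0] * (len(query) + 1)
    for i, tok in enumerate(text):
        cur = [0] * (len(query) + 1)
        for j, q in enumerate(query):
            if tok == q:
                cur[j + 1] = prev[j] + 1
                if cur[j + 1] > best_len:
                    best_len = cur[j + 1]
                    best_start = i - best_len + 1
        prev = cur
    return best_start, best_len


def select_candidates(
    query: List[str], library: Dict[str, List[str]], k: int
) -> List[Tuple[str, List[str]]]:
    """Rank library entries by similarity (largest first), keep the top k,
    and return each one with its longest consecutively matching segment."""
    ranked = sorted(
        library.items(), key=lambda kv: similarity(query, kv[1]), reverse=True
    )
    candidates = []
    for title, text in ranked[:k]:
        start, length = longest_consecutive_match(query, text)
        if length > 0:
            candidates.append((title, text[start:start + length]))
    return candidates


# Hypothetical lyric library keyed by song title.
library = {
    "a": "twinkle twinkle little star how i wonder".split(),
    "b": "row row row your boat".split(),
    "c": "little lamb little lamb".split(),
}
result = select_candidates("little star how".split(), library, k=2)
```

A production system would likely replace the toy overlap score with a proper retrieval measure; the claims only require that some similarity be computed and sorted from large to small.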
Opening claim text (preview).
What is claimed is:

1. An audio recognition method, performed by an electronic device, comprising:
obtaining a query content, wherein the query content comprises segment information representing a to-be-recognized audio;
selecting a preset quantity of candidate audios corresponding to the query content from a preset library, wherein the candidate audio comprises a candidate audio segment matched with the segment information;
obtaining a to-be-detected vector corresponding to the candidate audio according to the segment information and the candidate audio segment;
inputting the to-be-detected vector corresponding to the candidate audio into a trained detection model, so as to obtain a detection result data output by the trained detection model; and
obtaining target segment information comprising the segment information and a target audio where the target segment information is located according to the detection result data;
wherein the detection result data comprise first probability data and second probability data which correspondingly indicate that a morpheme in the candidate audio segment is located at a starting position and an ending position respectively, and
wherein obtaining the target segment information comprising the segment information and the target audio where the target segment information is located according to the detection result data comprises:
determining, in response to determining that the starting position is smaller than the ending position, a target audio segment from the candidate audio segment based on a product of the first probability data and the second probability data; and
using the target audio segment as the target segment information recognized from the query content, and using an audio where the target segment information is located as the target audio;
wherein determining the target audio segment from the candidate audio segment based on the product of the first probability data and the second probability data comprises:
determining a starting morpheme at the starting position and an ending morpheme at the ending position when the product of the first probability data and the second probability data is the largest; and
determining that all morphemes between the starting morpheme and the ending morpheme constitute the target audio segment.

2. The method according to claim 1, wherein selecting the preset quantity of the candidate audios corresponding to the query content from the preset library comprises:
determining a similarity between a morpheme of the segment information and text information of an audio in the preset library;
sorting audios in the preset library according to the similarity from large to small so as to obtain a sorting result;
determining a preset quantity of audios with sorting positions at the front as the candidate audios based on the sorting result, the candidate audio comprising at least one audio segment matched with the morphemes of the segment information; and
obtaining an audio segment comprising the longest consecutive matching morpheme from the at least one audio segment of the candidate audio, so as to obtain the candidate audio segment, matched with the segment information, of the candidate audio.

3. The method according to claim 1, wherein obtaining the to-be-detected vector corresponding to the candidate audio according to the segment information and the candidate audio segment comprises:
splicing the segment information with the candidate audio segment of the candidate audio respectively, so as to obtain the to-be-detected vector corresponding to the candidate audio, and wherein the to-be-detected vector at least comprises a first identifier and a second identifier, the first identifier is configured to identify a starting position of the to-be-detected vector, and the second identifier is configured to identify a splicing position and an ending position of the to-be-detected vector.

4. The method according to claim 1, wherein the to-be-recognized audio is a song, and the segment information refers to part of lyrics in the song.

5. An electronic device, comprising:
a processor; and
a memory configured to store computer instructions executable by the processor;
wherein, when the processor executes the instructions the processor is configured to:
obtain a query content, wherein the query content comprises segment information representing a to-be-recognized audio;
select a preset quantity of candidate audios corresponding to the query content from a preset library, wherein the candidate audio comprises a candidate audio segment matched with the segment information;
obtain a to-be-detected vector corresponding to the candidate audio according to the segment information and the candidate audio segment;
input the to-be-detected vector corresponding to the candidate audio into a trained detection model, so as to obtain a detection result data output by the trained detection model; and
obtain target segment information comprising the segment information and a target audio where the target segment information is located according to the detection result data;
wherein the detection result data comprise first probability data and second probability data which correspondingly indicate that a morpheme in the candidate audio segment is located at a starting position and an ending position respectively; and
wherein the processor is further configured to:
determine, in response to determining that the starting position is smaller than the ending position, a target audio segment from the candidate audio segment based on a product of the first probability data and the second probability data; and
use the target audio segment as the target segment information recognized from the query content, and using an audio where the target segment information is located as the target audio;
wherein the processor is further configured to:
determine a starting morpheme at the starting position and an ending morpheme at the ending position when the product of the first probability data and the second probability data is the largest; and
determine that all morphemes between the starting morpheme and the ending morpheme constitute the target audio segment.

6. The electronic device according to claim 5, wherein the processor is further configured to:
determine a similarity between a morpheme of the segment information and text information of an audio in the preset library;
sort audios in the preset library according to the similarity from large to small so as to obtain a sorting result;
determine a preset quantity of audios with sorting positions at the front as the candidate audios based on the sorting result, the candidate audio comprising at least one audio segment matched with the morphemes of the segment information; and
obtain an audio segment comprising the longest consecutive matching morpheme from the at least one audio segment of the candidate audio, so as to obtain the candidate audio segment, matched with the segment information, of the candidate audio.

7. The electronic device according to claim 5, wherein the processor is further configured to:
splice the segment information with the candidate audio segment of the candidate audio respectively, so as to obtain the to-be-detected vector corresponding to the candidate audio, and wherein the to-be-detected vector at least comprises a first identifier and a second identifier, and wherein the first identifier is configured to identify a starting position of the to-be-detected vector, and the second identifier is configured to identify a splicing position and an ending position of the to-be-detected vector.

8. The electronic device according to claim 5, wherein the to-be-recognized audio is a song, and the
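
The span-selection rule recited in claims 1 and 5 (pick the start and end positions whose probability product is largest, subject to the start being smaller than the end) can be illustrated with the short sketch below. It is an assumption-laden example rather than the claimed implementation: the probabilities are plain Python lists, the segment is taken to include both boundary morphemes, and the function names `best_span` and `target_segment` are hypothetical.

```python
from typing import List, Optional, Tuple


def best_span(
    start_probs: List[float], end_probs: List[float]
) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) maximizing start_probs[start] * end_probs[end]
    over all pairs with start < end, or None if no such pair exists."""
    best: Optional[Tuple[int, int, float]] = None
    for s, p_start in enumerate(start_probs):
        # Only consider ending positions strictly after the starting position.
        for e in range(s + 1, len(end_probs)):
            score = p_start * end_probs[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


def target_segment(
    morphemes: List[str], start_probs: List[float], end_probs: List[float]
) -> List[str]:
    """Return the morphemes from the starting to the ending morpheme
    (both included, by assumption), or an empty list if no valid span exists."""
    span = best_span(start_probs, end_probs)
    if span is None:
        return []
    s, e, _ = span
    return morphemes[s:e + 1]


# Hypothetical model output for a five-morpheme candidate segment.
morphemes = ["oh", "little", "star", "how", "yeah"]
start_probs = [0.10, 0.70, 0.10, 0.05, 0.05]
end_probs = [0.05, 0.05, 0.20, 0.60, 0.10]
segment = target_segment(morphemes, start_probs, end_probs)
```

The exhaustive double loop is quadratic in segment length, which is usually acceptable for lyric-sized segments; a real system might cap the span length or work in log space to avoid underflow.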
using artificial neural networks · CPC title
using automatically derived transcript of audio data, e.g. lyrics (speech recognition G10L15/00) · CPC title
using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings · CPC title
Phonemes, fenemes or fenones being the recognition units · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title