Cas12a systems, methods, and compositions for targeted RNA base editing
US-2021079366-A1 · Mar 18, 2021 · US
US12008810B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12008810-B2 |
| Application number | US-202117225969-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 8, 2021 |
| Priority date | Mar 5, 2019 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
This application discloses a video sequence selection method, applicable to a computer device, the method including: receiving a to-be-matched video and a to-be-matched text, the to-be-matched text having a to-be-matched text feature sequence; invoking a spatiotemporal candidate region generator to extract a spatiotemporal candidate region set from the to-be-matched video, the spatiotemporal candidate region set including N spatiotemporal candidate regions; performing feature extraction on each spatiotemporal candidate region by using a convolutional neural network, to obtain N to-be-matched video feature sequences; invoking an attention-based interactor to obtain a matching score corresponding to each spatiotemporal candidate region, the matching score being used for representing a matching relationship between the spatiotemporal candidate region and the to-be-matched text; and selecting a target spatiotemporal candidate region from the spatiotemporal candidate region set according to the matching score corresponding to each spatiotemporal candidate region, and outputting the target spatiotemporal candidate region. In this application, an association between the video and the text in time sequence is considered during matching, thereby increasing a degree of matching between a video sequence and the text.
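The final selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `Candidate`, `select_target_region`, and `toy_score` are hypothetical names, and `toy_score` is a placeholder (dot product of mean-pooled features) standing in for the attention-based interactor, whose internals are not reproduced here. Picking the highest score follows the wording of claim 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Matrix = List[List[float]]


@dataclass
class Candidate:
    """Hypothetical container for one spatiotemporal candidate region."""
    region_id: int
    features: Matrix  # one feature row per time step (assumed layout)


def select_target_region(
    candidates: Sequence[Candidate],
    text_features: Matrix,
    score_fn: Callable[[Matrix, Matrix], float],
) -> Candidate:
    """Score every candidate against the text and return the best one."""
    if not candidates:
        raise ValueError("need at least one candidate region (N >= 1)")
    scores = [score_fn(c.features, text_features) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


def toy_score(video_feats: Matrix, text_feats: Matrix) -> float:
    """Placeholder scorer (an assumption, NOT the patent's interactor)."""
    def mean_pool(rows: Matrix) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    v, q = mean_pool(video_feats), mean_pool(text_feats)
    return sum(a * b for a, b in zip(v, q))


# Two made-up candidate regions and a made-up text feature sequence.
candidates = [
    Candidate(0, [[1.0, 0.0], [0.8, 0.2]]),
    Candidate(1, [[0.0, 1.0], [0.1, 0.9]]),
]
text = [[0.0, 1.0], [0.2, 0.8]]

target = select_target_region(candidates, text, toy_score)
print(target.region_id)
```

Passing the scorer in as a function keeps the selection logic independent of how the matching score is computed, so a real interactor could be substituted without changing `select_target_region`.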
Opening claim text (preview).
What is claimed is:

1. A video sequence selection method, applicable to a computer device, the method comprising:

   receiving, by the computer device, a to-be-matched video and a to-be-matched text, wherein the to-be-matched text is not part of the to-be-matched video, the to-be-matched video comprising a plurality of frames, the to-be-matched text comprising at least one word, and the to-be-matched text having a to-be-matched text feature sequence corresponding to a target object;

   invoking, by the computer device, a spatiotemporal candidate region generator to extract a spatiotemporal candidate region set from the to-be-matched video, the spatiotemporal candidate region set comprising N spatiotemporal candidate regions, N being an integer greater than or equal to 1, and each spatiotemporal candidate region corresponding to images within a respective video sequence in the to-be-matched video that include a candidate object;

   performing, by the computer device, feature extraction on each spatiotemporal candidate region in the spatiotemporal candidate region set by using a convolutional neural network, to obtain N to-be-matched video feature sequences, each to-be-matched video feature sequence corresponding to a respective spatiotemporal candidate region in the spatiotemporal candidate region set and representing a respective candidate object in the respective spatiotemporal candidate region;

   invoking, by the computer device, an attention-based interactor to obtain a matching score corresponding to each spatiotemporal candidate region, the interactor being configured to process the to-be-matched video feature sequence and the to-be-matched text feature sequence, and the matching score being used for representing a matching relationship between a respective candidate object in the spatiotemporal candidate region and the target object corresponding to the to-be-matched text; and

   selecting, by the computer device, from the spatiotemporal candidate region set, a target spatiotemporal candidate region having a highest matching score outputted by the interactor, and outputting the target spatiotemporal candidate region as representing the target object corresponding to the to-be-matched text.

2. The method according to claim 1, wherein the invoking, by the computer device, a spatiotemporal candidate region generator to extract a spatiotemporal candidate region set from the to-be-matched video comprises:

   invoking, by the computer device, the spatiotemporal candidate region generator to obtain a candidate region and a confidence score of each frame in the to-be-matched video, each candidate region corresponding to a respective confidence score;

   invoking, by the computer device, the spatiotemporal candidate region generator to obtain a degree of overlap of similar image content between every two adjacent frames in the to-be-matched video; and

   invoking, by the computer device, the spatiotemporal candidate region generator to generate the spatiotemporal candidate region set according to the candidate region and the confidence score of each frame and the overlap degrees.

3. The method according to claim 1, wherein the invoking, by the computer device, an attention-based interactor to obtain a matching score corresponding to each spatiotemporal candidate region comprises:

   invoking, by the computer device for each spatiotemporal candidate region, an encoder of the interactor to encode the to-be-matched video feature sequence corresponding to the spatiotemporal candidate region, to obtain a visual feature set, the visual feature set comprising at least one visual feature of a candidate object in the spatiotemporal candidate region;

   invoking, by the computer device, the encoder of the interactor to encode the to-be-matched text feature sequence, to obtain a text feature set, the text feature set comprising at least one text feature of the target object;

   invoking, by the computer device, the interactor to determine a visual text feature set according to the visual feature set and the text feature set, the visual text feature set comprising at least one visual text feature, the visual text feature representing a visual feature-based text feature; and

   invoking, by the computer device, the interactor to determine the matching score corresponding to the candidate object in the spatiotemporal candidate region and the target object according to the visual text feature set and the visual feature set.

4. The method according to claim 3, wherein the invoking, by the computer device, an encoder of the interactor to encode the to-be-matched video feature sequence corresponding to the spatiotemporal candidate region, to obtain a visual feature set comprises:

   calculating the visual feature set in the following manner:

   $$H^{p} = \{h_t^{p}\}_{t=1}^{t_p}, \quad \text{and} \quad h_t^{p} = \mathrm{LSTM}_p(f_t^{p}, h_{t-1}^{p}),$$

   $H^{p}$ representing the visual feature set, $h_t^{p}$ representing a $t$-th visual feature in the visual feature set, $t_p$ representing a time step in the spatiotemporal candidate region, $h_{t-1}^{p}$ representing a $(t-1)$-th visual feature in the visual feature set, $\mathrm{LSTM}_p(\cdot)$ representing a first long short-term memory (LSTM) network encoder, and $f_t^{p}$ representing a $t$-th row of features in the to-be-matched video feature sequence; and

   the invoking, by the computer device, the encoder of the interactor to encode the to-be-matched text feature sequence, to obtain a text feature set comprises:

   calculating the text feature set in the following manner:

   $$H^{q} = \{h_t^{q}\}_{t=1}^{t_q}, \quad \text{and} \quad h_t^{q} = \mathrm{LSTM}_q(f_t^{q}, h_{t-1}^{q}),$$

   $H^{q}$ representing the text feature set, $h_t^{q}$ representing a $t$-th text feature in the text feature set, $t_q$ representing a word quantity of the to-be-matched text, $h_{t-1}^{q}$ representing a $(t-1)$-th text feature in the text feature set, $\mathrm{LSTM}_q(\cdot)$ representing a second LSTM encoder, and $f_t^{q}$ representing a $t$-th row of features in the to-be-matched text feature sequence.

5. The method according to claim 3, wherein the invoking, by the computer device, the interactor to determine a visual text feature set according to the visual feature set and the text feature set comprises:

   invoking, by the computer device, the interactor to calculate an attention weight of the text feature corresponding to the visual feature according to the visual feature set and the text feature set;

   invoking, by the computer device, the interactor to calculate a normalized attention weight of the text feature corresponding to the visual feature according to the attention weight; and

   invoking, by the computer device, the interactor to calculate the visual text feature set according to the normalized attention weight and the text feature.

6. The method according to claim 5, wherein the invoking, by the computer device, the interactor to calculate an attention weight of the text feature corresponding to the visual feature according to the visual feature set and the text feature set comprises:

   calculate the attention weight in the following manner:

   $$e_{i,j} = w^{T} \tanh(W_q h_j^{q} + W_p h_i^{p} + b_1) + b_2;$$

   $e_{i,j}$ representing an attention weight of a $j$-th text feature corresponding to an $i$-th visual feature, $h_j^{q}$ representing the $j$-th text feature, $h_i^{p}$ representing the $i$-th visual feature, $w^{T}$ representing a first model parameter, $W_q$ representing a second model parameter, $W_p$ representing a third model parameter, $b_1$ representing a fourth model parameter, $b_2$ representing a fifth model parameter, and $\tanh(\cdot)$ representing a hyperbolic tangent function;

   the invoking, by the computer device, the interactor to calculate a normalized attention weight of the text feature corresponding to the visual feature according to the atte
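The attention steps recited in claims 5 and 6 can be illustrated numerically. The sketch below implements the attention weight formula of claim 6 as written. The normalization and the weighted sum are assumptions, because the preview cuts off before those formulas: a softmax over the text features and a weighted sum of text features are used here as the conventional choices. All function names and the toy parameter values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_weight(h_q_j: Vector, h_p_i: Vector, w: Vector,
                     W_q: Matrix, W_p: Matrix, b1: Vector, b2: float) -> float:
    """Claim 6 formula: e_ij = w^T tanh(W_q h_j^q + W_p h_i^p + b1) + b2."""
    pre = [a + b + c
           for a, b, c in zip(matvec(W_q, h_q_j), matvec(W_p, h_p_i), b1)]
    return sum(wk * math.tanh(x) for wk, x in zip(w, pre)) + b2


def softmax(xs: Vector) -> Vector:
    """ASSUMED normalization (the claim text is truncated in the source)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def visual_text_features(H_p: Matrix, H_q: Matrix, w: Vector,
                         W_q: Matrix, W_p: Matrix,
                         b1: Vector, b2: float) -> Matrix:
    """For each visual feature, return an attention-weighted sum of text
    features (ASSUMED combination rule for the visual text feature)."""
    dim = len(H_q[0])
    out = []
    for h_p_i in H_p:
        e = [attention_weight(h_q_j, h_p_i, w, W_q, W_p, b1, b2)
             for h_q_j in H_q]
        a = softmax(e)
        out.append([sum(a[j] * H_q[j][d] for j in range(len(H_q)))
                    for d in range(dim)])
    return out


# Toy inputs: one visual feature, two text features, identity projections.
H_p = [[1.0, 0.0]]
H_q = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
H_qp = visual_text_features(H_p, H_q, w=[1.0, 1.0], W_q=identity,
                            W_p=identity, b1=[0.0, 0.0], b2=0.0)
print(H_qp)
```

Under the softmax assumption, the scalar $b_2$ shifts every weight for a given visual feature equally, so it does not change the normalized weights.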
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Engine management systems · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title