Attention neural networks with parallel attention and feed-forward layers
US-12050983-B2 · Jul 30, 2024 · US
US2024330711A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024330711-A1 |
| Application number | US-202218699231-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 30, 2022 |
| Priority date | Jan 5, 2022 |
| Publication date | Oct 3, 2024 |
| Grant date | — |
Official abstract text for this publication.
Embodiments of the present application disclose a natural language processing method and apparatus, a device, and a readable storage medium. The method includes: obtaining a target sentence to be processed, and determining each first entity in the target sentence; for each first entity, in response to the first entity being present in a preset entity set, determining, in the preset entity set, a second entity in maximum correlation with the first entity, generating extended information based on the determined second entity, and adding the extended information after a location of the first entity in the target sentence, to obtain an updated target sentence, where the second entity is any entity in the preset entity set other than the first entity; and inputting the updated target sentence to a bidirectional encoder representations from transformer (BERT) model, such that the BERT model performs a natural language processing task.
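The sentence-update step in the abstract can be illustrated with a minimal Python sketch. It assumes the first entities have already been recognized as character spans, and that the most-correlated second entity for each has already been chosen. The function name `extend_sentence`, the `best_match` mapping, and the parenthesized format of the extended information are all hypothetical; the abstract does not specify how the extended information is worded.

```python
def extend_sentence(sentence, entity_spans, best_match):
    """Insert extended information right after each known entity mention.

    sentence     -- the target sentence
    entity_spans -- (start, end) character spans of the first entities
    best_match   -- hypothetical mapping: entity -> (relation phrase, second entity);
                    only entities present in the preset entity set appear here
    """
    pieces = []
    cursor = 0
    for start, end in sorted(entity_spans):
        mention = sentence[start:end]
        # Copy everything up to and including the entity mention.
        pieces.append(sentence[cursor:end])
        if mention in best_match:
            relation, second_entity = best_match[mention]
            # Assumed format for the extended information.
            pieces.append(f" ({relation} {second_entity})")
        cursor = end
    pieces.append(sentence[cursor:])
    return "".join(pieces)


sentence = "Paris hosts the Louvre."
spans = [(0, 5), (16, 22)]  # "Paris", "Louvre"
best_match = {"Paris": ("capital of", "France")}  # "Louvre" is not in the preset set
updated = extend_sentence(sentence, spans, best_match)
print(updated)
```

Entities absent from `best_match` are left unchanged, mirroring the condition that extension happens only when the first entity is present in the preset entity set. The updated sentence would then be passed to the BERT model.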
Opening claim text (preview).
1. A natural language processing method, comprising: obtaining a target sentence to be processed, and determining each first entity in the target sentence; for each first entity, in response to the first entity being present in a preset entity set, determining, in the preset entity set, a second entity in maximum correlation with the first entity, generating extended information based on the determined second entity, and adding the extended information after a location of the first entity in the target sentence, to obtain an updated target sentence, wherein the second entity is any entity in the preset entity set other than the first entity; and inputting the updated target sentence to a bidirectional encoder representation from transformer (BERT) model, such that the BERT model performs a natural language processing task.
2. The method according to claim 1, wherein the determining, in the preset entity set, a second entity of a plurality of second entities in maximum correlation with the first entity comprises: taking the first entity as a target object, and determining a maximum relation probability value of the target object relative to each second entity of the plurality of second entities, to obtain N−1 pieces of maximum relation probability values, wherein N−1 represents a quantity of the plurality of second entities, and N represents a total quantity of entities comprised in the preset entity set; determining a correlation between each second entity of the plurality of second entities and the target sentence, to obtain N−1 pieces of correlations; for each second entity of the plurality of second entities, calculating a product of the correlation corresponding to the second entity and the maximum relation probability value corresponding to the second entity, to obtain a correlation score corresponding to the second entity to obtain N−1 pieces of correlation scores; and taking a second entity corresponding to a maximum correlation score of the N−1 pieces of correlation scores as the second entity in maximum correlation with the target object.
3. The method according to claim 2, wherein the determining a maximum relation probability value of the target object relative to each second entity of the plurality of second entities comprises: generating an N×N×M-dimensional tensor for representing a relation and a relation probability value between entities in the preset entity set, wherein M represents a quantity of dimensions of a relation vector between different entities in the preset entity set; and generating a knowledge graph based on the N×N×M-dimensional tensor, and querying, in the knowledge graph, the maximum relation probability value of the target object relative to each second entity of the plurality of second entities.
4. The method according to claim 3, wherein the generating an N×N×M-dimensional tensor for representing a relation and a relation probability value between entities in the preset entity set comprises: generating an initial tensor that is all-0 in N×N×M dimensions; obtaining a sentence library for generating the preset entity set, traversing each sentence in the sentence library, and using a traversed sentence as a sentence to be recognized; taking two adjacent entities in the sentence to be recognized as an entity group, to obtain a plurality of entity groups; recognizing a relation between two entities in each entity group of the plurality of entity groups by using a relation recognition model, to obtain a plurality of M-dimensional relation vectors; for each M-dimensional relation vector of the plurality of M-dimensional relation vectors, in response to a maximum value in any M-dimensional relation vector of the plurality of M-dimensional relation vectors being greater than a preset threshold, updating an element at a location that corresponds to the maximum value and that is in the initial tensor from 0 to 1, to update the initial tensor; and traversing a next sentence in the sentence library and continuing to update a current tensor, and after each sentence in the sentence library is traversed, outputting and optimizing a currently obtained tensor to obtain the N×N×M-dimensional tensor.
5. The method according to claim 4, wherein the recognizing a relation between two entities in each entity group of the plurality of entity groups by using a relation recognition model, to obtain a plurality of M-dimensional relation vectors comprises: for two entities in any entity group of the plurality of entity groups, replacing the two entities in the sentence to be recognized with different identifiers to obtain a replaced sentence, and inputting the replaced sentence to the relation recognition model, such that the relation recognition model outputs an M-dimensional relation vector corresponding to the two entities.
6. The method according to claim 4, wherein the optimizing a currently obtained tensor to obtain the N×N×M-dimensional tensor comprises: taking the currently obtained tensor as an initial three-dimensional matrix, and decomposing the initial three-dimensional matrix into M N×N-dimensional matrices $X_i$, wherein i=1, 2, ..., M; decomposing a d×d×M-dimensional tensor O obtained through initialization into M pieces of d×d-dimensional matrices $O_i$, wherein d represents an adjustable hyper-parameter; obtaining an N×d-dimensional matrix A through initialization, and calculating optimal A′ and M pieces of optimal $O_i'$ based on $X_i = A O_i A^T$ and a gradient descent method; obtaining a replaced three-dimensional matrix based on the optimal A′ and the M pieces of optimal $O_i'$; and comparing the initial three-dimensional matrix with the replaced three-dimensional matrix bit by bit based on a max function, and reserving a maximum value at each location, to obtain the N×N×M-dimensional tensor.
7. The method according to claim 5, wherein the relation recognition model comprises a sub-model of a transformer structure and a relation classification neural network; and the inputting the replaced sentence to the relation recognition model, such that the relation recognition model outputs an M-dimensional relation vector corresponding to the two entities comprises: inputting the replaced sentence to the sub-model of the transformer structure, to obtain a feature vector with the identifiers of the two entities; and inputting the feature vector with the identifiers of the two entities to the relation classification neural network, to obtain the M-dimensional relation vector corresponding to the two entities.
8. The method according to claim 2, wherein the determining a correlation between each second entity of the plurality of second entities and the target sentence comprises: for each second entity of the plurality of second entities, determining a normalization result of a sum of a correlation degree between each first entity in the target sentence and the second entity as the correlation between the second entity and the target sentence.
9. The method according to claim 8, wherein a correlation degree between any first entity and any second entity is a maximum relation probability value of the first entity relative to the second entity plus a maximum relation probability value of the second entity relative to the first entity.
10. The method according to claim 1, wherein the determining each first entity in the target sentence comprises: converting each word in the target sentence into a 1024-dimensional vector, to obtain a vector set; and inputting the vector set to an entity recognition model, such that the entity recognition model recognizes each first entity in the target sentence.
11. (canceled)
12. An
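The selection of the most-correlated second entity described in claims 2, 8, and 9 can be sketched in plain Python. This is an illustration under stated assumptions, not the applicant's implementation: the function name `best_second_entity` is hypothetical, the N×N×M tensor is represented as nested lists, and the "normalization" of claim 8 is assumed to mean dividing by the sum over all candidates, since the claims do not say which normalization is used.

```python
def best_second_entity(T, target, sentence_entities):
    """Pick the second entity with the highest correlation score.

    T                 -- nested list of shape N x N x M; T[i][j][k] is the
                         probability of relation k holding from entity i to j
    target            -- index of the first entity being extended
    sentence_entities -- indices of all first entities in the target sentence
    """
    n = len(T)
    # P[i][j]: maximum relation probability of entity i relative to entity j.
    P = [[max(T[i][j]) for j in range(n)] for i in range(n)]
    # The N-1 candidate second entities: everything except the target.
    candidates = [j for j in range(n) if j != target]
    # Claim 9: correlation degree = P[first][second] + P[second][first].
    # Claim 8: sum the degrees over all first entities in the sentence.
    raw = [sum(P[f][j] + P[j][f] for f in sentence_entities) for j in candidates]
    total = sum(raw)
    # Assumed normalization: share of the total across candidates.
    corr = [r / total if total > 0 else 0.0 for r in raw]
    # Claim 2: score = correlation x maximum relation probability.
    scores = [c * P[target][j] for c, j in zip(corr, candidates)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Toy tensor with N = 3 entities and M = 2 relation types.
T = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
T[0][1][0] = 0.9
T[0][2][1] = 0.4
T[1][2][0] = 0.8
chosen, scores = best_second_entity(T, target=0, sentence_entities=[0])
print(chosen, scores)
```

In the toy example, entity 1 is chosen for target entity 0 because it has both the larger correlation with the sentence and the larger maximum relation probability.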
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
Knowledge representation; Symbolic representation · CPC title
Ontology · CPC title
Named entity recognition · CPC title