Semi-supervised learning of word embeddings
US-2016328388-A1 · Nov 10, 2016 · US
US11599581B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599581-B2 |
| Application number | US-201816616921-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 25, 2018 |
| Priority date | May 25, 2017 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A method of generating matching metadata vectors for identifying content items in a store searchable by input vectors, the method comprising: receiving multiple training inputs, each training input comprising a content identifier indicative of a content item, and at least one natural language description of the content item; for each training input: converting the natural language description into at least one text component; generating at least one vector, each vector corresponding to one text component; generating a set of component parts for each vector, each component part corresponding to a coordinate initialized with a random value; adjusting each random coordinate based on the relationship of each component part to other vectors; determining a weighting for each vector with respect to the item; and defining a metadata vector for each item comprising the vectors containing the adjusted coordinates for that item and the weighting for each vector.
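The abstract's training steps can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how coordinates are adjusted or how weightings are determined, so the sketch makes two labeled assumptions: co-occurring component vectors are nudged toward their shared mean, and weightings use a TF-IDF-style score. The names `tokenize` and `build_metadata_vectors` and the parameters `dim`, `epochs`, and `lr` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(description):
    """Split a description into lowercase words (the 'text components')."""
    return description.lower().split()


def build_metadata_vectors(training_inputs, dim=8, epochs=20, lr=0.1, seed=0):
    """training_inputs: iterable of (content_id, natural_language_description)."""
    rng = random.Random(seed)
    vocab = {}  # text component -> list of coordinates
    docs = {}   # content_id -> all tokens from its descriptions

    for content_id, description in training_inputs:
        tokens = tokenize(description)
        docs.setdefault(content_id, []).extend(tokens)
        for tok in tokens:
            if tok not in vocab:
                # Each coordinate starts from a random value, as in the abstract.
                vocab[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    # ASSUMPTION: the adjustment rule below is a stand-in. Each coordinate of a
    # vector is moved toward the mean of the same coordinate across the other
    # components that describe the same item.
    for _ in range(epochs):
        for tokens in docs.values():
            unique = sorted(set(tokens))
            if len(unique) < 2:
                continue
            mean = [sum(vocab[t][k] for t in unique) / len(unique)
                    for k in range(dim)]
            for t in unique:
                vocab[t] = [c + lr * (m - c) for c, m in zip(vocab[t], mean)]

    # ASSUMPTION: weighting of each vector with respect to an item is a
    # TF-IDF-style score; the source does not name a formula.
    n_items = len(docs)
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    metadata = {}
    for content_id, tokens in docs.items():
        term_freq = Counter(tokens)
        metadata[content_id] = {
            t: {
                "vector": vocab[t],
                "weight": (term_freq[t] / len(tokens))
                          * math.log(1 + n_items / doc_freq[t]),
            }
            for t in term_freq
        }
    return metadata
```

For example, `build_metadata_vectors([("item-1", "red sports car"), ("item-2", "blue family car")])` returns one entry per content identifier, each holding the adjusted vectors and a weighting for every text component. Under the assumed weighting, a component shared by both items ("car") scores lower than one unique to a single item ("red").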
Opening claim text (preview).
The invention claimed is:
1. A method of generating matching metadata vectors for identifying content items in a store searchable by input vectors, the method comprising: receiving multiple training inputs, each training input comprising a content identifier indicative of a content item, and at least one natural language description of the content item; for each training input: converting the natural language description into at least one text component; generating at least one vector, each vector corresponding to one text component; generating a set of component parts for each vector, each component part corresponding to a coordinate initialized with a random value; adjusting each random coordinate based on the relationship of each component part to other vectors by assigning an association strength to each component part of each vector, the association strength being indicative of the association of that component part of the vector with the same component part of other vectors; determining a weighting for each vector with respect to the item; and defining a metadata vector for each item comprising the vectors containing the adjusted coordinates for that item and the weighting for each vector.
2. The method according to claim 1 wherein at least one training input is associated with a plurality of descriptions.
3. The method of claim 1 wherein each content item comprises a category, with at least one text component corresponding to that category.
4. The method of claim 3 wherein each content item corresponds to one of: a setting; a place; a name; a title; or a definer of at least one of a setting, a place, a name or a title.
5. The method according to claim 1 wherein each training input comprises multiple natural language descriptions of different semantic levels, the method further comprising: deriving one or more metadata vectors for each semantic level.
6. The method according to claim 5 further comprising: storing a tabular structure comprising the metadata vectors for multiple semantic levels associated with each content identifier.
7. The method according to claim 1 further comprising the steps of: receiving a search input comprising a natural language description of an unknown content item, which lacks a content identifier; vectorising the natural language description into a search vector comprising a set of text components and assigning a weight to each text component; comparing the search vector to each metadata vector derived from the training inputs to generate a list of possible matches; computing a score for each possible match based on vector similarities; and filtering the list of possible matches based on the similarity score to determine that the search input has a match within the training inputs.
8. The method of claim 7 wherein the step of vectoring the natural language description is based on a frequency of occurrence of the text component in the search input.
9. The method of claim 7 wherein the determined weightings are based on a frequency of occurrence of the text component in the search input and wherein the determined weightings are indicative of predictive power of a text component to associate with a search for a content item.
10. The method according to claim 7 further comprising the steps of: presenting match results to a user; receiving confirmation from the user that a match is correct; and associating the content identifier with the search input to create a further training input.
11. The method according to claim 10 further comprising the steps of: receiving confirmation from the user that the match is incorrect; receiving a correct content identifier from the user; and associating the correct content identifier with the search input to create a further training input.
12. A content access and storage system comprising: a store holding a plurality of content identifiers, each content identifier indicative of a content item; and a computer configured to execute a computer program to carry out the method of claim 1.
13. A content access and storage system according to claim 12, wherein the store holds a graph structure comprising a plurality of nodes, some nodes representing content items and some nodes representing text components, wherein the nodes are connected by links according to adjusted weightings.
14. A content access storage system according to claim 12 further comprising: a user interface configured to receive a search input from a user and to receive feedback from the user identifying a content identifier to be associated with the search input.
15. A method of accessing a content store comprising a plurality of metadata vectors generated according to the method of claim 1, the method comprising: receiving a search input comprising a natural language description of an unknown content item, which lacks a content identifier; vectorising the natural language description into a search vector comprising a set of text components and assigning a weight to each text component; comparing the search vector to each metadata vector to generate a list of possible matches; computing a score for each possible match based on vector similarities; and filtering the list of possible matches based on the similarity score to determine if the search input has a match within the training inputs.
16. A content access and storage system comprising a store of metadata vectors in accordance with the method of claim 15, and a computer program which when executed by a processor performs the method steps of claim 15.
17. The method according to claim 1 further comprising: storing a graph structure comprising a plurality of nodes, some nodes representing content items and some nodes representing text components, wherein the nodes are connected by links according to the adjusted weightings.
18. A method of generating matching metadata vectors for identifying content items in a store searchable by input vectors, the method comprising: receiving multiple training inputs, each training input comprising a content identifier indicative of a content item, and at least one natural language description of the content item; for each training input: converting the natural language description into at least one text component; generating at least one vector, each vector corresponding to one text component; generating a set of component parts for each vector, each component part corresponding to a coordinate initialized with a random value; adjusting each random coordinate based on the relationship of each component part to other vectors; determining a weighting for each vector with respect to the item by determining that a vector does not provide a correct identification of content for a given set of weightings, and adjusting the weightings; and defining a metadata vector for each item comprising the vectors containing the adjusted coordinates for that item and the weighting for each vector.
19. The method according to claim 18 further comprising: storing a graph structure comprising a plurality of nodes, some nodes representing content items and some nodes representing text components, wherein the nodes are connected by links according to the adjusted weightings.
20. A method of generating matching metadata vectors for identifying content items in a store searchable by input vectors, the method comprising: receiving multiple training inputs, each training input comprising a content identifier indicative of a content item, and at lea
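The search and feedback steps recited in claims 7, 8, 10, 11 and 15 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation. Metadata vectors are simplified to sparse maps from text component to weight, cosine similarity stands in for the unspecified "vector similarities", and the `threshold` value is arbitrary. The function names `vectorise`, `cosine`, `search`, and `record_feedback` are hypothetical.

```python
import math
from collections import Counter


def vectorise(description):
    """Weight each text component by its relative frequency in the input."""
    counts = Counter(description.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse component -> weight maps."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(description, metadata_vectors, threshold=0.3):
    """Score every stored metadata vector and keep those above the threshold.

    ASSUMPTION: metadata_vectors maps content_id -> {component: weight};
    the threshold is an illustrative value, not one taken from the source.
    """
    query = vectorise(description)
    scored = [(cid, cosine(query, vec)) for cid, vec in metadata_vectors.items()]
    matches = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(matches, key=lambda m: m[1], reverse=True)


def record_feedback(training_inputs, description, content_id):
    """Pair a user-confirmed (or user-corrected) identifier with the search
    input so that it becomes a further training input."""
    training_inputs.append((content_id, description))
    return training_inputs
```

A caller would pass the confirmed identifier to `record_feedback` when the user accepts a match, or the user-supplied identifier when the match is rejected. Both cases produce a new training pair in the same shape as the original training inputs.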
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title
Presentation of query results · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Natural language query formulation or dialogue systems · CPC title
Semantic analysis · CPC title