Automated domain adaptation for semantic search using embedding vectors
US-2025053586-A1 · Feb 13, 2025 · US
US12423364B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12423364-B2 |
| Application number | US-202318365046-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 3, 2023 |
| Priority date | Aug 3, 2023 |
| Publication date | Sep 23, 2025 |
| Grant date | Sep 23, 2025 |
Official abstract text for this publication.
A large language model (LLM) is used to broaden a search for supporting material for a media project. The LLM is provided with contextual material from the media project and optional grounding material and generates a list of types of supporting material. The list is provided to a machine-learning-based encoder, which encodes the list items into embedding space vectors. A body of source material, which may be multimodal, is also encoded into the embedding space. A search engine is used to locate items of source material having embedded space vectors closest to vectors corresponding to the list of types of supporting material. Located items are provided to a media editing application as suggested supporting material. The body of source material may include live external sources. A small language model fine-tuned with a training data set generated by an LLM may be used instead of the LLM.
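The search step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the item names, and the hand-written 3-dimensional vectors are all hypothetical stand-ins for what an LLM and a machine-learning-based encoder would produce, and cosine similarity is just one of the measures the claims mention.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_material(list_item_vectors, source_item_vectors, top_k=2):
    """Rank source items by their best similarity to any list-item vector.

    Both arguments are dicts mapping a label to its embedding vector.
    Returns the labels of the top_k highest-scoring source items.
    """
    scored = []
    for item_id, vec in source_item_vectors.items():
        best = max(cosine_similarity(vec, q) for q in list_item_vectors.values())
        scored.append((best, item_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical list of "types of supporting material", already encoded.
# These toy vectors are assumptions, not real encoder output.
list_item_vectors = {
    "aerial footage of flooding": [0.9, 0.1, 0.0],
    "interview with city official": [0.0, 0.2, 0.9],
}

# Hypothetical body of source material, encoded into the same space.
source_item_vectors = {
    "clip_017.mov": [0.8, 0.2, 0.1],
    "press_conf.wav": [0.1, 0.1, 0.95],
    "stock_chart.png": [0.1, 0.9, 0.1],
}

suggestions = suggest_material(list_item_vectors, source_item_vectors, top_k=2)
print(suggestions)
```

In this toy data the two items whose vectors point in nearly the same direction as a list item are suggested, while the unrelated item is left out. A production system would more likely use a vector index than a linear scan; the scan is used here only to keep the sketch self-contained.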
Opening claim text (preview).
What is claimed is:
1. A method of locating material in support of a media project, the method comprising: directing a prompt to a large language model (LLM), wherein the prompt includes: one or more items of content pertaining to the media project; and a request to provide a list of types of material that support the media project; receiving output from the LLM, the LLM output comprising a list of types of material that support the media project; inputting the LLM output to a first machine-learning-based encoder to generate a first set of embeddings that include, for each type of supporting material specified in the LLM output, a corresponding vector in an embedding space; providing a body of source material to a second machine-learning-based encoder to generate a second set of embeddings comprising, for each item of source material in the body of source material, a corresponding vector in the embedding space; using a search engine to identify a plurality of vectors of the second set of embeddings having a greatest degree of similarity in the embedding space with at least one vector of the first set of embeddings; and outputting from the search engine a plurality of items of the source material, each item of the plurality of items of the source material corresponding to the one of the identified plurality of vectors of the second set of embeddings.
2. The method of claim 1, wherein the content pertaining to the media project has been incorporated into the media project.
3. The method of claim 1, wherein the prompt includes grounding material comprising material that was not included within data used to train the LLM.
4. The method of claim 1, wherein the list of types of material that support the media project output by the LLM includes a plurality of media modalities.
5. The method of claim 1, wherein the one or more items of content pertaining to the media project included within the prompt includes a non-text media modality.
6. The method of claim 1, further comprising: enabling a user of a media editing system used to create the media project to edit the LLM output; and inputting to the first machine-learning-based encoder LLM output edited by the user.
7. The method of claim 1, wherein the body of source material includes a plurality of media modalities.
8. The method of claim 1, wherein the second machine-learning-based encoder is able to encode a plurality of media modalities into a common embedding space.
9. The method of claim 1, wherein the body of source material includes material obtained from sources external to a media editing system used to create the media project.
10. The method of claim 9, wherein the sources external to the media editing system are dynamic.
11. The method of claim 10, wherein the dynamic sources include at least one of a social media feed, external search engines, and WikiData.
12. The method of claim 9, wherein a description of a content item in an external source is retrieved and encoded to generate a corresponding vector in the embedding space, and, when the content item is requested for use with the media project, a locator associated with the vector is used to retrieve the content item from the external source.
13. The method of claim 9, wherein the material obtained from sources external to the media editing system is curated and stored in a storage location that is local to the media editing system, the curation including selection of material based on at least one of a date of release of the material, a current presence of the material in an external source, and a size of the material.
14. The method of claim 1, wherein the degree of similarity in the embedding space is one of a cosine similarity metric and a Pythagorean distance metric.
15. The method of claim 1, wherein the prompt is automatically directed to the large language model when a location of a position indicator within a media editing application used to create the media project is changed.
16. The method of claim 1, further comprising displaying within a user interface of a media editing application used to create the media project one or more items of the plurality of items of the source material output by the search engine.
17. The method of claim 1, further comprising displaying within a user interface of a media editing application used to create the media project the output of the LLM.
18. The method of claim 1, further directing a second prompt to the LLM, the second prompt including the LLM output and a request to specify actual instances of potential source material items of the types listed in the LLM output.
19. A method of locating material in support of a media project, the method comprising: directing a prompt to a small language model (SLM), wherein: the SLM has been trained using a data set comprising a plurality of news stories and, for each news story in the plurality of news stories, a corresponding large language model (LLM) output of a list of types of material that would support a media project pertaining to the news story; and the prompt includes: one or more items of content pertaining to the media project; and a request to provide a list of types of material that support the media project; receiving output from the SLM, the SLM output comprising a list of types of material that support the media project; inputting the SLM output to a first machine-learning-based encoder to generate a first set of embeddings that include, for each type of supporting material specified in the SLM output, a corresponding vector in an embedding space; providing a body of source material to a second machine-learning-based encoder to generate a second set of embeddings comprising, for each item of source material in the body of source material, a corresponding vector in the embedding space; using a search engine to identify a plurality of vectors of the second set of embeddings having a greatest degree of similarity in the embedding space with at least one vector of the first set of embeddings; and outputting from the search engine a plurality of items of the source material, each item of the plurality of items of the source material corresponding to the one of the identified plurality of vectors of the second set of embeddings.
20. The method of claim 19, wherein the SLM and a media editing application used by an editor to edit the media project are hosted on a system local to the editor.
21. A computer program product comprising: a non-transitory computer-readable medium with computer-readable instructions encoded thereon, wherein the computer-readable instructions, when processed by a processing device, instruct the processing device to perform a method of locating material in support of a media project, the method comprising: directing a prompt to a large language model (LLM), wherein the prompt includes: one or more items of content pertaining to the media project; and a request to provide a list of types of material that support the media project; receiving output from the LLM, the LLM output comprising a list of types of material that support the media project; inputting the LLM output to a first machine-learning-based encoder to generate a first set of embeddings that include, for each type of supporting material specified in the LLM output, a corresponding vector in an embedding space; providing a body of source material to a second machine-learning-based encoder to generate a second set of embeddings comprising, for each item of s
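Claim 14 names two measures of similarity: a cosine similarity metric and a "Pythagorean distance metric". The sketch below assumes the latter means ordinary Euclidean (straight-line) distance; that reading, the function names, and the toy 2-dimensional vectors are assumptions made for illustration only. It shows that the two measures can pick different nearest items when vectors are not normalized to unit length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical query vector and two candidate source-item vectors.
query = [1.0, 0.0]
candidates = {
    "same_direction_far": [10.0, 0.0],  # identical direction, much larger magnitude
    "nearby_off_axis": [1.0, 0.5],      # close in space, slightly different direction
}

# Cosine similarity: higher is more similar, so take the maximum.
best_by_cosine = max(
    candidates, key=lambda name: cosine_similarity(query, candidates[name])
)

# Euclidean distance: lower is more similar, so take the minimum.
best_by_distance = min(
    candidates, key=lambda name: euclidean_distance(query, candidates[name])
)

print(best_by_cosine, best_by_distance)
```

Cosine similarity ignores magnitude and favors the candidate pointing in the same direction, while Euclidean distance favors the candidate that is physically closer. When every vector is scaled to unit length, the two measures produce the same ranking, which is one reason some encoders normalize their output.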
Semantic analysis · CPC title
Editing, e.g. inserting or deleting · CPC title
Querying, e.g. by the use of web search engines · CPC title
Natural language query formulation · CPC title