System and method for efficiently managing large datasets for training an AI model
US-2021224683-A1 · Jul 22, 2021 · US
US12293156B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12293156-B2 |
| Application number | US-202217885423-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2022 |
| Priority date | Aug 10, 2022 |
| Publication date | May 6, 2025 |
| Grant date | May 6, 2025 |
Official abstract text for this publication.
Systems and methods for deep technology innovation management by cross-pollinating innovations dataset are disclosed. A system extracts a context-based keyword from an innovation dataset by transforming the innovation dataset to a vector. Further, the system searches semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. Furthermore, the system clusters the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. Thereafter, the system determines weighted keywords in each cluster using the obtained cluster centroids, and classifies the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset. The system forms cohorts of innovators to explore the reuse of innovations, assets, and code, and to build a focused monetization model.
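The clustering and weighting steps in the abstract can be illustrated with a small sketch. This is not the patented implementation: the two-dimensional keyword vectors, the names `kmeans`, `weighted_keywords`, and `KEYWORD_VECTORS`, the first-k seeding, and the weight formula (inverse of one plus the distance to the cluster centroid) are all assumptions chosen only to make the idea concrete.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means. Seeding with the first k points is an assumption
    made here so the toy example is deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


def weighted_keywords(vectors, k):
    """Group keywords by cluster and weight each one by how close it
    lies to its cluster centroid (hypothetical weighting scheme)."""
    names = list(vectors)
    points = [vectors[name] for name in names]
    centroids, assignment = kmeans(points, k)
    clusters = {c: [] for c in range(k)}
    for name, point, c in zip(names, points, assignment):
        weight = 1.0 / (1.0 + math.dist(point, centroids[c]))
        clusters[c].append((name, round(weight, 3)))
    for members in clusters.values():
        members.sort(key=lambda item: item[1], reverse=True)
    return clusters


# Hypothetical keyword embeddings; a real system would obtain these
# from a trained vectorization model.
KEYWORD_VECTORS = {
    "neural network": (0.9, 0.1),
    "solar panel": (0.1, 0.9),
    "deep learning": (1.0, 0.2),
    "transformer model": (0.8, 0.0),
    "wind turbine": (0.0, 1.0),
    "battery storage": (0.2, 0.8),
}

if __name__ == "__main__":
    for cluster_id, members in weighted_keywords(KEYWORD_VECTORS, k=2).items():
        print(cluster_id, members)
```

In this sketch the highest-weighted keyword in each cluster is the one nearest the centroid, which is one plausible reading of "weighted keywords ... using the obtained cluster centroids"; the abstract itself does not specify the formula.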
Opening claim text (preview).
We claim:
1. A system comprising: a processor coupled to a memory, the memory storing instructions executable by the processor to: extract a context-based keyword from an innovation dataset by transforming the innovation dataset to a vector, wherein the innovation dataset comprises data corresponding to an innovation; search semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted a context based keyword, wherein the entities correspond to named entity recognition in the innovation dataset; cluster the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords; determine weighted keywords in each cluster using the obtained cluster centroids, and classify the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset; receive a two-layer user feedback from a user for the searched semantically relevant keywords, wherein the two-layer user feedback comprises a first layer of feedback corresponding to a relevancy of the searched semantically relevant keywords, and a second layer of feedback comprising an additional relevant keyword for each semantically relevant keyword; and map the additional relevant keyword to the innovation dataset that comprises data corresponding to the innovation.
2. The system as claimed in claim 1, wherein the processor is further configured to: recommend at least one of content, a team, cohorts, and experts relevant to the emerging innovation trends relevant to the innovation in the innovation dataset; and create a cohort or a private channel comprising team members relevant to the recommendation for reusing the innovation in the innovation dataset.
3. The system as claimed in claim 1, wherein the processor is further configured to: provide innovation insights, and relationships to create a semantic knowledge network for a thought seeding, wherein the semantic knowledge network comprises at least one of emerging innovation trends, plurality of innovations, innovators, experts, and a demography of the innovators associated with the emerging innovation trends.
4. The system as claimed in claim 1, wherein the processor is further configured to: retrain a Document to Vector (Doc2Vec) model for semantic search based on the additional relevant keyword corresponding to the semantically relevant keywords.
5. The system as claimed in claim 1, wherein, for extracting the context-based keywords from the innovation dataset, the processor is further configured to: extract n-grams from the innovation dataset, wherein the n-grams corresponds to a sequence of n-consecutive tokens in a string of the innovation dataset; rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset; determine a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extract context-based keywords for the similar n-grams; convert the extracted context-based keywords to high-dimensional vectors; calculate the semantic distance between high-dimensional vectors and the innovation dataset; and validate the context-based keywords with a historical keyword dataset.
6. The system as claimed in claim 1, wherein, for searching semantically relevant keywords, the processor is further configured to: pre-process the extracted context-based keywords, wherein the pre processing comprises at least one of a noise removal process, a tokenization process, a stemming process, a lemmatization process, and a normalization process; extract the entity and the key phrase from the pre-processed context-based keywords; and vectorize the extracted entity and key phrase for searching semantically relevant keywords.
7. The system as claimed in claim 1, wherein the clustering is performed using at least one of an agglomerative hierarchical clustering technique and a K-means clustering technique.
8. The system as claimed in claim 2, wherein the cohort or the private channel corresponds to innovation-centric thematic cohorts to determine reuse and monetization strategy of the recommended content relevant to the emerging innovation trends, to interact with other innovators, to drive collaborations between innovators, and inspire other innovators.
9. The system as claimed in claim 8, wherein the innovation-centric thematic cohorts provide an assessment of an impact of the emerging innovation trends relevant to the innovation in the innovation dataset.
10. A method comprising: extracting, by a processor, a context-based keyword from an innovation dataset by transforming the innovation dataset to a vector, wherein the innovation dataset comprises data corresponding to an innovation; searching semantically, by the processor, relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword, wherein the entities correspond to named entity recognition in the innovation dataset; clustering, by the processor, the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords; determining, by the processor, weighted keywords in each cluster using the obtained cluster centroids, and classifying the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset; receiving, by the processor, a two-layer user feedback from a user for the searched semantically relevant keywords, wherein the two-layer user feedback comprises a first layer of feedback corresponding to a relevancy of the searched semantically relevant keywords, and a second layer of feedback comprising an additional relevant keyword for each semantically relevant keyword; and mapping, by the processor, the additional relevant keyword to the innovation dataset comprises data that corresponding to the innovation.
11. The method as claimed in claim 10, further comprises: recommending, by the processor, at least one of content, a team, cohorts, and experts relevant to the emerging innovation trends relevant to the innovation in the innovation dataset; and creating, by the processor, a cohort or a private channel comprising team members relevant to the recommendation for reusing the innovation in the innovation dataset.
12. The method as claimed in claim 10, further comprises: providing, by the processor, innovation insights, and relationships to create a semantic knowledge network for a thought seeding, wherein the semantic knowledge network comprises at least one of emerging innovation trends, plurality of innovations, innovators, experts, and a demography of the innovators associated with the emerging innovation trends.
13. The method as claimed in claim 10, further comprises: retraining a Document to Vector (Doc2Vec) model for semantic search based on the additional relevant keyword corresponding to the semantically relevant keywords.
14. The method as claimed in claim 10, wherein extracting the context-based keyword from the innovation dataset further comprises: extracting, by the processor, n-grams from the innovation dataset, wherein the n-grams corresponds to a sequence of n-consecutive tokens in a string of the innovation dataset; ranking, by the processor, the n-grams based on a frequency of the extracted n-grams in the innovation dataset; determining, by the processor, a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extracting context based keywords for the simil
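The first three steps recited for keyword extraction in the claims (extracting n-grams, ranking them by frequency, and comparing each to the dataset with cosine similarity) can be sketched as follows. This is only a minimal illustration under stated assumptions: the bag-of-words count vectors, the regular-expression tokenizer, and the names `tokenize`, `extract_ngrams`, `cosine_similarity`, and `candidate_keywords` are hypothetical and are not taken from the patent, which elsewhere refers to a Doc2Vec model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_keywords(text, n=2, top_k=3):
    """Rank n-grams by frequency, then by similarity to the whole text."""
    tokens = tokenize(text)
    doc_vector = Counter(tokens)  # assumption: bag-of-words document vector
    ranked = Counter(extract_ngrams(tokens, n)).most_common()
    scored = [
        (" ".join(gram), freq, cosine_similarity(Counter(gram), doc_vector))
        for gram, freq in ranked
    ]
    scored.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    sample = "solar panel cost and solar panel output"
    for phrase, freq, similarity in candidate_keywords(sample):
        print(phrase, freq, round(similarity, 3))
```

The later recited steps (conversion to high-dimensional vectors, semantic distance, and validation against a historical keyword dataset) are left out of the sketch because they depend on a trained model and data that this page does not describe.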
with fixed number of clusters, e.g. K-means clustering · CPC title
Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram · CPC title
Matching criteria, e.g. proximity measures · CPC title
Clustering; Classification · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title