Generating data associated with underrepresented data based on a received data input

US10915820B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10915820-B2
Application number: US-201816059399-A
Country: US
Kind code: B2
Filing date: Aug 9, 2018
Priority date: Aug 9, 2018
Publication date: Feb 9, 2021
Grant date: Feb 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example method described herein involves receiving a data input; identifying a plurality of topics in the data input; determining an underrepresented set of data for a first set of topics of the plurality of topics based on a plurality of knowledge graphs associated with the first set of topics; calculating a score for each topic of the first set of topics based on a representative learning technique; determining that the score for a first topic of the first set of topics satisfies a threshold score; selecting a topic specific knowledge graph based on the first topic; identifying representative objects that are similar to objects of the data input based on the topic specific knowledge graph; generating representation data that is similar to the data input based on the representative objects to balance the underrepresented set of data with a set of data associated with a second set of topics of the plurality of topics; and performing an action associated with the representation data.
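The balancing goal in the abstract can be illustrated with a minimal sketch. It assumes each sample of the data input has already been assigned a topic label, and it swaps in a plain count comparison for the knowledge-graph and representative-learning scoring that the patent describes. The function name `find_underrepresented`, the `ratio` parameter, and the sample labels are hypothetical and do not come from the patent.

```python
from collections import Counter


def find_underrepresented(topic_labels, ratio=0.5):
    """Return {topic: deficit} for topics with too few samples.

    Assumption (not from the patent): a topic counts as underrepresented
    when it has fewer than `ratio` times the samples of the largest topic.
    The deficit is how many generated samples would bring it level with
    the largest topic.
    """
    counts = Counter(topic_labels)
    largest = max(counts.values())
    return {
        topic: largest - n
        for topic, n in counts.items()
        if n < ratio * largest
    }


# Hypothetical topic labels for 16 input samples.
labels = ["billing"] * 8 + ["shipping"] * 6 + ["returns"] * 2
print(find_underrepresented(labels))  # {'returns': 6}
```

Here the deficit of 6 is the amount of representation data that would need to be generated for the "returns" topic to match the best-represented topic.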

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, by a device, a data input; receiving, by the device, a domain knowledge graph associated with objects of the data input; identifying, by the device, a plurality of topics in the data input based on the domain knowledge graph; determining, by the device, a represented set of data for a first set of topics of the plurality of topics; determining, by the device, an underrepresented set of data for a second set of topics of the plurality of topics, the underrepresented set of data being determined using a representative learning technique, and the representative learning technique being machine learning; determining, by the device, a distance between an identified topic of the data input and a first topic in the domain knowledge graph; calculating, by the device, a score for the first topic based on the distance; determining, by the device, that the score for the first topic satisfies a threshold score; determining, by the device and based on determining that the score for the first topic satisfies the threshold score, that the first topic of the plurality of topics is one topic of the second set of topics; selecting, by the device, a topic specific knowledge graph based on the first topic; identifying, by the device, objects of the data input based on a sentence structure of the data input and based on the topic specific knowledge graph; identifying, by the device, representative objects that have a threshold level of similarity with the objects of the data input based on the topic specific knowledge graph, wherein identifying the representative objects that are similar to the objects of the data input comprises: mapping the representative objects to the objects of the data input based on characteristics of the representative objects, characteristics of the objects, and positions of the representative objects within the topic specific knowledge graph; generating, by the device and based on the representative objects, representation data that is of a similar part of speech as the objects of the data input and increases an amount of data associated with the underrepresented set of data, the part of speech being a noun, verb, adjective, adverb, or preposition, and generating the representation data comprising: identifying an organizational structure of the objects of the data input, identifying a characteristic of each of the objects of the data input, mapping the representative objects to the objects of the data input based on the organizational structure and the characteristic of each of the objects according to the topic specific knowledge graph, and substituting a representative object, of the representative objects, for an object, of the objects of the data input, based on an edge distance of the representative object from the object of the data input in the topic specific knowledge graph; and performing, by the device, an action associated with the representation data.

2. The method of claim 1, further comprising: identifying the objects of the data input by comparing the objects of the data input to objects of a knowledge graph data structure, wherein the knowledge graph data structure includes the domain knowledge graph.

3. The method of claim 1, wherein determining the underrepresented set of data for the second set of topics comprises: determining that the underrepresented set of data for the second set of topics is underrepresented relative to the represented set of data for the first set of topics.

4. The method of claim 1, wherein determining the underrepresented set of data for the second set of topics comprises: determining that the underrepresented set of data for the second set of topics is underrepresented relative to the plurality of topics.

5. The method of claim 1, wherein performing the action comprises: generating a representation knowledge graph based on the representation data, wherein the representation knowledge graph includes a new topic that is associated with the underrepresented set of data.

6. The method of claim 1, further comprising: converting the topic specific knowledge graph into an embedding space, wherein the objects of the data input are identified in the embedding space and the representative objects are identified in the embedding space.

7. The method of claim 1, wherein the represented set of data is determined using the representative learning technique.

8. The method of claim 1, wherein the score is calculated using the representative learning technique.

9. The method of claim 1, further comprising: encoding the data input based on the domain knowledge graph to identify the objects of the data input.

10. The method of claim 1, wherein receiving the data input comprises: subscribing to one or more sources to receive the data input from the one or more sources.

11. The method of claim 1, wherein receiving the domain knowledge graph associated with objects of the data input comprises: obtaining the domain knowledge graph to identify topics associated with the data input, the domain knowledge graph including a knowledge graph of topics of a particular domain, with each topic, of the topics, being a node on the domain knowledge graph, and edges between the topics corresponding to relationships between respective topics.

12. A device, comprising: one or more memories; and one or more processors, communicatively coupled to the one or more memories, to: receive a data input; identify a plurality of topics in the data input; determine an underrepresented set of data for a first set of topics of the plurality of topics based on a plurality of knowledge graphs associated with the first set of topics, the underrepresented set of data being determined using a representative learning technique, and the representative learning technique being machine learning; determine a distance between an identified topic of the data input and a first topic in a domain knowledge graph; calculate a score for the first topic based on the distance; determine that the score for the first topic satisfies a threshold score; select a topic specific knowledge graph based on the first topic; identify objects of the data input based on a sentence structure of the data input; identify representative objects that are similar to the objects of the data input based on the topic specific knowledge graph, wherein the one or more processors, when identifying the representative objects, are to: map the representative objects to the objects of the data input based on characteristics of the representative objects, characteristics of the objects, and positions of the representative objects within the topic specific knowledge graph; generate, based on the representative objects, representation data that is of a similar part of speech as the objects of to the data input to balance the underrepresented set of data with a set of data associated with a second set of topics of the plurality of topics and increase an amount of data associated with the underrepresented set of data, wherein the part of speech is a noun, verb, adjective, adverb, or preposition, and wherein the one or more processors, when generating the representation data, are to: identify an organizational structure of the objects of the data input; identify a characteristic of each of the objects of the data input; map the representative objects to the objects of the data input based on the organizational structure and the characteristic of each of the objects according to the topic specific knowledge graph; and substitute a representative object, of the representat
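The substitution step recited in the claims (replacing an object of the data input with a representative object of a similar part of speech, chosen by edge distance in a topic specific knowledge graph) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the toy graph `GRAPH`, the function names, and the `max_distance` cutoff are all hypothetical, and edge distance is taken to be the shortest-path hop count found by breadth-first search.

```python
from collections import deque

# Hypothetical topic specific knowledge graph (assumption, not from the
# patent): node -> (part of speech, list of neighboring nodes).
GRAPH = {
    "laptop": ("noun", ["computer"]),
    "computer": ("noun", ["laptop", "tablet", "repair"]),
    "tablet": ("noun", ["computer"]),
    "repair": ("verb", ["computer", "replace"]),
    "replace": ("verb", ["repair"]),
}


def edge_distance(graph, start, goal):
    """Shortest-path edge count between two nodes via BFS, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node][1]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def representatives(graph, obj, max_distance=2):
    """Nodes sharing obj's part of speech within max_distance edges,
    ordered nearest first."""
    part_of_speech = graph[obj][0]
    found = []
    for candidate, (candidate_pos, _) in graph.items():
        if candidate == obj or candidate_pos != part_of_speech:
            continue
        dist = edge_distance(graph, obj, candidate)
        if dist is not None and dist <= max_distance:
            found.append((dist, candidate))
    return [candidate for _, candidate in sorted(found)]


def generate_variants(graph, tokens, max_distance=2):
    """Substitute one known object at a time to produce new sentences."""
    variants = []
    for i, token in enumerate(tokens):
        if token in graph:
            for rep in representatives(graph, token, max_distance):
                variants.append(" ".join(tokens[:i] + [rep] + tokens[i + 1:]))
    return variants


for sentence in generate_variants(GRAPH, "please repair my laptop".split()):
    print(sentence)
```

Splitting on whitespace stands in for the sentence-structure analysis in the claims, and the part-of-speech tag stored on each node stands in for the object characteristics used in the mapping step. A real system would need a parser and a much richer graph.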

Assignees

Accenture Global Solutions Ltd

Classifications

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • G06N5/02 · Primary

    Knowledge representation; Symbolic representation · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • G06F16/367 · Primary

    Ontology · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10915820B2 cover?
An example method described herein involves receiving a data input; identifying a plurality of topics in the data input; determining an underrepresented set of data for a first set of topics of the plurality of topics based on a plurality of knowledge graphs associated with the first set of topics; calculating a score for each topic of the first set of topics based on a representative learning …
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).