What technology area does this patent fall under?

Primary CPC classification G06F40/284. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 7, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Preserving and processing ambiguity in natural language

US10528664B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10528664-B2
Application number	US-201816183305-A
Country	US
Kind code	B2
Filing date	Nov 7, 2018
Priority date	Nov 13, 2017
Publication date	Jan 7, 2020
Grant date	Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples for efficiently representing, processing and deciding amongst multiple ambiguous interpretations of human natural language text are described. Processing includes creating and augmenting an “interpretation graph” which represents all known ambiguous interpretations of some natural language text. The interpretation graph is made of vertices (junction points which lead to alternative interpretations) and ‘lexical items’ (natural language objects representing data blocks, tokens, word parts, phrases, clauses, parts of speech, entities, or semantic interpretations) that represent alternative ambiguous interpretations of portions of the text. The examples show a set of simple operations for augmenting the interpretation graph to create alternative interpretations. Finally, the method includes a notion of “confidence”, which is computed as the graph is being constructed and can be used by a selector once the graph is complete to choose the most likely interpretation followed by any number of increasingly less likely interpretations. By saving all known ambiguous or alternative interpretations in an interpretation graph, the example system can provide better accuracy, reliability and coverage since possible alternatives are not pruned until the final end-to-end interpretation is selected.
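The structure described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The class names, the example labels, the confidence values, and the choice to multiply per-item confidences into a path score are all assumptions made for illustration; the abstract does not say how confidences are combined. Vertices are numbered in text order, each lexical item spans from one vertex to a later one, and every path from the first vertex to the last is one complete interpretation.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class LexicalItem:
    """One interpretation of the text between two vertices (hypothetical type)."""
    label: str
    start: int         # vertex where the item begins
    end: int           # vertex where the item ends
    confidence: float  # assumed here to be a number in (0, 1]


class InterpretationGraph:
    def __init__(self, num_vertices: int) -> None:
        self.num_vertices = num_vertices
        self.outgoing: dict[int, list[LexicalItem]] = {
            v: [] for v in range(num_vertices)
        }

    def add_item(self, item: LexicalItem) -> None:
        # Items always point forward in the text, so the graph stays acyclic.
        if not 0 <= item.start < item.end < self.num_vertices:
            raise ValueError("item must span forward between existing vertices")
        self.outgoing[item.start].append(item)

    def paths(self, vertex: int = 0):
        """Yield every sequence of items leading from `vertex` to the last vertex."""
        if vertex == self.num_vertices - 1:
            yield []
            return
        for item in self.outgoing[vertex]:
            for rest in self.paths(item.end):
                yield [item] + rest

    def ranked_interpretations(self):
        """Return (score, path) pairs, most confident first.

        Assumption: a path's score is the product of its item confidences.
        """
        scored = [(prod(i.confidence for i in p), p) for p in self.paths()]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Two competing readings of "New York": two plain tokens, or one entity.
graph = InterpretationGraph(num_vertices=3)
graph.add_item(LexicalItem("TOKEN:new", 0, 1, 0.9))
graph.add_item(LexicalItem("TOKEN:york", 1, 2, 0.9))
graph.add_item(LexicalItem("ENTITY:New York", 0, 2, 0.95))

ranked = graph.ranked_interpretations()
for score, path in ranked:
    print(round(score, 2), [item.label for item in path])
```

Both readings stay in the graph until selection, which is the point the abstract makes about not pruning early. Enumerating every path is exponential in the worst case, so a larger graph would need a dynamic-programming or best-first search instead.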

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising:
a processor;
a data reader coupled to the processor to, receive an input, wherein the input is indicative of a data stream; read data from the data stream and convert the data stream into one or more data blocks, wherein each data block represents a logical division of content from the data; and create an initial interpretation graph comprising of the one or more data blocks joined by vertices, wherein the vertices represent junction points in the input comprising of a start point of a data block and an end point of the data block;
a tokenizer coupled to the processor to split the one or more data blocks into tokens to derive a plurality of tokens from the one or more data blocks using tokenization techniques;
an interpretation graph creator coupled to the processor to expand the interpretation graph providing alternative interpretations of each token of the one or more data blocks, an alternative interpretation represented as a path through the interpretation graph, wherein the interpretation graph includes, additional vertices and lexical items, wherein a lexical item includes at least one of a data block, a syntactic interpretation, a semantic interpretation, and a token; the additional vertices representing junction points between the lexical points, each of the vertices including zero or more arcs directed to zero or more lexical items, an arc being a connection between two vertices passing through the lexical items; and the lexical items representing the alternative interpretations of the input covering a range of the input spanned by two vertices; wherein to expand the interpretation graph the interpretation graph creator is to augment and refine the interpretation graph by performing at least one of, creating a new lexical item and adding the new lexical item to the interpretation graph from one existing vertex from the vertices to another vertex in the additional vertices; creating a sequence of new lexical items, each lexical item in the sequence being joined in order by a vertex, wherein the sequence is added to the interpretation graph from one existing vertex from the vertices to another vertex in the vertices; determining a confidence score for each of the lexical items based on at least one of tags associated with a lexical item, external data, or predetermined rules; and modifying the confidence score associated with each of the lexical items; and
a selector coupled to the processor to select the alternative interpretation from the interpretation graph, wherein the selector is to, compute an overall confidence score for the path in the interpretation graph from one vertex to another; and search through the interpretation graph to identify the path from a first vertex to a last vertex with a highest overall confidence score.

2. The system of claim 1, wherein the input is at least one of human generated natural language input, a real-time input from a user, a user input from a voice recognition software, and an input previously authored input from an author stored into an electronic file.

3. The system of claim 1, wherein the lexical items further include at least one of tokens derived from one or more other tokens, syntactic elements derived from other lexical items, and semantic elements derived from the other lexical items.

4. The system of claim 3, wherein the tokens, the syntactic elements, and the semantic elements are derived using methods, which use an external resource.

5. The system of claim 4, wherein the external resource is a trained machine learning model.

6. The system of claim 4, wherein the external resource is a database of patterns which indicate sequences of lexical items combined together to derive other lexical items.

7. The system of claim 1, wherein the confidence score is one of a floating point number, a multi-dimensional vector, and a complex data structure.

8. The system of claim 1, wherein the computation of the overall confidence score is based on at least one of: a predefined rule, the predefined rule being based on the confidence score of other lexical items; an external database including semantic information pertaining to a corresponding lexical item, wherein the confidence factor is determined based on how well the external semantic information matches internal contextual information of an interpretation and other alternative interpretations; an optimization formula computed using a quantum computer; and an output of a predictive algorithm trained using machine learning techniques.

9. The system of claim 1, wherein the interpretation graph creator to augment the interpretation graph is to execute one of a pipeline and a sequence of processing functions executed in a sequential order.

10. The system of claim 1, wherein the interpretation graph creator to augment the interpretation graph is to execute one of a same function and a set of functions multiple times until no further modifications to the interpretation graph are performed by the functions.

11. A method comprising:
receiving an input, wherein the input is indicative of a data stream;
reading data from the data stream and convert the data stream into one or more data blocks, wherein each data block represents a logical division of content from the data;
creating an initial interpretation graph comprising of the one or more data blocks joined by vertices, wherein the vertices represent junction points in the input comprising of a start point of a data block and an end point of the data block;
splitting the one or more data blocks into tokens derive a plurality of tokens from the one or more data blocks using tokenization techniques;
expanding the interpretation graph providing alternative interpretations of each token of the one or more data blocks, an alternative interpretation represented as a path through the interpretation graph, wherein the interpretation graph includes, additional vertices and lexical items, wherein a lexical item includes at least one of a data block, a syntactic interpretation, a semantic interpretation, and a token; the additional vertices representing junction points between the lexical points, each of the vertices including zero or more arcs directed to zero or more lexical items, an arc being a connection between two vertices passing through the lexical items; and the lexical items representing the alternative interpretations of the input covering a range of the input spanned by two vertices; wherein expanding further comprises augmenting and refining the interpretation graph by performing at least one of, creating a new lexical item and adding the new lexical item to the interpretation graph from one existing vertex from the vertices to another vertex in the additional vertices; creating a sequence of new lexical items, each lexical item in the sequence being joined in an order by a vertex, wherein the sequence is added to the interpretation graph from one existing vertex from the vertices to another vertex in the vertices; determining a confidence score for each of the lexical items based on at lease one of tags associated with a lexical item, external data, or predetermined rules; and modifying the confidence score associated with each of the lexical items; and
selecting the alternative interpretation from the interpretation graph, wherein selecting comprises, computing an overall confidence score for the path in the interpretation graph from one vertex to another; and searching through the interpretation graph to identify the path from a first vertex to a last vertex with a highest overall confidence score.

12. The method of claim 11,
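Claims 6 and 10 together describe pattern-based derivation of new lexical items, applied repeatedly until the graph stops changing. The sketch below shows one way that loop could look. It is not the claimed implementation. The tuple representation of a lexical item, the helper names `make_pattern_rule` and `augment_until_stable`, and the part-of-speech labels are hypothetical, and confidence scoring is left out to keep the example short.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical representation: (start vertex, end vertex, label).
Item = Tuple[int, int, str]
Rule = Callable[[Set[Item]], Set[Item]]


def make_pattern_rule(first: str, second: str, derived: str) -> Rule:
    """Build a rule: an adjacent `first` + `second` pair yields a `derived` item."""
    def rule(items: Set[Item]) -> Set[Item]:
        return {
            (a_start, b_end, derived)
            for (a_start, a_end, a_label) in items
            if a_label == first
            for (b_start, b_end, b_label) in items
            if b_label == second and b_start == a_end
        }
    return rule


def augment_until_stable(items: Iterable[Item], rules: Iterable[Rule]):
    """Apply every rule repeatedly until a full pass adds nothing new."""
    graph = set(items)
    rules = list(rules)
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for rule in rules:
            new_items = rule(graph) - graph
            if new_items:
                graph |= new_items  # existing items are kept, never replaced
                changed = True
    return graph, passes


# Three adjacent items over vertices 0..3, tagged with illustrative labels.
initial = {(0, 1, "DET"), (1, 2, "NOUN"), (2, 3, "VERB")}
rules = [
    make_pattern_rule("NP", "VERB", "CLAUSE"),  # listed first, fires only once NP exists
    make_pattern_rule("DET", "NOUN", "NP"),
]

graph, passes = augment_until_stable(initial, rules)
print(sorted(graph), passes)
```

The loop terminates here because the rules can only produce items drawn from a finite set of vertex pairs and labels. The clause rule is listed before the noun-phrase rule on purpose: it finds nothing on the first pass and succeeds on the second, which is why a single sequential pass would not be enough for this rule order.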

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

G06F40/30
Semantic analysis · CPC title
G06N20/00
Machine learning · CPC title
G06F40/284 · Primary
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F40/211
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
G06F17/277 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 64308600

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10528664B2 cover?: Examples for efficiently representing, processing and deciding amongst multiple ambiguous interpretations of human natural language text are described. Processing includes creating and augmenting an “interpretation graph” which represents all known ambiguous interpretations of some natural language text. The interpretation graph is made of vertices (junction points which lead to alternative int…
Who is the assignee on this patent?: Accenture Global Solutions Ltd