Automated term extraction

US2019108218A1 · US · A1

Patent metadata
Publication number: US-2019108218-A1
Application number: US-201816213253-A
Country: US
Kind code: A1
Filing date: Dec 7, 2018
Priority date: Aug 28, 2015
Publication date: Apr 11, 2019
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may obtain a document. The device may identify a skip value for the document. The skip value may relate to a quantity of words or a quantity of characters that are to be skipped in an n-gram. The device may determine one or more skip n-grams using the skip value for the document. A skip n-gram, of the one or more skip n-grams, may include a sequence of one or more words or one or more characters with a set of occurrences in the document. The sequence of one or more words or one or more characters may include a skip value quantity of words or characters within the sequence. The device may extract one or more terms from the document based on the one or more skip n-grams. The device may provide information identifying the one or more terms.
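The skip n-gram step in the abstract can be illustrated with a short Python sketch. It assumes word-level tokens (the abstract also allows characters), the common "k-skip-n-gram" reading in which up to k words (the skip value) may be skipped inside an n-word sequence, and a simple occurrence threshold for deciding which skip n-grams become terms. `skip_ngrams`, `extract_terms` and `min_occurrences` are hypothetical names; the abstract does not say how the skip value is chosen or how terms are selected from the n-grams.

```python
import re
from collections import Counter
from itertools import combinations


def skip_ngrams(tokens, n, k):
    """All k-skip-n-grams (n >= 1): n tokens kept in order, with at most
    k tokens skipped in total between the first and last kept token."""
    grams = []
    for start in range(len(tokens)):
        # The gram starts at `start` and must end within n + k positions.
        window = tokens[start:start + n + k]
        if len(window) < n:
            break  # later windows are even shorter
        # Fixing the first token at `start` means each gram is produced once.
        for rest in combinations(range(1, len(window)), n - 1):
            grams.append((window[0],) + tuple(window[i] for i in rest))
    return grams


def extract_terms(text, n=2, k=1, min_occurrences=2):
    """Hypothetical selection rule: keep skip n-grams that recur
    at least `min_occurrences` times in the document."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(skip_ngrams(tokens, n, k))
    return {" ".join(g): c for g, c in counts.items() if c >= min_occurrences}


print(extract_terms("Click the login button. Then click the logout button."))
```

On this sample the result is `{'click the': 2, 'the button': 2}`. "the button" is found twice only because a skip value of 1 lets it bridge "login" and "logout"; plain bigrams would miss it.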

First claim

Opening claim text (preview).

1-20. (canceled)

21. A method, comprising: performing, by a device, term extraction for a test script document using one or more term extraction techniques, the term extraction for extracting one or more terms; performing, by the device, hierarchy formation for the test script document based on results of performing the term extraction, generating, by the device, a functional diagram of the test script document based on results of performing the term extraction and performing the hierarchy formation; and providing, by the device and to via a user interface, information identifying the functional diagram of the test script document.

22. The method of claim 21, wherein the one or more term extraction techniques include at least one of: a skip n-gram based term extraction technique, a regular expression pattern based term extraction technique, a technical terminology identification based term extraction technique, a glossary based term extraction technique, a collocation of words based term extraction technique, a multi-word expressions based term extraction technique, a keywords based term extraction technique, a key-phrases based term extraction technique, or a topics based term extraction technique.

23. The method of claim 21, wherein performing hierarchy formation for the test script document comprises: performing hierarchy formation for the test script document to identify one or more relationships for the one or more terms; and wherein generating the functional diagram of the test script document comprises: generating the functional diagram of the test script document based on the one or more relationships for the one or more terms.

24. The method of claim 21, further comprising: performing relationship extraction to identify one or more relationships between the one or more terms.

25. The method of claim 24, where performing relationship extraction to identify the one or more relationships between the one or more terms comprises: performing relationship extraction to identify the one or more relationships between the one or more terms by processing the test script document to identify an identifier.

26. The method of claim 24, where performing relationship extraction to identify the one or more relationships between the one or more terms comprises: performing relationship extraction to identify the one or more relationships between the one or more terms by using an order of the one or more terms in the test script document.

27. The method of claim 21, further comprising: performing one or more other term extraction techniques to identify one or more other terms in the test script document; and providing information identifying the one or more other terms.

28. A device, comprising: one or more memories; and one or more processors, communicatively coupled to the one or more memories, to: receive a test script document; perform term extraction for the test script document using one or more term extraction techniques to extract one or more terms from the test script document; perform hierarchy formation for the test script document to identify one or more relationships for the one or more terms; generate a functional diagram of the test script document based on results of performing the term extraction and performing the hierarchy formation; and provide, via a user interface, information identifying the functional diagram of the test script document.

29. The device of claim 28, wherein the one or more processors are further to: determine that a plurality of terms of the one or more terms are duplicate terms; and merge the plurality of terms into a single term.

30. The device of claim 28, wherein the functional diagram of the test script document includes at least one of: an indication of a quantity of duplicates of a term of the one or more terms, or an indication of a strength of a relationship between the one or more terms.

31. The device of claim 28, wherein the one or more processors are further to: generate a set of blocks to represent the one or more terms and a set of connectors to represent the one or more relationships.

32. The device of claim 28, wherein the one or more processors are further to: generate multiple functional diagrams, a first functional diagram of the multiple function diagrams being generated without merged terms; and a second functional diagram of the multiple functional diagrams being generated with merged terms.

33. The device of claim 28, wherein the one or more processors are to: identify the one or more terms by using a regular expression pattern based term extraction technique and a skip n-gram term extraction technique.

34. The device of claim 33, wherein the one or more processors are further to: process the test script document to determine a skip value for the test script document, the skip value being used to perform the skip n-gram term extraction technique on the test script document.

35. A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors, cause the one or more processors to: perform term extraction for a test script document for extracting one or more terms from the test script document; perform hierarchy formation for the test script document to identify one or more relationships for the one or more terms; generate a functional diagram of the test script document based on results of performing the term extraction and performing the hierarchy formation, the functional diagram including a graphical or textual representation between the one or more terms of the test script document; and provide, via a user interface, information identifying the functional diagram of the test script document.

36. The non-transitory computer-readable medium of claim 35, wherein functional diagram is an application flow diagram.

37. The non-transitory computer-readable medium of claim 35, wherein the one or more instructions, that cause the one or more processors to generate the functional diagram of the test script document, cause the one or more processors to: generate the functional diagram of the test script document by including contextual information in the functional diagram.

38. The non-transitory computer-readable medium of claim 37, wherein the contextual information includes at least one of: an indication of a quantity of duplicates of a term of the one or more terms, or an indication of a strength of a relationship between the one or more terms.

39. The non-transitory computer-readable medium of claim 35, wherein the one or more instructions, that cause the one or more processors to perform term extraction for the test script document, cause the one or more processors to: perform term extraction for the test script document using one or more term extraction techniques.

40. The non-transitory computer-readable medium of claim 39, wherein the one or more term extraction techniques include at least one of: a skip n-gram based term extraction technique, a regular expression pattern based term extraction technique, a technical terminology identification based term extraction technique, a glossary based term extraction technique, a collocation of words based term extraction technique, a multi-word expressions based term extraction technique, a keywords based term extraction technique, a …
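The claimed pipeline (term extraction, order-based relationships as in claim 26, duplicate merging as in claim 29, and blocks with connectors carrying duplicate counts and relationship strength as in claims 30 and 31) can be illustrated with a minimal Python sketch. It assumes a hypothetical test-script format in which UI element names appear in double quotes, uses a single regular expression as the term extraction technique, treats case and whitespace variants as duplicates, and measures relationship strength as the number of times one term directly follows another. `TERM_PATTERN`, `build_diagram` and `render` are illustrative names; the claims do not specify these rules.

```python
import re
from collections import Counter

# Hypothetical term pattern: UI element names written in double quotes,
# e.g. Click "Submit". A real system would combine several techniques.
TERM_PATTERN = re.compile(r'"([^"]+)"')


def extract_terms_by_pattern(script):
    """Regex-pattern term extraction, returning terms in script order."""
    return TERM_PATTERN.findall(script)


def normalize(term):
    """Duplicate merging: case and whitespace variants map to one term."""
    return " ".join(term.lower().split())


def build_diagram(script):
    """Return (blocks, connectors) for a textual functional diagram.

    blocks: merged term -> quantity of duplicates (occurrences)
    connectors: (term_a, term_b) -> strength, i.e. how often term_b
    directly follows term_a in the script (order-based relationships).
    """
    blocks = Counter()
    connectors = Counter()
    previous = None
    for raw in extract_terms_by_pattern(script):
        term = normalize(raw)
        blocks[term] += 1
        # Skip self-loops so a repeated term does not link to itself.
        if previous is not None and previous != term:
            connectors[(previous, term)] += 1
        previous = term
    return dict(blocks), dict(connectors)


def render(blocks, connectors):
    """One line per connector: [term xCOUNT] --STRENGTH--> [term xCOUNT]."""
    return "\n".join(
        f"[{a} x{blocks[a]}] --{s}--> [{b} x{blocks[b]}]"
        for (a, b), s in connectors.items()
    )


if __name__ == "__main__":
    script = '''Open "Login Page"
Type into "Username"
Type into "Password"
Click "Submit"
Open "login page"
Type into "Username"'''
    print(render(*build_diagram(script)))
```

Running it prints four connectors, starting with `[login page x2] --2--> [username x2]`: "Login Page" and "login page" are merged into one block, and the login-to-username transition is counted twice. Calling `build_diagram` on the raw terms without `normalize` would give the unmerged variant that claim 32 contrasts with the merged one.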

Assignees

Accenture Global Services Ltd

Inventors

Classifications

  • Semantic analysis · CPC title

  • Morphological analysis · CPC title

  • Reverse engineering; Extracting design information from source code · CPC title

  • G06F8/425 (primary)

    Lexical analysis · CPC title

  • Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019108218A1 cover?
A device may obtain a document. The device may identify a skip value for the document. The skip value may relate to a quantity of words or a quantity of characters that are to be skipped in an n-gram. The device may determine one or more skip n-grams using the skip value for the document. A skip n-gram, of the one or more skip n-grams, may include a sequence of one or more words or one or more …
Who is the assignee on this patent?
Accenture Global Services Ltd
What technology area does this patent fall under?
The primary CPC classification is G06F8/425 (lexical analysis). Mapped technology areas include Physics.
When was this patent published?
It was published on Apr 11, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).