Method and system for extracting entity information from target data

US11270073B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11270073-B2
Application number  US-201816233736-A
Country             US
Kind code           B2
Filing date         Dec 27, 2018
Priority date       Dec 30, 2017
Publication date    Mar 8, 2022
Grant date          Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method and a system for extracting entity information from target data. The method comprises: providing the target data; refining the target data to obtain at least one base entity information having a plurality of base entity units using an algorithm, wherein the algorithm is based on a predefined syntax; generating a plurality of strings for each of the base entity information, wherein the plurality of strings comprises at least one base entity unit among the plurality of base entity units; sorting the plurality of strings in a decreasing order of length of the plurality of strings; identifying an entity type of the plurality of strings, based on an ontology, by processing the plurality of strings sequentially; assigning labels to the plurality of strings based on the entity type; and mapping the labelled plurality of strings to a predefined signature to obtain the entity information.
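
The middle steps of the abstract (generating strings from base entity units, sorting them longest first, and typing them against an ontology) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the text on this page does not say how the strings are generated, so the sketch uses contiguous runs of units; the toy `ONTOLOGY` dictionary, the function names, and the rule that a matched longer string blocks overlapping shorter ones are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: maps a known string to an entity type.
ONTOLOGY: Dict[str, str] = {
    "lung cancer": "disease",
    "cancer": "disease",
    "cisplatin": "drug",
}

def generate_strings(units: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans for every contiguous run of base entity units."""
    n = len(units)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]

def sort_by_length_desc(spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort spans so that those containing the most units come first."""
    return sorted(spans, key=lambda s: s[1] - s[0], reverse=True)

def label_strings(units: List[str]) -> List[Tuple[str, str]]:
    """Process spans longest first and label those found in the ontology."""
    used = [False] * len(units)  # units already covered by a longer match
    labelled = []
    for start, end in sort_by_length_desc(generate_strings(units)):
        text = " ".join(units[start:end])
        entity_type = ONTOLOGY.get(text)
        # Assumption: skip unknown strings and strings overlapping a longer match.
        if entity_type is None or any(used[start:end]):
            continue
        for k in range(start, end):
            used[k] = True
        labelled.append((start, text, entity_type))
    labelled.sort()  # restore reading order by start position
    return [(text, entity_type) for _, text, entity_type in labelled]

units = ["cisplatin", "treats", "lung", "cancer"]
print(label_strings(units))
# [('cisplatin', 'drug'), ('lung cancer', 'disease')]
```

Processing the longest strings first is what lets "lung cancer" be typed as a single disease before the shorter "cancer" is considered, which is one plausible reason for sorting in decreasing order of length.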

First claim

Opening claim text (preview).

What is claimed is:

1. A method of extracting entity information from target data, wherein the method comprises: providing the target data; refining the target data to obtain a plurality of base entity units, wherein the target data is refined using an algorithm; generating a plurality of strings based on the plurality of base entity units, wherein the plurality of strings comprises one or more base entity unit among the plurality of base entity units; sorting the plurality of strings in a decreasing order of length; processing the sorted plurality of strings sequentially to identify one or more entity types and establish links between the one or more base entity units of the plurality of base entity units, wherein the entity type refers to a specific field to which the base entity unit is associated with, and wherein the entity type and the established units are identified based on an ontology; assigning labels to the one or more entity types; mapping the labelled one or more entity types to a predefined signature, wherein the predefined signature relates to a predefined arrangement of the entity types; processing the plurality of strings with labelled entity type to identify a pattern similar to the predefined signature; and extracting entity information based on the operation of the predefined signature and the plurality of strings.

2. The method of claim 1, wherein the method further comprises classifying the obtained entity information based on the ontology.

3. The method of claim 1, wherein the length of a string corresponds to a number of base entity units in the string.

4. The method of claim 1, wherein the method comprises developing the ontology using at least one curated database by: applying conceptual indexing to plurality of entity units stored in the at least one curated database; identifying semantic associations, between the plurality of entity units, established in the at least one curated database; and identifying at least one class tagged with the plurality of entity units in the at least one curated database.

5. The method of claim 1, wherein the algorithm used in refining the target data comprises at least one of: natural language processing, text analytics and machine learning techniques.

6. The method of claim 1, wherein the refining of the target data comprises removing stock entity units from the at least one base entity information.

7. The method of claim 1, wherein the mapping of the labelled plurality of strings comprises removing entity units stored in a curated English corpus from the at least one base entity information.

8. A system for extracting entity information from target data, wherein the system comprises: a database arrangement operable to store the target data and an ontology; and a processing module communicably coupled to the database arrangement, the processing module operable to: receive the target data; refine the target data to obtain a plurality of base entity units, wherein the target data is refined using an algorithm; generate a plurality of strings based on the plurality of base units, wherein the plurality of strings comprises one or more base entity unit among the plurality of base entity units; sort the plurality of strings in a decreasing order of length; processing the sorted plurality of strings sequentially to identify one or more entity types and establish links between the one or more base entity units of the plurality of base entity units, wherein the entity type refers to a specific field to which the base entity unit is associated with, and wherein the entity type and the established units are identified based on the ontology; assign labels to the one or more entity types; and map the labelled one or more entity types to a predefined signature, wherein the predefined signature relates to a predefined arrangement of the entity types; process the plurality of strings with labelled entity type to identify a pattern similar to the predefined signature; and extract entity information based on the operation of the predefined signature and the plurality of strings.

9. The system of claim 8, wherein the processing module is further operable to classify the obtained entity information based on the ontology.

10. A non-transitory medium, containing program instructions for execution on a computer system, which when executed by a computer, cause the computer to perform method steps for extracting entity information from target data, the method comprising the steps of: providing the target data; refining the target data to obtain a plurality of base entity units, wherein the target data is refined using an algorithm; generating a plurality of strings based on the plurality of base entity units, wherein the plurality of strings comprises one or more base entity unit among the plurality of base entity units; sorting the plurality of strings in a decreasing order of length; processing the sorted plurality of strings sequentially to identify one or more entity types and establish links between the one or more base entity units of the plurality of base entity units, wherein the entity type refers to a specific field to which the base entity unit is associated with, and wherein the entity type and the established units are identified based on an ontology; assigning labels to the one or more entity types; mapping the labelled one or more entity types to a predefined signature, wherein the predefined signature relates to a predefined arrangement of the entity types; processing the plurality of strings with labelled entity type to identify a pattern similar to the predefined signature; and extracting entity information based on the operation of the predefined signature and the plurality of strings.
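
The signature steps of claim 1 (mapping labelled entity types to a predefined arrangement and extracting the strings that fit it) might look like the following Python sketch. The claim does not define how a pattern "similar to" the signature is detected, so the sketch assumes the simplest reading: an exact, in-order match of entity types over a sliding window. The signature `("drug", "relation", "disease")`, the labels, and the function name `extract_by_signature` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Labelled = Tuple[str, str]  # (string, entity type label)

def extract_by_signature(
    labelled: Sequence[Labelled], signature: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return every run of strings whose labels equal the signature, in order."""
    width = len(signature)
    if width == 0:
        return []
    matches = []
    # Slide a window of signature length across the labelled strings.
    for i in range(len(labelled) - width + 1):
        window = labelled[i:i + width]
        if all(label == want for (_, label), want in zip(window, signature)):
            matches.append(tuple(text for text, _ in window))
    return matches

# Hypothetical labelled strings, as an earlier typing step might produce them.
labelled = [
    ("cisplatin", "drug"),
    ("treats", "relation"),
    ("lung cancer", "disease"),
    ("in", "other"),
    ("adults", "population"),
]
signature = ("drug", "relation", "disease")  # assumed arrangement of entity types

print(extract_by_signature(labelled, signature))
# [('cisplatin', 'treats', 'lung cancer')]
```

A fuzzier notion of similarity (for example, tolerating unlabelled strings between signature positions) would fit the claim language equally well; exact matching is used here only to keep the sketch short.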

Assignees

Innoplexus Ag

Inventors

Classifications

  • Indexing; Web crawling techniques · CPC title

  • G06F40/295 · Primary

    Named entity recognition · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • of unstructured textual data (document management systems G06F16/93) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11270073B2 cover?
Disclosed is a method and a system for extracting entity information from target data. The method comprises: providing the target data; refining the target data to obtain at least one base entity information having a plurality of base entity units using an algorithm, wherein the algorithm is based on a predefined syntax; generating a plurality of strings for each of the base entity information,…
Who is the assignee on this patent?
Innoplexus Ag
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).