What technology area does this patent fall under?

Primary CPC classification: G06F16/313 (selection or weighting of terms for indexing). Mapped technology area: Physics (CPC section G).

When was this patent published?

Publication date: Mar 30, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Text extraction and processing

US10963490B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10963490-B2
Application number	US-201916287323-A
Country	US
Kind code	B2
Filing date	Feb 27, 2019
Priority date	Feb 27, 2019
Publication date	Mar 30, 2021
Grant date	Mar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, computer program product, and method are provided to selectively index one or more subsets of documents or files. As data is extracted from a document or file, extracted text is organized into data portions and subject to evaluations. Meta characteristic data is leveraged to assess the extracted text. One or more subsets of the organized data portions are selectively identified and subject to enrichment processing, which creates and returns enriched and indexed subsets of the documents or files.
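The flow described in the abstract (extract portions, score them, select a subset, enrich and index only that subset) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `POSITION_SCORES` values, the `threshold`, and the token-based `enrich` step are hypothetical and do not come from the patent, which does not publish numeric scores or a specific enrichment method.

```python
from dataclasses import dataclass

# Hypothetical position-based scores. The patent lists meta characteristics
# such as document title and section title but gives no numeric values.
POSITION_SCORES = {
    "document_title": 1.0,
    "chapter_title": 0.8,
    "section_title": 0.6,
    "body": 0.2,
}


@dataclass
class Portion:
    text: str
    position: str  # where the portion sits in the document


def score(portion: Portion, weight: float) -> float:
    """Score a portion from its position in the document and a weight."""
    return POSITION_SCORES.get(portion.position, 0.0) * weight


def enrich(portion: Portion) -> dict:
    """Stand-in for enrichment processing: lowercase, de-duplicated terms."""
    terms = sorted(set(portion.text.lower().split()))
    return {"text": portion.text, "terms": terms}


def selective_index(portions, weight=1.0, threshold=0.5):
    """Enrich and index only the portions whose score meets the threshold."""
    index = {}
    for portion in portions:
        if score(portion, weight) >= threshold:
            for term in enrich(portion)["terms"]:
                index.setdefault(term, []).append(portion.text)
    return index


portions = [
    Portion("Text Extraction", "document_title"),
    Portion("Scoring Portions", "section_title"),
    Portion("some long body paragraph", "body"),
]
index = selective_index(portions)
```

In this sketch the body paragraph scores below the assumed threshold, so it is never enriched or indexed. Skipping the costly enrichment step for low-scoring portions is the saving the abstract describes.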

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: a processing unit operatively coupled to memory; a knowledge engine in communication with the processing unit and memory, the knowledge engine comprising: a manager to extract text from a document, the extracted text including one or more data portions; the manager to evaluate the extracted text, including calculate a score for each of the one or more extracted data portions, the calculation based on meta characteristic data associated with a position of the one or more data portions in the document and a weight; a director operatively coupled to the manager, the director to selectively identify a subset of the extracted one or more data portions of the extracted text, the identification based on the calculated score; and the director to execute enrichment processing based on the calculated score, wherein the enrichment processing is limited to the identified subset; and an indexed subset of the one or more data portions returned from the director following execution of the enrichment processing.

2. The computer system of claim 1, wherein the selective identification of a subset further comprises the director to assign an execution priority value to each portions of the identified subset, and execute enrichment processing responsive to the assigned priority.

3. The computer system of claim 1, further comprising the manager to process two or more documents from two or more separate storage locations, wherein the weight of each of the two or more documents is based on their storage location, an age of the document, or a combination thereof.

4. The computer system of claim 1, wherein the processed document includes textual data, and score calculation is subject to variation based on document file format.

5. The computer system of claim 1 wherein the meta-characteristic data is selected from the group consisting of: document title, chapter title, section title, location within a chapter, location within a section, and highlighting.

6. The computer system of claim 1, further comprising the manager to identify a select portion within the document having unstructured text, and further comprising the manager to translate the unstructured text to structured text.

7. A computer program product to process textual data, the computer program product comprising a computer readable storage device having program code embodied therewith, the program code executable by a processing unit to: process a document, including extract text from a document, the extracted text including one or more data portions; evaluate the extracted text, including calculate a score for each of the extracted one or more data portions, the calculation based on meta characteristic data associated with a position of the one or more data portions in the document and a weight; selectively identify a subset of the extracted one or more data portions of the extracted text based on the calculated score; and execute enrichment processing based on the calculated score, wherein the enrichment processing is limited to the identified subset, and an indexed subset of the one or more data portions is returned from execution of the enrichment processing.

8. The computer program product of claim 7, wherein the program code to selectively identify a subset further comprises program code to assign an execution priority value to each portions of the identified subset, and execute enrichment processing responsive to the assigned priority.

9. The computer program product of claim 7, further comprising program code to process two or more documents from two or more separate storage locations, wherein the weight of each of the two or more documents is based on their storage location, an age of the document, or a combination thereof.

10. The computer program product of claim 7, wherein the processed document includes textual data, and score calculation is subject to variation based on document file format.

11. The computer program product of claim 7, wherein the meta characteristic data is selected from the group consisting of: document title, chapter title, section title, location within a chapter, location within a section, and highlighting.

12. The computer program product of claim 7, further comprising program code to identify a select portion within the document having unstructured text, and further comprising program code to translate the unstructured text to structured text.

13. A method for processing textual data, comprising: document processing, including extracting text from a document, the extracted text including one or more data portions; evaluating the extracted text, including calculating a score for each of the extracted one or more data portions, the calculation based on meta characteristic data associated with a position of the one or more data portions in the document and a weight; selectively identifying a subset of the extracted one or more data portions of the extracted text based on the calculated score; and executing enrichment processing based on the calculated score, wherein the enrichment processing is limited to the identified subset, and an indexed subset of the one or more data portions is returned from execution of the enrichment processing.

14. The method of claim 13, wherein selectively identifying a subset further comprises assigning an execution priority value to each portions of the identified subset, and executing enrichment processing responsive to the assigned priority.

15. The method of claim 13, further comprising processing two or more documents from two or more separate storage locations, wherein the weight of each of the two or more documents is based on their storage location, an age of the document, or a combination thereof.

16. The method of claim 13, wherein the processed document includes textual data, and score calculation is subject to variation based on document file format.

17. The method of claim 13, wherein the meta characteristic data is selected from the group consisting of: document title, chapter title, section title, location within a chapter, location within a section, and highlighting.

18. The method of claim 13, further comprising identifying a select portion within the document having unstructured text, and further comprising translating the unstructured text to structured text.
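Two of the dependent claims lend themselves to a short illustration: the execution priority assigned to selected portions (claims 2, 8, and 14) and the per-document weight based on storage location and age (claims 3, 9, and 15). The sketch below is one possible reading, not the patented implementation. The `LOCATION_WEIGHTS` table, the age-decay formula, and the use of a heap as the priority mechanism are all assumptions chosen for illustration.

```python
import heapq
from datetime import date

# Hypothetical per-location weights. The claims only state that the weight
# may depend on storage location, document age, or a combination of both.
LOCATION_WEIGHTS = {"curated": 1.0, "archive": 0.5}


def document_weight(location: str, created: date, today: date) -> float:
    """Combine a location weight with a simple age decay (assumed form)."""
    age_years = (today - created).days / 365.0
    return LOCATION_WEIGHTS.get(location, 0.25) / (1.0 + age_years)


def enrichment_order(scored_portions):
    """Return portion names ordered by score, highest first."""
    # heapq is a min-heap, so negate each score to pop the largest first.
    heap = [(-s, name) for name, s in scored_portions]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


today = date(2021, 3, 30)
w_new = document_weight("curated", date(2021, 3, 30), today)
w_old = document_weight("archive", date(2019, 2, 27), today)

order = enrichment_order([
    ("section title", 0.6 * w_new),
    ("document title", 1.0 * w_new),
    ("old title", 1.0 * w_old),
])
```

Here a title from an older, archived document ends up behind a section title from a fresh, curated one, because the assumed document weight scales the position-based score before the priority is assigned.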

Assignees

Inventors

Classifications

G06F16/313 · Primary
Selection or weighting of terms for indexing · CPC title
G06F16/31 · Primary
Indexing; Data structures therefor; Storage structures · CPC title
G06F16/93
Document management systems · CPC title
G06N5/02
Knowledge representation; Symbolic representation · CPC title
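The CPC symbols above follow a fixed hierarchy: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. As a rough sketch of how a symbol such as G06F16/313 maps to the "Physics" technology area, the snippet below splits a symbol into those parts. The `parse_cpc` helper and its output keys are hypothetical names, and the section titles are abbreviated.

```python
import re

# CPC section letters with abbreviated section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging (cross-sectional)",
}

CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


def parse_cpc(symbol: str) -> dict:
    """Split a CPC symbol like 'G06F16/313' into its hierarchy levels."""
    match = CPC_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": group,
        "subgroup": subgroup,
    }


primary = parse_cpc("G06F16/313")
```

All four classifications listed for this patent begin with G, which is why the page maps it to the Physics area.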

Patent family

Related publications grouped by family.

Patent family ID: 72141687

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10963490B2 cover?: A system, computer program product, and method are provided to selectively index one or more subsets of documents or files. As data is extracted from a document or file, extracted text is organized into data portions and subject to evaluations. Meta characteristic data is leveraged to assess the extracted text. One or more subsets of the organized data portions are selectively identified and su…
Who is the assignee on this patent?: IBM