What technology area does this patent fall under?

Primary CPC classification G06F21/6245. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Smart identification of indicator text with full-text search or optimized document analysis

US12417306B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12417306-B2
Application number	US-202218068022-A
Country	US
Kind code	B2
Filing date	Dec 19, 2022
Priority date	Dec 19, 2022
Publication date	Sep 16, 2025
Grant date	Sep 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Several aspects for optimizing unstructured document analysis comprise operating a document system, where the document system comprises a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of the plurality of documents; determining data elements of the type in the sample of documents; determining an indicator context expression for the type of data elements out of the determined data elements of the type; determining a query for searching, using a search engine, the full-text index using the indicator context expression; and determining the documents in the document system being compliant to the determined query.
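The steps in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: the e-mail regex used as the "type of data elements", the toy inverted index, the choice of the single word to the left of each match as the indicator context expression, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical detector for one "type of data elements": e-mail addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def build_index(docs):
    """Toy full-text index: lowercase token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[token].add(doc_id)
    return index

def indicator_terms(sample_texts, top_n=1):
    """Count the word immediately left of each detected element in the sample."""
    counts = Counter()
    for text in sample_texts:
        for match in EMAIL.finditer(text):
            left_words = re.findall(r"[a-z]+", text[:match.start()].lower())
            if left_words:
                counts[left_words[-1]] += 1
    return [term for term, _ in counts.most_common(top_n)]

def find_candidates(docs, sample_ids):
    """Derive indicator terms from the sample, then query the whole index."""
    index = build_index(docs)
    terms = indicator_terms([docs[i] for i in sample_ids])
    hits = set()
    for term in terms:  # the "query" is a simple OR over the indicator terms
        hits |= index.get(term, set())
    return terms, sorted(hits)

docs = {
    "d1": "Contact: alice@example.com for the invoice.",
    "d2": "Quarterly results were strong.",
    "d3": "Contact: bob@example.org about shipping.",
    "d4": "Please use the contact form on our site.",
}
terms, hits = find_candidates(docs, sample_ids=["d1", "d2"])
print(terms, hits)  # ['contact'] ['d1', 'd3', 'd4']
```

Only the two sampled documents are scanned with the detector; the rest are reached through the index. The hit `d4` contains the indicator word but no e-mail address, which shows why the index query yields candidates rather than confirmed matches.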

First claim

Opening claim text (preview).

What is claimed is:

1. A method for optimizing unstructured document analysis, said method comprising: operating a document system, said document system comprising a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of said plurality of documents, wherein selecting the sample comprises an existing sampling approach to identify a representative, small subset of the large number of documents in a scope of the request; determining data elements of said type in said sample of said plurality of documents; determining an indicator context expression for said type of data elements out of the determined data elements of said type; determining a query for searching, using a search engine, said full-text index using said indicator context expression; and determining the documents in said document system being compliant to said query.

2. The method according to claim 1, wherein a number of documents in said sample is at least 10 times smaller than a second number of documents in said document system.

3. The method according to claim 1, wherein said determining the data elements of said type in said sample comprises: determining a number of relevant sample documents in said sample; and upon determining that said number of relevant sample documents is below a predefined sample threshold value, selecting a larger sample out of said plurality of documents.

4. The method according to claim 1, wherein said determining the documents in said document system being compliant to said query comprises: applying a full analysis related to the documents in said document system.

5. The method according to claim 1, further comprising: determining a result number of the documents being compliant with said query; and upon said result number being determined to be equal or outside predefined boundaries, adjusting said query and repeating said determining the documents in said document system being compliant to said query.

6. The method according to claim 5, further comprising: upon said result number having a value within predefined boundaries and a quality indicator value being larger than a predefined quality indicator threshold value, wherein said quality indicator value being indicative of a quality criterion of said type of data elements, stopping the repeating.

7. The method according to claim 1, further comprising: repeating said steps of: determining indicator context expressions, determining the query for searching said full-text index, and determining the documents in said document system being compliant to said query, thereby redefining a scope of said indicator context expression.

8. The method according to claim 1, wherein said determining said indicator context expression comprises: selecting an expression to a left of a determined data element as one indicator context expression; and selecting another expression to a right of said determined data element as another indicator context expression.

9. The method according to claim 1, wherein determining said indicator context expression comprises: selecting an expression as said indicator context expression in a surrounding of a determined data element, wherein said expression has another format than other elements in said surrounding of said determined data element.

10. The method according to claim 1, wherein determining said indicator context expression comprises: using a trained machine-learning model that has been trained to determine said indicator context expression for a determined data element in a given document, wherein said machine-learning model has been developed by a training of a machine-learning system with documents with labelled selected data elements and related indicator context expressions.

11. The method according to claim 1, wherein determining said indicator context expressions comprises: using an association model adapted for detecting strong relationship patterns between a determined data element and a potential indicator context expression; and confirming said potential indicator context expression as an actual indicator context expression based on an analysis of other documents comprising said relationship of said potential indicator context expression and said determined data element.

12. A computer-implemented document analysis system for optimizing unstructured document analysis, said system comprising: a processor and a memory operatively coupled to said processor, wherein said memory stored program code portions, which, when executed enable said processor to: operate a document system, said document system comprising a plurality of documents comprising unstructured content and a full-text index; receive a request to identify documents comprising a type of data elements; select a sample out of said plurality of documents, wherein selecting the sample comprises an existing sampling approach to identify a representative, small subset of the large number of documents in a scope of the request; determine data elements of said type in said sample of said plurality of documents; determine an indicator context expression for said type of data elements out of the determined data elements of said type; determine a query for searching, using a search engine, said full-text index using said indicator context expression; and determine the documents in said document system being compliant to said query.

13. The system of claim 12, wherein a number of documents in said sample is at least ten times smaller than a second number of documents in said document system.

14. The system of claim 12, wherein, during said determining said data elements of said type in said sample of documents, said processor is also adapted to: determine a number of relevant sample documents in said sample; and upon determining that said number of relevant sample documents is below a predefined sample threshold value, selecting a larger sample out of said plurality of documents.

15. The system of claim 12, wherein during said determining the documents in said document system, said processor is also adapted to: apply a full analysis system related to said document system.

16. The system of claim 12, wherein said processor is also adapted to: determine a result number of the documents being compliant with said query; and upon a determination that said result number is equal or outside predefined boundaries, adjust said query and execute a repetition of said determining the documents in said document system being compliant to said query.

17. The system according to claim 16, wherein said processor, upon said result number having a value within predefined boundaries, is also adapted to: upon a quality indicator value being larger than a predefined quality indicator threshold value, wherein said quality indicator value being indicative of a quality criterion of said type of determined data element, stop said repetition.

18. The system according to claim 12, wherein said processor is also adapted to: repeat said determining indicator context expressions, said determining said query for searching said full-text index, and said determining the documents in said document system being compliant to said query, thereby redefining a scope of said indicator context expressions.

19. The system according to claim 12, wherein said processor, during said determining said indicator context expression, is also adapted to: selec
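Claims 5 and 16 describe repeating the search with an adjusted query while the result count is equal to or outside predefined boundaries. The sketch below shows one way such a loop could look. The claim text does not say how the query is adjusted, so adding or dropping OR-terms, the `refine_query` and `search` names, the `max_rounds` cap, and the toy postings are all assumptions. The quality indicator check of claims 6 and 17 is left out.

```python
def refine_query(terms, search, lower, upper, max_rounds=5):
    """Adjust an OR-query of indicator terms until the hit count is in bounds.

    terms:  indicator terms ordered from most to least specific (assumption).
    search: callable taking a list of terms and returning a set of document ids.
    """
    active = 1  # start with the single most specific term
    hits = search(terms[:active])
    for _ in range(max_rounds):
        if lower < len(hits) < upper:
            break  # strictly inside the boundaries: stop repeating
        if len(hits) <= lower and active < len(terms):
            active += 1  # too few hits: broaden the query by adding a term
        elif len(hits) >= upper and active > 1:
            active -= 1  # too many hits: narrow the query by dropping a term
        else:
            break  # nothing left to adjust
        hits = search(terms[:active])
    return terms[:active], hits

# Toy postings standing in for a full-text index (hypothetical data).
POSTINGS = {
    "iban": {"d1"},
    "account": {"d1", "d2", "d3"},
    "bank": {"d1", "d2", "d3", "d4", "d5", "d6"},
}

def search(terms):
    """Return the union of the postings of all query terms."""
    hits = set()
    for term in terms:
        hits |= POSTINGS.get(term, set())
    return hits

query, hits = refine_query(["iban", "account", "bank"], search, lower=1, upper=5)
print(query, sorted(hits))  # ['iban', 'account'] ['d1', 'd2', 'd3']
```

A hit count equal to a boundary triggers another adjustment, matching the "equal or outside" wording of the claims. The `max_rounds` cap is an added safeguard against oscillating between a query that is too narrow and one that is too broad.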

Assignees

IBM

Inventors

Classifications

G06F16/31
Indexing; Data structures therefor; Storage structures · CPC title
G06F16/3325
Reformulation based on results of preceding query · CPC title
G06F16/93
Document management systems · CPC title
G06F21/6245 (Primary)
Protecting personal data, e.g. for financial or medical purposes · CPC title
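A CPC symbol such as G06F21/6245 encodes a hierarchy: section, class, subclass, main group, and subgroup. The section letter G is what maps this patent to "Physics". The sketch below splits a symbol into those parts. The `parse_cpc` helper and its regular expression are illustrative simplifications, not an official CPC parser.

```python
import re

# Section letters of the Cooperative Patent Classification scheme.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Simplified pattern: section, class, subclass, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")

def parse_cpc(symbol):
    """Split a CPC symbol like 'G06F21/6245' into its hierarchy levels."""
    match = CPC_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }

primary = parse_cpc("G06F21/6245")
print(primary["section_title"], primary["subclass"])  # Physics G06F
```

All four classifications listed for this patent share subclass G06F, so they all map to the same section.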

Patent family

Related publications grouped by family.

View patent family 91472608

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12417306B2 cover?

Several aspects for optimizing unstructured document analysis comprise operating a document system, where the document system comprises a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of the plurality of documents; determining data elements of the type in the samp…

Who is the assignee on this patent?

IBM

Related patents

Composite symbolic and non-symbolic artificial intelligence system for advanced reasoning and semantic search

Techniques for generating predictive outcomes relating to oncological lines of therapy using artificial intelligence

Multi-stage image querying

Method of and system for recommending fresh search query suggestions on search engine

Systems and methods for searching unstructured documents for structured data

Entity and attribute resolution in conversational applications
