What technology area does this patent fall under?

Primary CPC classification G06V30/412. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Exploration and production document content and metadata scanner

US12437570B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12437570-B2
Application number	US-202218260526-A
Country	US
Kind code	B2
Filing date	Jan 7, 2022
Priority date	Jan 8, 2021
Publication date	Oct 7, 2025
Grant date	Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method involves extracting, from a file comprising an unstructured oilfield document, terms, calculating term frequency inverse document frequency (TF-IDF) of the terms to generate an input vector, execute a document content classification model on the input vector to generate a document content classification of unstructured oilfield document, and extract table information from a table in the unstructured oilfield document. The method further involves storing, with the file in storage, the document content classification and the table information.
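To make the abstract concrete, here is a minimal Python sketch of the TF-IDF and classification steps. It is an illustration only, not the patented implementation. The text on this page does not say which model type is used, so a nearest-centroid classifier with a softmax over similarity scores is assumed. The smoothed IDF formula, the class names, the toy training sentences, and the function names (fit_idf, tfidf, train_centroids, classify) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic terms only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Inverse document frequency from a training set (smoothed; an assumption)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised TF-IDF vector (as a dict) over the known vocabulary."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()}


def train_centroids(labelled_docs, idf):
    """Average the TF-IDF vectors of each class (assumed model type)."""
    sums, counts = {}, Counter()
    for label, text in labelled_docs:
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(text, idf).items():
            acc[t] += v
        counts[label] += 1
    return {
        label: {t: v / counts[label] for t, v in acc.items()}
        for label, acc in sums.items()
    }


def classify(text, idf, centroids):
    """Return a probability per class via softmax over dot-product scores."""
    vec = tfidf(text, idf)
    scores = {
        label: sum(v * centroid.get(t, 0.0) for t, v in vec.items())
        for label, centroid in centroids.items()
    }
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}


# Toy training set; labels and text are made up for illustration.
TRAINING = [
    ("drilling_report", "daily drilling report bit depth mud weight rate of penetration"),
    ("drilling_report", "drilling report casing depth mud weight bit run"),
    ("well_log", "gamma ray log resistivity porosity curve depth"),
    ("well_log", "well log density porosity gamma ray resistivity"),
]

idf = fit_idf([text for _, text in TRAINING])
centroids = train_centroids(TRAINING, idf)
probs = classify("mud weight and bit depth from the drilling crew", idf, centroids)
best = max(probs, key=probs.get)
print(best, probs)
```

The result maps every class to a probability rather than returning a single label, which mirrors the per-class probabilities described in claim 2. A production system would more likely use a trained model from a machine learning library; the pure-Python version is used here only to keep the sketch self-contained.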

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms; calculating TF-IDF of the plurality of terms to generate an input vector; executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document; extracting table information from a table in the unstructured oilfield document; and storing, with the file in storage, the document content classification and the table information.
2. The method of claim 1, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding probability of the unstructured oilfield document being in the document content class.
3. The method of claim 1, wherein extracting table information comprises: detecting a table in the unstructured oilfield document; generating a bounding box around the table; detecting a plurality of rows and a plurality of columns of the table using the bounding box; extracting contents from the plurality of rows and the plurality of columns; interrelating the contents in the plurality of rows to obtain related contents; and storing the related contents in a comma separated value file.
4. The method of claim 3, further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of vertical lines, detecting the plurality of vertical lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of vertical lines.
5. The method of claim 3, further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of horizontal lines, detecting the plurality of horizontal lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of horizontal lines.
6. The method of claim 1, further comprising: obtaining a control file comprising: a model specification of the document type classification model, and a data extraction control file path specifying a location to store the document content classification and the table information.
7. The method of claim 1, further comprising: extracting file metadata of the file; and cataloging the unstructured oilfield document using the file metadata.
8. A system comprising: memory; and a processor for executing computer readable code configured to perform operations comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms, calculating TF-IDF of the plurality of terms to generate an input vector, executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document, extracting table information from a table in the unstructured oilfield document, and storing, with the file in storage, the document content classification and the table information.
9. The system of claim 8, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding probability of the unstructured oilfield document being in the document content class.
10. The system of claim 8, wherein extracting table information comprises: detecting a table in the unstructured oilfield document; generating a bounding box around the table; detecting a plurality of rows and a plurality of columns of the table using the bounding box; extracting contents from the plurality of rows and the plurality of columns; interrelating the contents in the plurality of rows to obtain related contents; and storing the related contents in a comma separated value file.
11. The system of claim 10, the operations further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of vertical lines, detecting the plurality of vertical lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of vertical lines.
12. The system of claim 10, the operations further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of horizontal lines, detecting the plurality of horizontal lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of horizontal lines.
13. The system of claim 8, the operations further comprising: obtaining a control file comprising: a model specification of the document type classification model, and a data extraction control file path specifying a location to store the document content classification and the table information.
14. The system of claim 8, the operations further comprising: extracting file metadata of the file; and cataloging the unstructured oilfield document using the file metadata.
15. A non-transitory computer readable medium comprising instructions that, when executed by a computer processor, perform operations comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms; calculating TF-IDF of the plurality of terms to generate an input vector; executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document; extracting table information from a table in the unstructured oilfield document; and storing, with the file in storage, the document content classification and the table information.
16. The non-transitory computer readable medium of claim 15, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding p
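The table-extraction steps in claims 3 and 4 (bounding box, rows, columns from vertical lines, CSV output) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method disclosed in the patent: word positions are assumed to come from an upstream OCR or PDF parsing step that is not shown, the table is assumed to have vertical ruling lines, and the data layout, the 5-unit row tolerance, and the function names (column_index, group_rows, table_to_csv) are hypothetical.

```python
import csv
import io
from bisect import bisect_right


def column_index(x, interior_lines):
    """Column number = count of interior vertical lines to the left of x."""
    return bisect_right(interior_lines, x)


def group_rows(words, tolerance=5.0):
    """Group words into rows when their y positions are within a tolerance."""
    rows = []
    for word in sorted(words, key=lambda w: (w["y"], w["x"])):
        if rows and abs(rows[-1][0]["y"] - word["y"]) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def table_to_csv(words, bbox, vertical_lines):
    """Keep words inside the bounding box, split them into cells, emit CSV text."""
    x0, y0, x1, y1 = bbox
    inside = [w for w in words if x0 <= w["x"] <= x1 and y0 <= w["y"] <= y1]
    # Lines on the table border do not separate columns, so drop them.
    interior = sorted(x for x in vertical_lines if x0 < x < x1)
    n_cols = len(interior) + 1

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in group_rows(inside):
        cells = [[] for _ in range(n_cols)]
        for word in sorted(row, key=lambda w: w["x"]):
            cells[column_index(word["x"], interior)].append(word["text"])
        # Words sharing a cell are joined so related contents stay together.
        writer.writerow([" ".join(cell) for cell in cells])
    return out.getvalue()


# Made-up word positions (page coordinates) for illustration only.
WORDS = [
    {"text": "Depth", "x": 20, "y": 12},
    {"text": "(m)", "x": 60, "y": 12},
    {"text": "Mud", "x": 120, "y": 13},
    {"text": "weight", "x": 150, "y": 13},
    {"text": "1500", "x": 20, "y": 32},
    {"text": "1.25", "x": 120, "y": 31},
    {"text": "Page", "x": 20, "y": 200},  # lies outside the table bounding box
]
BBOX = (10, 5, 200, 50)           # x0, y0, x1, y1 of the detected table
VERTICAL_LINES = [10, 100, 200]   # detected ruling lines, including borders

csv_text = table_to_csv(WORDS, BBOX, VERTICAL_LINES)
print(csv_text)
```

For a table without ruling lines, which the table parameter in claims 4 and 5 is meant to flag, column boundaries would have to be inferred some other way, for example from gaps in the horizontal positions of the words. That case is not covered by this sketch.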

Assignees

Schlumberger Technology Corp

Inventors

Classifications

G06V2201/10
Recognition assisted with metadata · CPC title
G06V30/414
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
G06V30/42
based on the type of document · CPC title
G06V30/412 · Primary
Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title
G06V30/19147
Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

View patent family 82358341

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12437570B2 cover?: A method involves extracting, from a file comprising an unstructured oilfield document, terms, calculating term frequency inverse document frequency (TF-IDF) of the terms to generate an input vector, execute a document content classification model on the input vector to generate a document content classification of unstructured oilfield document, and extract table information from a table in th…
Who is the assignee on this patent?: Schlumberger Technology Corp