Exploration and production document content and metadata scanner

US12437570B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12437570-B2
Application number  US-202218260526-A
Country             US
Kind code           B2
Filing date         Jan 7, 2022
Priority date       Jan 8, 2021
Publication date    Oct 7, 2025
Grant date          Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method involves extracting terms from a file comprising an unstructured oilfield document, calculating term frequency inverse document frequency (TF-IDF) of the terms to generate an input vector, executing a document content classification model on the input vector to generate a document content classification of the unstructured oilfield document, and extracting table information from a table in the unstructured oilfield document. The method further involves storing, with the file in storage, the document content classification and the table information.
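The classification half of the method (compute IDF over a labeled training set, build TF-IDF vectors, train a model, then score a new document's input vector) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the patented implementation: the claims do not name a classifier, so a nearest-centroid model with a softmax over cosine similarities stands in for the document content classification model. It returns one probability per content class, the form of output recited in claim 2. The tokenizer, the smoothed IDF formula, the `temperature` parameter, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: terms are lower-cased alphanumeric tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def compute_idf(docs):
    """Smoothed inverse document frequency over all training documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector {term: weight}; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(training_set, idf):
    """Hypothetical model: the mean TF-IDF vector of each content class."""
    centroids = {}
    for label, docs in training_set.items():
        summed = Counter()
        for doc in docs:
            summed.update(tfidf_vector(doc, idf))  # adds float weights per term
        centroids[label] = {t: w / len(docs) for t, w in summed.items()}
    return centroids

def classify(text, idf, centroids, temperature=0.1):
    """Return {content class: probability} via softmax over cosine similarities."""
    vec = tfidf_vector(text, idf)
    sims = {label: cosine(vec, c) for label, c in centroids.items()}
    exps = {label: math.exp(s / temperature) for label, s in sims.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}
```

For example, after training on a few labeled well-log and drilling-report snippets, `classify` returns a dict keyed by content class whose values sum to 1; the highest-probability key is the predicted class. A production system would more likely use a library vectorizer and a discriminative classifier; the centroid model is chosen here only to keep the sketch dependency-free.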

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms; calculating TF-IDF of the plurality of terms to generate an input vector; executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document; extracting table information from a table in the unstructured oilfield document; and storing, with the file in storage, the document content classification and the table information.

2. The method of claim 1, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding probability of the unstructured oilfield document being in the document content class.

3. The method of claim 1, wherein extracting table information comprises: detecting a table in the unstructured oilfield document; generating a bounding box around the table; detecting a plurality of rows and a plurality of columns of the table using the bounding box; extracting contents from the plurality of rows and the plurality of columns; interrelating the contents in the plurality of rows to obtain related contents; and storing the related contents in a comma separated value file.

4. The method of claim 3, further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of vertical lines, detecting the plurality of vertical lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of vertical lines.

5. The method of claim 3, further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of horizontal lines, detecting the plurality of horizontal lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of horizontal lines.

6. The method of claim 1, further comprising: obtaining a control file comprising: a model specification of the document type classification model, and a data extraction control file path specifying a location to store the document content classification and the table information.

7. The method of claim 1, further comprising: extracting file metadata of the file; and cataloging the unstructured oilfield document using the file metadata.

8. A system comprising: memory; and a processor for executing computer readable code configured to perform operations comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms, calculating TF-IDF of the plurality of terms to generate an input vector, executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document, extracting table information from a table in the unstructured oilfield document, and storing, with the file in storage, the document content classification and the table information.

9. The system of claim 8, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding probability of the unstructured oilfield document being in the document content class.

10. The system of claim 8, wherein extracting table information comprises: detecting a table in the unstructured oilfield document; generating a bounding box around the table; detecting a plurality of rows and a plurality of columns of the table using the bounding box; extracting contents from the plurality of rows and the plurality of columns; interrelating the contents in the plurality of rows to obtain related contents; and storing the related contents in a comma separated value file.

11. The system of claim 10, the operations further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of vertical lines, detecting the plurality of vertical lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of vertical lines.

12. The system of claim 10, the operations further comprising: obtaining, from a table control file, a table parameter of the table, wherein the table parameter specifies whether the table comprises a plurality of horizontal lines, detecting the plurality of horizontal lines in the table based on the table parameter; and wherein detecting the plurality of columns is performed using the plurality of horizontal lines.

13. The system of claim 8, the operations further comprising: obtaining a control file comprising: a model specification of the document type classification model, and a data extraction control file path specifying a location to store the document content classification and the table information.

14. The system of claim 8, the operations further comprising: extracting file metadata of the file; and cataloging the unstructured oilfield document using the file metadata.

15. A non-transitory computer readable medium comprising instructions that, when executed by a computer processor, perform operations comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms; calculating TF-IDF of the plurality of terms to generate an input vector; executing a document content classification model on the input vector to generate a document content classification of unstructured oilfield document; extracting table information from a table in the unstructured oilfield document; and storing, with the file in storage, the document content classification and the table information.

16. The non-transitory computer readable medium of claim 15, wherein the document content classification comprises: a plurality of document content classes each associated with a corresponding p…

Assignees

Schlumberger Technology Corp

Inventors

Classifications

  • Recognition assisted with metadata · CPC title

  • Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title

  • based on the type of document · CPC title

  • G06V30/412 (primary)

    Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12437570B2 cover?
It covers a method for processing unstructured oilfield documents: terms are extracted from the file and weighted by term frequency inverse document frequency (TF-IDF), a document content classification model trained on TF-IDF vectors assigns the document a content classification, table information is extracted from tables in the document, and the classification and table information are stored with the file.
Who is the assignee on this patent?
Schlumberger Technology Corp
What technology area does this patent fall under?
Primary CPC classification G06V30/412. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).