What technology area does this patent fall under?

Primary CPC classification G06T7/0004. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 23, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page: citations in our corpus, or other publications that share the same primary CPC classification.

Methods and system for industrial defect identification

US2025329008A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025329008-A1
Application number	US-202418641576-A
Country	US
Kind code	A1
Filing date	Apr 22, 2024
Priority date	Apr 22, 2024
Publication date	Oct 23, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An inspection system includes a model architecture for industrial defect identification. The model architecture includes a text encoder model that receives a text object having free-form text and generates a text embedding. A visual encoder model receives a region of interest of an image and generates a region embedding. A cross-modality fusion layer acts between the text encoder model and the visual encoder model to fuse outputs of nodes within the models to be used as inputs to nodes in a subsequent layer. A cross-modality decoder model aligns the text embedding and the region embedding to generate a bounding box for the region if it is similar to the text object. A positional encoder generates a positional embedding based on the bounding box. A mask decoder model generates a segmentation mask based on the positional embedding within an output to highlight the region defined by the text object.
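The abstract does not say how the cross-modality fusion layer combines the outputs of the two encoders. One common choice in open-set detectors such as GLIP and Grounding DINO is bidirectional cross-attention, where text tokens attend to visual features and visual features attend to text tokens. The pure-Python sketch below assumes that mechanism, uses a single attention head with no learned projections, assumes both modalities share the same feature width, and adds the attended context residually. The names `cross_attend` and `fuse` are hypothetical and do not come from the patent.

```python
import math
from typing import List, Tuple

# Rows are per-token (text) or per-region (visual) feature vectors.
# Assumption: both modalities use the same feature width.
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention, no learned projections (assumption)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Each output row is a weighted average of the other modality's rows.
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out


def fuse(text_feats: Matrix, visual_feats: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical fusion layer: exchange context in both directions, then add it
    residually so each encoder's next layer sees features informed by the other."""
    text_ctx = cross_attend(text_feats, visual_feats)
    visual_ctx = cross_attend(visual_feats, text_feats)
    fused_text = [[t + c for t, c in zip(row, ctx)]
                  for row, ctx in zip(text_feats, text_ctx)]
    fused_visual = [[v + c for v, c in zip(row, ctx)]
                    for row, ctx in zip(visual_feats, visual_ctx)]
    return fused_text, fused_visual


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]                 # 2 text tokens, width 2
    regions = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # 3 region features, width 2
    fused_text, fused_visual = fuse(text, regions)
    print(len(fused_text), len(fused_visual))  # 2 3
```

Each fused output keeps the shape of its input, which is what allows it to be fed to the next layer of the same encoder. A trained model would typically add learned query, key, and value projections and multiple heads.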

First claim

Full claim text (claims 1 to 20).

1. A method comprising: generating a text embedding using a text encoder model for a text object of free-form text; generating a region embedding within an image using a visual encoder model, wherein the region embedding defines a region of interest within the image; fusing output of a layer within the text encoder model with output of a layer within the visual encoder model using a cross-modality fusion layer; using the fused outputs of the layers of the text encoder model and the visual encoder model as input to a subsequent layer of the text encoder model and the visual encoder model; aligning the text embedding with the region embedding to generate a bounding box for at least one instance of the text object using a cross-modality decoder model if the at least one instance of the text object is present in the image; and generating a positional embedding using a positional encoder based on coordinates of the bounding box, wherein the positional embedding indicates a location of the at least one instance of the text object within the image.

2. The method of claim 1, further comprising providing the image to an image encoder model; and generating an image embedding for the image.

3. The method of claim 2, further comprising receiving the positional embedding from the positional encoder and the image embedding from the image encoder model at a mask decoder.

4. The method of claim 3, further comprising generating a segmentation mask for the at least one instance of the text object within the image using the mask decoder.

5. The method of claim 4, wherein the segmentation mask covers an area of a defect to be identified within the image for a component under inspection.

6. The method of claim 1, further comprising creating a bounding box for the region of interest of the region embedding.

7. The method of claim 6, further comprising determining a confidence score for the bounding box.

8. The method of claim 7, wherein the confidence score is based a similarity between the text embedding and the region embedding.

9. The method of claim 1, wherein the text embedding is a vector generated by the text encoder model.

10. The method of claim 1, wherein the region embedding is a vector generated by the visual encoder model.

11. The method of claim 1, wherein the text encoder model is trained using a curated natural language dataset.

12. A method for industrial defect identification, the method comprising: receiving an image of a component; receiving a text object of free-form text describing a defect of the component to be identified within the image; generating a text embedding using a text encoder model based on the text object; generating a region embedding for the image using a visual encoder model, wherein the region embedding defines a region of interest within the image, and wherein the outputs of at least one layer within the visual encoder model are fused with outputs of at least one layer within the text encoder model so that the fused outputs are input into a subsequent layer within the text encoder model and the visual encoder model; predicting how similar the text embedding and the region embedding are to each other using a cross-modality decoder model; determining a positional embedding using a positional encoder based on the prediction, wherein the positional embedding indicates a location of an instance of the text object within the image; and generating a segmentation mask for the instance of the text object based on the positional embedding.

13. The method of claim 12, further comprising providing the image to an image encoder model; and generating an image embedding for the image.

14. The method of claim 13, further comprising receiving the positional embedding from the positional encoder and the image embedding from the image encoder model at a mask decoder.

15. The method of claim 14, further comprising generating the segmentation mask for the instance of the text object within the image using the mask decoder.

16. The method of claim 12, further comprising creating a bounding box for the instance of the text object if the instance of the text object is present in the image; and determining a confidence score for the bounding box.

17. A system for industrial defect identification, the system comprising: a text encoder model configured to generate a text embedding for a text object of free-form text, wherein the text object relates to a feature within an image of a component; a visual encoder model configured to generate a region embedding within the image of the component, wherein the region embedding defines a region of interest within the image; a cross-modality fusion layer configured to fuse output of a layer within the text encoder model with output of a layer within the visual encoder model, wherein the fused outputs of the layers of the text encoder model and the visual encoder model are used as inputs to a subsequent layer of the text encoder model and the visual encoder model; a cross-modality decoder model configured to align the text embedding with the region embedding to generate a bounding box for at least one instance of the text object if the at least one instance of the text object is present in the image; and a positional encoder configured to generate a positional embedding based on coordinates of the bounding box, wherein the positional embedding indicates a location of the text object within the image.

18. The system of claim 17, further comprising an image encoder model configured to generate an image embedding for the image.

19. The system of claim 18, further comprising a mask decoder configured to receive the positional embedding from the positional encoder and the image embedding from the image encoder model.

20. The system of claim 19, wherein the mask decoder is further configured to generate a segmentation mask for the at least one instance of the text object within the image.
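Claims 7 and 8 tie the bounding-box confidence score to the similarity between the text embedding and the region embedding, and claims 1 and 17 derive a positional embedding from the bounding-box coordinates. The claims fix neither the similarity measure nor the encoding scheme. The sketch below assumes cosine similarity, a hypothetical presence threshold of 0.35, and a sinusoidal encoding of normalized corner coordinates in the style of Transformer positional encodings. The names `score_box` and `box_positional_embedding` are illustrative, not the applicant's.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Define similarity with a zero vector as 0.0 to avoid division by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_box(text_emb: Sequence[float], region_emb: Sequence[float], box: Box,
              threshold: float = 0.35) -> Optional[Tuple[Box, float]]:
    """Return (box, confidence) if the region matches the text query, else None.
    Cosine similarity and the 0.35 threshold are assumptions, not from the patent."""
    confidence = cosine_similarity(text_emb, region_emb)
    return (box, confidence) if confidence >= threshold else None


def box_positional_embedding(box: Box, width: float, height: float,
                             num_freqs: int = 4) -> List[float]:
    """Sinusoidal encoding of the normalized box corners (assumed scheme).
    Output length is 4 coordinates x num_freqs x 2 (sin, cos)."""
    x0, y0, x1, y1 = box
    coords = (x0 / width, y0 / height, x1 / width, y1 / height)
    emb: List[float] = []
    for c in coords:
        for k in range(num_freqs):
            angle = math.pi * (2 ** k) * c
            emb.extend((math.sin(angle), math.cos(angle)))
    return emb


if __name__ == "__main__":
    hit = score_box([1.0, 0.0], [0.9, 0.1], box=(10, 20, 50, 60))
    if hit is not None:
        box, confidence = hit
        pos = box_positional_embedding(box, width=100, height=100)
        print(round(confidence, 3), len(pos))  # 0.994 32
```

A match yields a box and its confidence; a mismatch returns `None`, mirroring the claims' condition that a box is generated only if an instance of the text object is present. A fixed sinusoidal encoding needs no training; for comparison, the prompt encoder in Segment Anything encodes box corners with random Fourier features plus learned embeddings.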

Assignees

Rtx Corp

Inventors

Yu Yang

Classifications

G06T2207/30136: Metal
G06T2207/20084: Artificial neural networks [ANN]
G06T2207/20081: Training; Learning
G06T7/0004 (primary): Industrial image inspection
G06T7/11: Region-based segmentation

Patent family

Related publications grouped by family.

Patent family ID: 97383654

