Automated inspection system
US-2024420305-A1 · Dec 19, 2024 · US
US2025329008A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025329008-A1 |
| Application number | US-202418641576-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 22, 2024 |
| Priority date | Apr 22, 2024 |
| Publication date | Oct 23, 2025 |
| Grant date | — |
Official abstract text for this publication.
An inspection system includes a model architecture for industrial defect identification. The model architecture includes a text encoder model that receives a text object having free-form text and generates a text embedding. A visual encoder model receives a region of interest of an image and generates a region embedding. A cross-modality fusion layer acts between the text encoder model and the visual encoder model to fuse outputs of nodes within the models to be used as inputs to nodes in a subsequent layer. A cross-modality decoder model aligns the text embedding and the region embedding to generate a bounding box for the region if it is similar to the text object. A positional encoder generates a positional embedding based on the bounding box. A mask decoder model generates a segmentation mask based on the positional embedding within an output to highlight the region defined by the text object.
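The fusion step in the abstract, where layer outputs from the text and visual encoders are combined and fed to the next layer of each, can be sketched as bidirectional cross-attention. This is only an illustrative sketch: the publication does not specify the fusion mechanism here, so the attention form, the residual addition, the absence of learned projection weights, and the names `cross_attend` and `fuse_layer_outputs` are all assumptions.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """For each query vector, return an attention-weighted average of keys_values."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        mixed = [sum(w * kv[i] for w, kv in zip(weights, keys_values))
                 for i in range(dim)]
        attended.append(mixed)
    return attended


def fuse_layer_outputs(text_tokens, visual_tokens):
    """Hypothetical fusion: each modality adds context attended from the other.

    Assumption: a plain residual sum with no learned weights, for clarity only.
    """
    text_ctx = cross_attend(text_tokens, visual_tokens)
    visual_ctx = cross_attend(visual_tokens, text_tokens)
    fused_text = [[t + c for t, c in zip(tok, ctx)]
                  for tok, ctx in zip(text_tokens, text_ctx)]
    fused_visual = [[v + c for v, c in zip(tok, ctx)]
                    for tok, ctx in zip(visual_tokens, visual_ctx)]
    return fused_text, fused_visual


# Toy layer outputs: 2 text tokens and 3 visual tokens, each of dimension 2.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
visual_tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# The fused outputs would serve as inputs to the next layer of each encoder.
fused_text, fused_visual = fuse_layer_outputs(text_tokens, visual_tokens)
```

Each modality keeps its own token count, so the fused outputs can be passed directly to the subsequent layer of the encoder they came from.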
Opening claim text (preview).
1. A method comprising: generating a text embedding using a text encoder model for a text object of free-form text; generating a region embedding within an image using a visual encoder model, wherein the region embedding defines a region of interest within the image; fusing output of a layer within the text encoder model with output of a layer within the visual encoder model using a cross-modality fusion layer; using the fused outputs of the layers of the text encoder model and the visual encoder model as input to a subsequent layer of the text encoder model and the visual encoder model; aligning the text embedding with the region embedding to generate a bounding box for at least one instance of the text object using a cross-modality decoder model if the at least one instance of the text object is present in the image; and generating a positional embedding using a positional encoder based on coordinates of the bounding box, wherein the positional embedding indicates a location of the at least one instance of the text object within the image.
2. The method of claim 1, further comprising providing the image to an image encoder model; and generating an image embedding for the image.
3. The method of claim 2, further comprising receiving the positional embedding from the positional encoder and the image embedding from the image encoder model at a mask decoder.
4. The method of claim 3, further comprising generating a segmentation mask for the at least one instance of the text object within the image using the mask decoder.
5. The method of claim 4, wherein the segmentation mask covers an area of a defect to be identified within the image for a component under inspection.
6. The method of claim 1, further comprising creating a bounding box for the region of interest of the region embedding.
7. The method of claim 6, further comprising determining a confidence score for the bounding box.
8. The method of claim 7, wherein the confidence score is based on a similarity between the text embedding and the region embedding.
9. The method of claim 1, wherein the text embedding is a vector generated by the text encoder model.
10. The method of claim 1, wherein the region embedding is a vector generated by the visual encoder model.
11. The method of claim 1, wherein the text encoder model is trained using a curated natural language dataset.
12. A method for industrial defect identification, the method comprising: receiving an image of a component; receiving a text object of free-form text describing a defect of the component to be identified within the image; generating a text embedding using a text encoder model based on the text object; generating a region embedding for the image using a visual encoder model, wherein the region embedding defines a region of interest within the image, and wherein the outputs of at least one layer within the visual encoder model are fused with outputs of at least one layer within the text encoder model so that the fused outputs are input into a subsequent layer within the text encoder model and the visual encoder model; predicting how similar the text embedding and the region embedding are to each other using a cross-modality decoder model; determining a positional embedding using a positional encoder based on the prediction, wherein the positional embedding indicates a location of an instance of the text object within the image; and generating a segmentation mask for the instance of the text object based on the positional embedding.
13. The method of claim 12, further comprising providing the image to an image encoder model; and generating an image embedding for the image.
14. The method of claim 13, further comprising receiving the positional embedding from the positional encoder and the image embedding from the image encoder model at a mask decoder.
15. The method of claim 14, further comprising generating the segmentation mask for the instance of the text object within the image using the mask decoder.
16. The method of claim 12, further comprising creating a bounding box for the instance of the text object if the instance of the text object is present in the image; and determining a confidence score for the bounding box.
17. A system for industrial defect identification, the system comprising: a text encoder model configured to generate a text embedding for a text object of free-form text, wherein the text object relates to a feature within an image of a component; a visual encoder model configured to generate a region embedding within the image of the component, wherein the region embedding defines a region of interest within the image; a cross-modality fusion layer configured to fuse output of a layer within the text encoder model with output of a layer within the visual encoder model, wherein the fused outputs of the layers of the text encoder model and the visual encoder model are used as inputs to a subsequent layer of the text encoder model and the visual encoder model; a cross-modality decoder model configured to align the text embedding with the region embedding to generate a bounding box for at least one instance of the text object if the at least one instance of the text object is present in the image; and a positional encoder configured to generate a positional embedding based on coordinates of the bounding box, wherein the positional embedding indicates a location of the text object within the image.
18. The system of claim 17, further comprising an image encoder model configured to generate an image embedding for the image.
19. The system of claim 18, further comprising a mask decoder configured to receive the positional embedding from the positional encoder and the image embedding from the image encoder model.
20. The system of claim 19, wherein the mask decoder is further configured to generate a segmentation mask for the at least one instance of the text object within the image.
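Two steps in the claims lend themselves to a small worked example: scoring a bounding box by the similarity between the text embedding and a region embedding (claims 7 and 8), and deriving a positional embedding from the box coordinates (claim 1). The sketch below is illustrative only. The claims name neither the similarity measure nor the encoding scheme, so cosine similarity, the sinusoidal encoding, the `threshold` value, and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    # Assumed similarity measure; the claims only say "a similarity".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_regions(text_embedding, regions, threshold=0.5):
    """Return (box, confidence) for regions similar enough to the text.

    `threshold` is a hypothetical cutoff, not a value from the publication.
    """
    kept = []
    for box, region_embedding in regions:
        score = cosine_similarity(text_embedding, region_embedding)
        if score >= threshold:
            kept.append((box, score))
    return kept


def positional_embedding(box, num_frequencies=4):
    """Sinusoidal encoding of normalized (x_min, y_min, x_max, y_max).

    Assumption: a fixed sine/cosine scheme; a learned encoder is equally plausible.
    """
    embedding = []
    for coord in box:
        for k in range(num_frequencies):
            angle = (2 ** k) * math.pi * coord
            embedding.append(math.sin(angle))
            embedding.append(math.cos(angle))
    return embedding


# Toy data: one text embedding and two candidate regions with normalized boxes.
text_embedding = [0.9, 0.1, 0.0]
regions = [
    ((0.10, 0.20, 0.30, 0.40), [0.8, 0.2, 0.1]),  # close to the text embedding
    ((0.50, 0.50, 0.90, 0.90), [0.0, 0.1, 0.9]),  # nearly orthogonal to it
]

detections = score_regions(text_embedding, regions)
prompts = [positional_embedding(box) for box, _ in detections]
```

With these toy values only the first region passes the cutoff, and its box yields a 32-value positional embedding (4 coordinates, 4 frequencies, sine and cosine each) of the kind a mask decoder could consume alongside an image embedding.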
Metal · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Industrial image inspection · CPC title
Region-based segmentation · CPC title