Reflowing documents to display semantically related content

US12400384B2 · US · B2

Patent metadata
Publication number: US-12400384-B2
Application number: US-202318460401-A
Country: US
Kind code: B2
Filing date: Sep 1, 2023
Priority date: Sep 1, 2023
Publication date: Aug 26, 2025
Grant date: Aug 26, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for reflowing documents to display semantically related content. The method may include receiving a request to view a document that includes body text and one or more images. A trimodal document relationship model identifies relationships between segments of the body text and the one or more images. A linearized view of the document is generated based on the relationships and the linearized view is caused to be displayed on a user device.
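
As a rough illustration of the last step in the abstract, the sketch below builds a linearized view from body-text segments and a set of already-identified image relationships. The names `Segment`, `LinearizedEntry`, and `linearize`, and the shape of the `relationships` mapping, are hypothetical and chosen for illustration; the patent does not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """A piece of body text (section, paragraph, or sentence)."""
    seg_id: str
    text: str


@dataclass
class LinearizedEntry:
    """One row of the linear view: a segment plus any related images."""
    segment: Segment
    image_ids: List[str] = field(default_factory=list)


def linearize(segments: List[Segment],
              relationships: Dict[str, List[str]]) -> List[LinearizedEntry]:
    """Keep segments in reading order and attach related image ids.

    `relationships` maps an image id to the ids of the segments it was
    predicted to relate to (assumed to come from the relationship model).
    """
    images_by_segment: Dict[str, List[str]] = {}
    for image_id, seg_ids in relationships.items():
        for seg_id in seg_ids:
            images_by_segment.setdefault(seg_id, []).append(image_id)
    return [
        LinearizedEntry(seg, images_by_segment.get(seg.seg_id, []))
        for seg in segments
    ]


if __name__ == "__main__":
    doc = [Segment("p1", "Intro"), Segment("p2", "Method")]
    for entry in linearize(doc, {"fig1": ["p2"]}):
        marker = " [image available]" if entry.image_ids else ""
        print(entry.segment.text + marker)
```

In this sketch a segment with a non-empty `image_ids` list is where a viewer could render a user interface element for opening the related image.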

First claim

Opening claim text (preview).

We claim:

1. A method, comprising: receiving a request to view a document that includes body text, one or more images, and one or more associated captions; identifying, using a trimodal document relationship model, relationships between segments of the body text and the one or more images, wherein the trimodal document relationship model: generates a contextual embedding for each segment of the body text, image, and associated caption, and predicts at least one segment of the body text associated with each image from the one or more images based on a similarity score determined between a plurality of image-caption pairs and segments of the body text based on their contextual embeddings which encode a combination of image embeddings, text embeddings, segment embeddings, and position embeddings; generating a linearized view of the document based on the relationships; and causing the linearized view to be displayed on a user device.

2. The method of claim 1, wherein identifying, using a trimodal document relationship model, relationships between segments of the body text and the one or more images, further comprises: receiving, by the trimodal document relationship model, a plurality of segments of the body text, the one or more images, and one or more associated captions from the document.

3. The method of claim 2, wherein the trimodal document relationship model includes a transformer encoder.

4. The method of claim 3, wherein each segment embedding defines a segment type and each position embedding indicates a position of the segment of body text, image, or associated caption.

5. The method of claim 1, wherein a segment of body text includes a section, a paragraph, or a sentence.

6. The method of claim 1, wherein the linearized view is a linear presentation of the segments of the body text, wherein each segment of the body text determined to be associated with an image from the one or more images has an associated user interface element rendered in the linearized view.

7. The method of claim 6, further comprising: receiving a selection of a first user interface element associated with a first segment of the body text in the linearized view; and causing an adjustable split screen to be displayed on the user device, wherein a first pane of the split screen displays a first image determined to be associated with the first segment, and a second pane of the split screen displays at least some of the first segment of the body text.

8. The method of claim 7, wherein multiple images are associated with the first segment of the body text, and wherein the first screen of the split screen includes a second user interface element which, when selected, causes a different image from the multiple images to be displayed in the first screen.

9. The method of claim 7, wherein the first pane and the second pane of the adjustable split screen are resizable using an interactive user interface element.

10. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving a request to view a document that includes body text, one or more images, and one or more associated captions; identifying, using a trimodal document relationship model, relationships between segments of the body text and the one or more images, wherein the trimodal document relationship model: generates a contextual embedding for each segment of the body text, image, and associated caption, and predicts at least one segment of the body text associated with each image from the one or more images based on a similarity score determined between a plurality of image-caption pairs and segments of the body text based on their contextual embeddings which encode a combination of image embeddings, text embeddings, segment embeddings, and position embeddings; generating a linearized view of the document based on the relationships; and causing the linearized view to be displayed on a user device.

11. The non-transitory computer-readable medium of claim 10, wherein the operation of identifying, using a trimodal document relationship model, relationships between segments of the body text and the one or more images, further comprises: receiving, by the trimodal document relationship model, a plurality of segments of the body text, the one or more images, and one or more associated captions from the document.

12. The non-transitory computer-readable medium of claim 11, wherein the trimodal document relationship model includes a transformer encoder.

13. The non-transitory computer-readable medium of claim 12, wherein each segment embedding defines a segment type and each position embedding indicates a position of the segment of body text, image, or associated caption.

14. The non-transitory computer-readable medium of claim 10, wherein the linearized view is a linear presentation of the segments of the body text, wherein each segment of the body text determined to be associated with an image from the one or more images has an associated user interface element rendered in the linearized view.

15. The non-transitory computer-readable medium of claim 14, wherein the operations further comprise: receiving a selection of a first user interface element associated with a first segment of the body text in the linearized view; and causing an adjustable split screen to be displayed on the user device, wherein a first pane of the split screen displays a first image determined to be associated with the first segment, and a second pane of the split screen displays at least some of the first segment of the body text.

16. The non-transitory computer-readable medium of claim 15, wherein multiple images are associated with the first segment of the body text, and wherein the first screen of the split screen includes a second user interface element which, when selected, causes a different image from the multiple images to be displayed in the first screen.

17. The non-transitory computer-readable medium of claim 15, wherein the first pane and the second pane of the adjustable split screen are resizable using an interactive user interface element.

18. A system, comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: receiving, by a trimodal document relationship model, a plurality of elements of a document, wherein the elements include segments of body text, images, and image captions; generating, by a feature extractor of the trimodal document relationship model, an element embedding for each element of the document; generating a segment embedding, indicating an element type, and a position embedding, indicating an element position within the document, for each element of the document; combining each element embedding, segment embedding, and position to create a combined embedding for each element of the document; generating, by a transformer encoder of the trimodal document relationship model, a contextual embedding for each element of the document corresponding to each combined embedding; and determining semantic relationships between the segments of the body text and the images in the document using their contextual embeddings.

19. The system of claim 18, wherein the operations further comprise: generating a reflowed document based on the semantic relationships.

20. The system of claim 18, wherein the operation of determining semantic relationships bet
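
The embedding and matching steps recited in claims 1 and 18 can be sketched in a few lines. This is an illustrative toy, not the patented model: it assumes the embeddings are combined by element-wise sum, that an image-caption pair is represented by the mean of its two vectors, and that the similarity score is cosine similarity. None of these choices is stated in the claims, and the transformer encoder that produces the contextual embeddings is omitted entirely. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def combine(element: Sequence[float],
            segment_type: Sequence[float],
            position: Sequence[float]) -> Vector:
    """Combine element, segment-type, and position embeddings.

    Element-wise sum is an assumption; the claims only say "combining".
    """
    return [e + s + p for e, s, p in zip(element, segment_type, position)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pair_vector(image: Sequence[float], caption: Sequence[float]) -> Vector:
    """Represent an image-caption pair as the mean of its two vectors
    (an assumption made for this sketch)."""
    return [(i + c) / 2 for i, c in zip(image, caption)]


def match_images_to_segments(pairs: Dict[str, Vector],
                             segments: Dict[str, Vector]) -> Dict[str, str]:
    """For each image-caption pair, pick the best-scoring text segment."""
    return {
        image_id: max(segments, key=lambda sid: cosine(vec, segments[sid]))
        for image_id, vec in pairs.items()
    }


if __name__ == "__main__":
    segments = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
    pairs = {"fig1": pair_vector([0.0, 1.0], [0.2, 0.8])}
    print(match_images_to_segments(pairs, segments))
```

In a fuller implementation the vectors passed to `match_images_to_segments` would be the contextual embeddings output by the encoder rather than hand-written lists, and more than one segment could be kept per image by applying a score threshold instead of taking only the maximum.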

Assignees

Adobe Inc

Inventors

Classifications

  • for image manipulation, e.g. dragging, rotation, expansion or change of colour

  • Split screen, i.e. subdividing the display area or the window area into separate subareas

  • involving graphical user interfaces [GUIs]

  • Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence)

  • Annotation, e.g. comment data or footnotes

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12400384B2 cover?
Embodiments are disclosed for reflowing documents to display semantically related content. The method may include receiving a request to view a document that includes body text and one or more images. A trimodal document relationship model identifies relationships between segments of the body text and the one or more images. A linearized view of the document is generated based on the relationsh…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 26, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).