Hierarchical segmentation of unstructured text using neural networks

US12346361B2 · US · B2

Patent metadata
Publication number: US-12346361-B2
Application number: US-202318511186-A
Country: US
Kind code: B2
Filing date: Nov 16, 2023
Priority date: Nov 16, 2023
Publication date: Jul 1, 2025
Grant date: Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for a digital design system trained to segment unstructured text into topically coherent segments. The method may include receiving unstructured text, the unstructured text including a sequence of sentences. The disclosed systems and methods further comprise generating, by a neural network, a hierarchically segmented tree structure representing the unstructured text. The tree structure comprises a plurality of tree structure nodes, where a node of the tree structure nodes represents a sentence from the sequence of sentences. The segments and sub-segments of the unstructured text can then be determined based on node data for nodes of the hierarchically segmented tree structure. Using the determined segments and sub-segments of the unstructured text, a modified representation of the unstructured text can be displayed.
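As an illustration of how segments and sub-segments might be read off such a tree, the following is a minimal sketch and not the patented implementation. It assumes a hypothetical `Node` class in which leaf nodes carry a sentence and internal nodes only group their children; the names `Node`, `leaves`, and `segments` are invented for this example. Each internal node below the root is treated as a segment, and its nesting depth distinguishes segments (depth 1) from sub-segments (depth 2 and deeper).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node type: a leaf holds a sentence, an internal node holds children.
    sentence: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the sentences under a node in reading order."""
    if not node.children:
        return [node.sentence]
    collected: List[str] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


def segments(node: Node, depth: int = 0) -> List[Tuple[int, List[str]]]:
    """Return (depth, sentences) for every internal node below `node`.

    Assumption for this sketch: depth 1 is a segment, depth 2+ a sub-segment.
    """
    found: List[Tuple[int, List[str]]] = []
    for child in node.children:
        if child.children:  # internal node, so it groups related sentences
            found.append((depth + 1, leaves(child)))
            found.extend(segments(child, depth + 1))
    return found


# Example tree: two top-level segments, the second containing one sub-segment.
root = Node(children=[
    Node(children=[Node("s1"), Node("s2")]),
    Node(children=[
        Node(children=[Node("s3"), Node("s4")]),
        Node("s5"),
    ]),
])

result = segments(root)
for level, sentences in result:
    print("  " * (level - 1) + f"depth {level}: {sentences}")
```

Because sibling leaves share a parent, each reported group corresponds to a contiguous run of sentences, which is what a display layer would need to render a segmented view.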

First claim

Opening claim text (preview).

We claim:

1. A method comprising: receiving unstructured text, the unstructured text including a sequence of sentences; generating, by a neural network, a hierarchically segmented tree structure representing the unstructured text, the tree structure comprising tree structure nodes, wherein a leaf node of the tree structure nodes represents a sentence from the sequence of sentences; determining segments and sub-segments of the unstructured text based on node data for the tree structure nodes of the hierarchically segmented tree structure; and presenting for display a modified representation of the unstructured text based on the determined segments and sub-segments of the unstructured text.

2. The method of claim 1, wherein generating the hierarchically segmented tree structure representing the unstructured text further comprises: converting the hierarchically segmented tree structure representing the unstructured text from a binarized tree structure by: identifying reducible nodes in the binarized tree structure, wherein reducible nodes do not represent a sentence from the sequence of sentences, and for each reducible node, removing the reducible node and connecting direct descendants of the reducible node with a direct parent node of the reducible node.

3. The method of claim 2, wherein generating the hierarchically segmented tree structure representing the unstructured text further comprises: generating, using a text encoder, feature vectors for each sentence of the sequence of sentences; generating, using a recurrent neural network, contextualized feature vectors using the generated feature vectors; predicting a structure of the binarized tree structure; and labeling each node of a plurality of nodes in the binarized tree structure as a reducible node or an irreducible node.

4. The method of claim 3, wherein labeling each node of the plurality of nodes in the binarized tree structure as a reducible node or an irreducible node comprises: for each node in the binarized tree structure: determining a first probability value that a node is a reducible node and a second probability value that a node is an irreducible node, when the first probability value is greater than the second probability value, determining the node is a reducible node, and when the first probability value is lower than the second probability value, determining the node is an irreducible node.

5. The method of claim 1, wherein presenting for display the modified representation of the unstructured text further comprises: generating, by a topic generating model, summaries for each determined segment and sub-segment of the unstructured text; and generating the modified representation of the unstructured text as a table of contents using the generated summaries.

6. The method of claim 5, further comprising: receiving a first user input selecting an entry in the table of contents for the unstructured text; presenting a first summary of the entry, the first summary including second summaries for one or more sub-entries of the entry; receiving a second user input selecting one of the second summaries; and presenting a portion of the unstructured text corresponding to the selected second one of the second summaries.

7. The method of claim 1, wherein child nodes of the hierarchically segmented tree structure having a same parent node represent sentences that are topically related.

8. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving unstructured text, the unstructured text including a sequence of sentences; generating, by a neural network, a hierarchically segmented tree structure representing the unstructured text, the tree structure comprising tree structure nodes, wherein a leaf node of the tree structure nodes represents a sentence from the sequence of sentences; determining segments and sub-segments of the unstructured text based on node data for the tree structure nodes of the hierarchically segmented tree structure; and presenting for display a modified representation of the unstructured text based on the determined segments and sub-segments of the unstructured text.

9. The non-transitory computer-readable medium of claim 8, wherein the operation of generating the hierarchically segmented tree structure representing the unstructured text further comprises: converting the hierarchically segmented tree structure representing the unstructured text from a binarized tree structure by: identifying reducible nodes in the binarized tree structure, wherein reducible nodes do not represent a sentence from the sequence of sentences, and for each reducible node, removing the reducible node and connecting direct descendants of the reducible node with a direct parent node of the reducible node.

10. The non-transitory computer-readable medium of claim 9, wherein the operation of generating the hierarchically segmented tree structure representing the unstructured text further comprises: generating, using a text encoder, feature vectors for each sentence of the sequence of sentences; generating, using a recurrent neural network, contextualized feature vectors using the generated feature vectors; predicting a structure of the binarized tree structure; and labeling each node of a plurality of nodes in the binarized tree structure as a reducible node or an irreducible node.

11. The non-transitory computer-readable medium of claim 10, wherein the operation of labeling each node of the plurality of nodes in the binarized tree structure as a reducible node or an irreducible node further comprises: for each node in the binarized tree structure: determining a first probability value that a node is a reducible node and a second probability value that a node is an irreducible node, when the first probability value is greater than the second probability value, determining the node is a reducible node, and when the first probability value is lower than the second probability value, determining the node is an irreducible node.

12. The non-transitory computer-readable medium of claim 8, wherein the operation of presenting for display the modified representation of the unstructured text further comprises: generating, by a topic generating model, summaries for each determined segment and sub-segment of the unstructured text; and generating the modified representation of the unstructured text as a table of contents using the generated summaries.

13. The non-transitory computer-readable medium of claim 12, storing instructions that further cause the processing device to perform operations comprising: receiving a first user input selecting an entry in the table of contents for the unstructured text; presenting a first summary of the entry, the first summary including second summaries for one or more sub-entries of the entry; receiving a second user input selecting one of the second summaries; and presenting a portion of the unstructured text corresponding to the selected second one of the second summaries.

14. The non-transitory computer-readable medium of claim 8, wherein child nodes of the hierarchically segmented tree structure having a same parent node represent sentences that are topically related.

15. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: receiving unstructured text, the unstructured text including a sequence of sentences; generat
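The conversion recited in claims 2 and 4 (label each node by comparing two probabilities, then splice reducible nodes out of the binarized tree) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `BinNode` class, its field names, and the `collapse` helper are hypothetical, the probabilities are hard-coded rather than predicted by a neural network, ties are treated as irreducible (the claims do not address ties), and the root is never removed because it has no parent to attach to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BinNode:
    # Hypothetical node of a binarized tree. Leaves carry a sentence;
    # internal nodes carry the two label probabilities from claim 4.
    sentence: Optional[str] = None
    children: List["BinNode"] = field(default_factory=list)
    p_reducible: float = 0.0
    p_irreducible: float = 1.0


def is_reducible(node: BinNode) -> bool:
    """A node is reducible if it holds no sentence and p_reducible is larger."""
    return node.sentence is None and node.p_reducible > node.p_irreducible


def collapse(node: BinNode) -> BinNode:
    """Return a copy of the subtree with reducible descendants spliced out.

    Each reducible child is removed and its (already collapsed) children
    are attached directly to the current node, as described in claim 2.
    """
    new_children: List[BinNode] = []
    for child in node.children:
        collapsed = collapse(child)
        if is_reducible(child):
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return BinNode(node.sentence, new_children, node.p_reducible, node.p_irreducible)


def to_nested(node: BinNode):
    """Render a tree as nested lists of sentences, for inspection."""
    if node.sentence is not None:
        return node.sentence
    return [to_nested(child) for child in node.children]


# Binarized input: ((s1 s2) s3) s4, where the inner (s1 s2) node is reducible.
inner = BinNode(children=[BinNode("s1"), BinNode("s2")],
                p_reducible=0.9, p_irreducible=0.1)
segment = BinNode(children=[inner, BinNode("s3")],
                  p_reducible=0.2, p_irreducible=0.8)
root = BinNode(children=[segment, BinNode("s4")])

collapsed_tree = collapse(root)
print(to_nested(root))
print(to_nested(collapsed_tree))
```

Splicing out the reducible node turns the strictly binary grouping of s1, s2, and s3 into a single three-child segment, which is how a binary prediction can yield an n-ary hierarchy.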

Assignees

Adobe Inc

Inventors

Classifications

Primary CPC classification: G06F16/345

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346361B2 cover?
Embodiments are disclosed for a digital design system trained to segment unstructured text into topically coherent segments. The method may include receiving unstructured text, the unstructured text including a sequence of sentences. The disclosed systems and methods further comprise generating, by a neural network, a hierarchically segmented tree structure representing the unstructured text. T…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/345. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 1, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).