Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 22, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automated structured textual content categorization accuracy with neural networks

US11734559B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11734559-B2
Application number	US-202016907165-A
Country	US
Kind code	B2
Filing date	Jun 19, 2020
Priority date	Jun 19, 2020
Publication date	Aug 22, 2023
Grant date	Aug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them, where the values of the various dimensions of the multidimensional vector are based on the textual content in the corresponding node, the visual features applied or associated with the textual content of the corresponding node, and positional information of the textual content of the corresponding node. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
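The data flow in the abstract (per-node vector, neighbor-imbuing step, categorization step) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `node_vector`, `imbue_neighbors`, and `categorize` are illustrative names, and the neighbor averaging and linear scoring are simple stand-ins for the two neural networks, whose architecture this page does not show.

```python
from typing import List

Vector = List[float]


def node_vector(text_feats: Vector, visual_feats: Vector, position_feats: Vector) -> Vector:
    """Build one per-node vector by concatenating the three feature groups (assumed layout)."""
    return text_feats + visual_feats + position_feats


def imbue_neighbors(vectors: List[Vector], self_weight: float = 0.75) -> List[Vector]:
    """Stand-in for the neighbor-imbuing network: blend each node with the mean of its adjacent nodes."""
    enhanced = []
    for i, vec in enumerate(vectors):
        neighbors = [vectors[j] for j in (i - 1, i + 1) if 0 <= j < len(vectors)]
        if not neighbors:
            # A lone node has nothing to blend with, so it is copied unchanged.
            enhanced.append(list(vec))
            continue
        mean = [sum(column) / len(neighbors) for column in zip(*neighbors)]
        enhanced.append([self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)])
    return enhanced


def categorize(vector: Vector, weights: List[Vector]) -> Vector:
    """Stand-in for the categorization network: one linear score per category."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


# Two neighboring DOM leaf nodes with made-up feature values.
nodes = [
    node_vector([1.0, 0.0], [1.0], [0.0]),
    node_vector([0.0, 1.0], [0.0], [1.0]),
]
enhanced = imbue_neighbors(nodes)

# One weight row per category; the values are arbitrary, not trained.
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
scores = [categorize(vec, weights) for vec in enhanced]
```

The output has one score vector per node, with one dimension per category, which matches the abstract's description of the categorization output.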

First claim

Opening claim text (preview).

We claim:

1. A method of improving automated structured textual content categorization accuracy, the method comprising: providing a first vector to a first neural network, where the first vector corresponds to a first discrete portion of the structured textual content; immediately after providing the first vector to the first neural network, providing a second vector to the first neural network, where the second vector corresponds to the second discrete portion of the structured textual content, where the second vector is provided to the first neural network immediately after the first neural network is provided to the first neural network due to the second discrete portion neighboring the first discrete portion in the structured textual content, where the first neural network generates a third vector that corresponds to the first discrete portion based upon the first vector and the second vector, and further where the first neural network generates a fourth vector that corresponds to the second discrete portion based upon the first vector and the second vector; providing the third vector to a second neural network, where the second neural network outputs a fifth vector that corresponds to the first discrete portion, and further where the fifth vector includes first values for categories that are assignable to discrete portions of the structured textual document; providing the fourth vector to the second neural network, where the second neural network outputs a sixth vector that corresponds to the second discrete portion, and further where the sixth vector includes second values for the categories; categorizing the first discrete portion in accordance with the first values of the fifth vector output by the second neural network; and categorizing the second discrete portion in accordance with the second values of the sixth vector output by the second neural network.

2. The method of claim 1, further comprising: generating a document object model representation of the structured textual content; wherein the first discrete portion and the second discrete portion are delineated by discrete leaf nodes of the generated document object model representation.

3. The method of claim 1, wherein the first vector comprises first dimensional values representing first visual features applied to the first discrete portion of the structured textual content when the structured textual content is rendered for display, and further wherein the second vector comprises second dimensional values representing second visual features applied to the second discrete portion of the structured textual content when the structured textual content is rendered for the display.

4. The method of claim 3, wherein the first and second dimensional values comprise dimensional values that are based on fonts associated with the first and second discrete portions of the structured textual content.

5. The method of claim 3, wherein the first and second dimensional values comprise dimensional values that are based on HTML element types associated with the first and second discrete portions of the structured textual content.

6. The method of claim 3, wherein the first and second dimensional values comprise dimensional values based on style sheets associated with the first and second discrete portions of the structured textual content.

7. The method of claim 3, wherein the first and second vectors further comprise dimensional values representing text of the first and second discrete portions of the structured textual content.

8. The method of claim 7, wherein the first and second dimensional values are based on semantic meanings of the text of the first and second discrete portions of the structured textual content.

9. The method of claim 1, further comprising: generating a visual feature vector that corresponds to the first discrete portion, the visual feature vector comprising first dimensional values representing visual features applied to the first discrete portion of the structured textual content when the structured textual content is rendered for display; generating a text vector that corresponds to the first discrete portion, the text vector comprising second dimensional values representing text of the first discrete portion of the structured textual content; and generating the first vector by amalgamating the visual feature vector and the text vector.

10. The method of claim 9, further comprising: projecting the text vector to a higher dimensional text vector; and projecting the visual feature vector to a higher dimensional visual feature vector; wherein the amalgamating comprises adding the higher dimensional text vectors to the higher dimensional visual features vectors.

11. The method of claim 9, wherein the first dimensional values are assigned to different dimensions than the second dimensional values; and wherein further the first dimensional values comprise third dimensional values that are based on the text of the first discrete portion.

12. The method of claim 3, wherein the first and second vectors further comprise dimensional values representing visual positionings of the first and second discrete portions of the structured textual content when the structured textual content is rendered for display.

13. The method of claim 1, further comprising: generating a visual feature vector that corresponds to the first discrete portion, the visual feature vector comprising first dimensional values representing visual features applied to the first discrete portion of the structured textual content when the structured textual content is rendered for display; generating a positional vector that corresponds to the first discrete portion, the positional vector comprising second dimensional values representing visual positioning of the first discrete portion of the structured textual content when the structured textual content is rendered for display; and generating the first feature vector by amalgamating the visual feature vector and the positional vector.

14. The method of claim 1, wherein the first discrete portion is categorized in accordance with a category corresponding to a dimension having a largest dimensional value in the fifth vector.

15. The method of claim 1, wherein the first discrete portion is categorized in accordance with categories corresponding to dimensions having dimensional values in the fifth vector that are greater than a threshold.

16. The method of claim 15, wherein the first discrete portion is simultaneously categorized into two or more categories.

17. The method of claim 1, wherein the first discrete portion is categorized in accordance with dimensional values of an aggregated vector generated by summing vectors output by the third neural network, where the vectors correspond to individual sub-portions of the first discrete portion of the structured textual content.

18. The method of claim 17, wherein the summing the vectors comprises first weighting each individual vector based on a quantity of characters of textual content in a corresponding individual sub-portion.

19. One or more computer-readable storage media comprising computer-executable instructions, which, when executed, cause one or more computing devices to perform acts comprising: providing a first vector to a first neural network, where the first vector corresponds to a first discrete portion of the structured textual content; immediately after providing the first vector to the first neural network, providing a second vector to the first neural network,
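The selection rules in claims 14 to 18 reduce to a few small operations on the category score vectors. The Python sketch below is one possible reading, not the patented implementation: the function names, the category labels, and the choice to use the raw character count as the weight are all assumptions made for illustration.

```python
from typing import List, Sequence


def top_category(scores: Sequence[float], categories: Sequence[str]) -> str:
    """Claim 14 reading: pick the category whose dimension holds the largest value."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return categories[best]


def categories_above(scores: Sequence[float], categories: Sequence[str], threshold: float) -> List[str]:
    """Claims 15-16 reading: keep every category whose value exceeds the threshold (may be several)."""
    return [category for category, score in zip(categories, scores) if score > threshold]


def weighted_merge(sub_scores: Sequence[Sequence[float]], sub_texts: Sequence[str]) -> List[float]:
    """Claims 17-18 reading: sum sub-portion vectors, each weighted by its character count (assumed weighting)."""
    merged = [0.0] * len(sub_scores[0])
    for scores, text in zip(sub_scores, sub_texts):
        weight = len(text)
        for i, score in enumerate(scores):
            merged[i] += weight * score
    return merged


# Hypothetical category labels and made-up score vectors for two sub-portions.
categories = ["title", "body", "footer"]
sub_scores = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
sub_texts = ["ab", "abcd"]

merged = weighted_merge(sub_scores, sub_texts)
single = top_category(merged, categories)
multiple = categories_above(merged, categories, 0.75)
```

Weighting by character count lets a long sub-portion dominate a short one when grouped nodes are merged, which is the effect claim 18 describes.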

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F9/30036
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06N3/08 (Primary)
Learning methods · CPC title
G06F16/9577
Optimising the visualization of content, e.g. distillation of HTML documents · CPC title

Patent family

Related publications grouped by family.

Patent family 75919422

Related patents

Systems and methods for a transformer network with tree-based attention for natural language processing

Automatically detecting user-requested objects in images

Classifying Structural Features of a Digital Document by Feature Type using Machine Learning

Automatic definition of set of categories for document classification

Generating interactive content items based on content displayed on a computing device

Concept indexing among database of documents using machine learning techniques
