Extracting a portion of a document, such as a web page

US9430583B1 · US · B1

Patent metadata
Publication number: US-9430583-B1
Application number: US-201113158343-A
Country: US
Kind code: B1
Filing date: Jun 10, 2011
Priority date: Jun 10, 2011
Publication date: Aug 30, 2016
Grant date: Aug 30, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A portion data structure representing a portion extracted from a formatted source document is described. A portion data structure contains a first subtree of nodes that is modeled after a second subtree of a complete hierarchical representation of the formatted source document. Explicit formatting attribute values are specified for nodes of the first subtree only where a value calculated for the formatting attribute in a node of the first subtree differs from a value calculated for the formatting attribute in the corresponding node in the second subtree at a time when the node of the first subtree descends from a reset node specifying standardized formatting attribute values. The contents of the portion data structure are usable to display the portion extracted from the formatted source document in a context other than the formatted source document.
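
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` and `PortionNode` classes, the `RESET_STYLE` values, and the `extract_portion` and `to_html` helpers are all hypothetical names introduced here, and the sketch assumes every copied node is compared against one fixed reset baseline (it ignores values that would cascade from an explicitly styled ancestor).

```python
from dataclasses import dataclass, field

# Hypothetical standardized values that a reset node would pass to its descendants.
RESET_STYLE = {"color": "black", "font-size": "16px", "font-weight": "normal"}


@dataclass
class Node:
    """A node of the source document with its calculated formatting values."""
    tag: str
    computed: dict
    children: list = field(default_factory=list)


@dataclass
class PortionNode:
    """A node of the extracted portion; holds only explicitly specified values."""
    tag: str
    explicit: dict
    children: list = field(default_factory=list)


def extract_portion(source, baseline=RESET_STYLE):
    """Copy a subtree, keeping a value only where it differs from the baseline."""
    explicit = {
        name: value
        for name, value in source.computed.items()
        if baseline.get(name) != value
    }
    return PortionNode(
        tag=source.tag,
        explicit=explicit,
        children=[extract_portion(child, baseline) for child in source.children],
    )


def to_html(node):
    """Serialize the portion as tags with inline styles (text content omitted)."""
    style = "; ".join(f"{name}: {value}" for name, value in node.explicit.items())
    attr = f' style="{style}"' if style else ""
    inner = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}{attr}>{inner}</{node.tag}>"
```

Under these assumptions, a `span` whose calculated color is red inside a `div` that matches the baseline would carry only `color: red` in the extracted portion, which keeps the copy small while preserving how it looks under the reset node.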

First claim

Opening claim text (preview).

I claim:

1. A method caused to be performed by at least one computing system having a processor, the method comprising:
    generating a list of nodes in a rendered version of a subject web page, each node having a respective location score having a value identical to other location scores of each node in the list;
    determining a location of a pointer displayed in relation to the rendered version of the subject web page according to scrolling adjusted coordinates of the displayed pointer, the location indicating a first node of the list of nodes, the first node having a first location score reduced by a factor;
    for each node of the list of nodes, updating the respective location score, the updated respective location score for each node having an updated value based on a distance of the pointer from a corner associated with the node and a square root of an area of the node;
    identifying, based on a lowest updated value of the updated location scores, a portion of the rendered version of the subject web page corresponding to at least one subtree of a document object model tree created for the subject web page;
    establishing in the document object model tree a reset node comprising a stylesheet specifying a predetermined standardized set of formatting attribute values inheritable by descendents of the reset node;
    for each subtree of the document object model tree created for the subject web page to which the identified first node of the rendered version of the subject web page corresponds:
        traversing the subtree;
        for each node of the subtree visited during the traversal:
            establishing a corresponding node as a descendent of the reset node, the established corresponding node having a type matching a type of the node of the subtree;
            where the node of the subtree has calculated values for any of a plurality of formatting attributes, for each of the plurality of formatting attributes:
                determining a calculated value of the formatting attribute in the node of the subtree;
                determining a calculated value of the formatting attribute in the corresponding node, the determined calculated value of the formatting attribute in the corresponding node being inherited from the predetermined standardized set of formatting attribute values;
                determining that the calculated values differ; and
                only when it is determined that calculated values differ, explicitly specifying for the corresponding node the determined calculated value of the formatting attribute in the node of the subtree.

2. The method of claim 1, further comprising transforming a subtree of the document object model tree defined by the reset node or a descendent of the reset node into a tag language representation.

3. The method of claim 2 wherein the tag language representation into which the subtree is transformed is an HTML representation.

4. The method of claim 1, further comprising storing a representation of a subtree of the document object model tree defined by the reset node or a descendent of the reset node on a server.

5. The method of claim 1, further comprising generating a destination web page containing the established corresponding nodes.

6. The method of claim 1, wherein the act of receiving user input comprises:
    for each of a plurality of selectable nodes of the document object model tree, determining a location and size of the node in the rendered version of the subject web page;
    receiving a mouse cursor location; and
    selecting one of the plurality of selectable nodes based upon the size determined for the node in the rendered version of the subject web page and location of the mouse cursor relative to the location determined for the node in the rendered version of the subject web page.

7. The method of claim 6, wherein the selecting comprises selecting from among the nodes containing the location of the mouse cursor the node.

8. The method of claim 6, further comprising limiting the plurality of selectable nodes to include only nodes explicitly designated within the subject web page as selectable nodes.

9. The method of claim 1, wherein the reset node is at a same level of a hierarchy of nodes of the document object model tree as a highest node of the subtrees.

10. A computer-readable storage medium having contents configured to cause a computing system to perform a method for extracting a portion of a selected hierarchical document, the selected hierarchical document comprised of nodes in an arrangement in which a node may be a descendent of another node, each node having a type, a node and all of its descendent nodes collectively constituting a subtree of the document hierarchy, the method comprising:
    generating a list of nodes in a rendered version of a subject web page, each node having a respective location score having a value identical to other location scores of each node in the list;
    determining a location of a pointer displayed in relation to the rendered version of the subject web page according to scrolling adjusted coordinates of the displayed pointer, the location indicating a first node of the list of nodes, the first node having a first location score reduced by a factor;
    for each node of the list of nodes, updating the respective location score, the updated respective location score for each node having an updated value based on a distance of the pointer from a corner associated with the node and a square root of an area of the node;
    selecting, based on a lowest updated value of the updated respective locations, one of the nodes of the document hierarchy as the root of a subtree of the document hierarchy that corresponds to the portion to be extracted;
    establishing a reset node comprising a stylesheet specifying a predetermined standardized set of formatting attribute values inheritable by descendents of the reset node;
    for each of one or more of the nodes of the subtree defined by the selected node of the document hierarchy:
        establishing a descendent of the reset node having the same type as the node of the subtree;
        for each of a plurality of formatting attributes, determining calculated values of the formatting attributes in both the descendent of the reset node and the node of the subtree, the determined calculated value of the formatting attribute in the descendant of the reset node being inherited from the predetermined standardized set of formatting attribute values;
        for one or more of the plurality of formatting attributes, determining that the calculated value of the formatting attribute in the descendent of the reset node does not match the calculated value of the formatting attribute in the node of the subtree; and
        for only those formatting attributes of the plurality of formatting attributes for which the determined calculated value of the formatting attribute in the descendent of the reset node does not match the determined calculated value of the formatting attribute in the node of the subtree, explicitly specifying for the formatting attribute in the descendent of the reset node the determined calculated value of the formatting attribute in the node of the subtree.

11. The computer-readable storage medium of claim 10 wherein the reset node specifies inherited formatting attribute values directly.

12. The computer-readable storage medium of claim 10 wherein the reset node specifies attribute values by attributing to the reset node a class that specifies formatting attribute values directly.

13. The computer-readable storage medium of claim 10 wherein the reset node is established in the document hierarchy of the selected hierarchical document.

14. The computer-readable storage medium of claim 10 wherein the reset node is
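
The pointer-based selection recited in claims 1 and 10 can be sketched roughly as follows. The claims only say that each score is "based on" a corner distance and the square root of the node's area, so the exact formula here is an assumption: the sketch multiplies the starting score by the sum of the two quantities, uses the top-left corner, and applies the reduction factor to every node that contains the pointer. The `Box` class, the `pick_node` function, and the `inside_factor` parameter are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered rectangle of a node, in document (not viewport) coordinates."""
    name: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x, y):
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)

    def corner_distance(self, x, y):
        # Assumption: measure from the top-left corner of the node.
        return math.hypot(x - self.left, y - self.top)


def pick_node(boxes, client_x, client_y, scroll_x=0.0, scroll_y=0.0,
              inside_factor=0.5):
    """Return the box with the lowest location score for the pointer."""
    # Adjust the pointer's viewport coordinates by the scroll offsets.
    x, y = client_x + scroll_x, client_y + scroll_y

    # Every node starts with the same score.
    scores = {box.name: 1.0 for box in boxes}

    # Assumption: reduce the score of each node that contains the pointer.
    for box in boxes:
        if box.contains(x, y):
            scores[box.name] *= inside_factor

    # Assumption: combine corner distance and sqrt(area) by addition.
    for box in boxes:
        size_term = math.sqrt(box.width * box.height)
        scores[box.name] *= box.corner_distance(x, y) + size_term

    return min(boxes, key=lambda box: scores[box.name])
```

With this weighting, a small node directly under the pointer tends to win over both a large enclosing container and a distant node of similar size, which matches the intent of choosing the lowest updated score.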

Assignees

Inventors

Classifications

  • G06F40/103 (Primary)

    Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)

  • G06F16/986 (Primary)

    Document structures and storage, e.g. HTML extensions

  • Mark-up to mark-up conversion (conversion for visualization in web browsing G06F16/9577)

  • based on web technology, e.g. hypertext transfer protocol [HTTP]

  • Trees, e.g. B+trees

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9430583B1 cover?
A portion data structure representing a portion extracted from a formatted source document is described. A portion data structure contains a first subtree of nodes that is modeled after a second subtree of a complete hierarchical representation of the formatted source document. Explicit formatting attribute values are specified for nodes of the first subtree only where a value calculated for th…
Who is the assignee on this patent?
Flake Gary W, Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/103. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 30, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).