Page journey determination from web event journals

US10831809B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10831809-B2
Application number: US-201715693441-A
Country: US
Kind code: B2
Filing date: Aug 31, 2017
Priority date: Aug 31, 2017
Publication date: Nov 10, 2020
Grant date: Nov 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Large amounts of data from user interactions with web resources is available as data logs. Analysis may be performed to process the data log in order to determine the characteristics of the user interactions. Data log analysis may include identifying page states, which may be sets of frequent attributes and values that occur together in a session. The data log analysis may also include generating semantic labels of page states, which may describe the function of pages corresponding to different page states. Text mining models may be used to determine the semantic labels. Analysis may also include aggregating sets of page paths to create page journeys. These page journeys may be aggregated over all users, all user sessions, or other subsets of the clickstream. In some embodiments, comparing page journeys may provide recommendations for potential methods to improve the site and enhance user experiences.
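
The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each log entry has already been reduced to a hypothetical `(session_id, timestamp, page_state)` tuple, groups entries by session, orders each session into a page path, and counts how often users move from one page state to the next. The function name `page_journeys` and the sample log are illustrative only.

```python
from collections import Counter, defaultdict


def page_journeys(entries):
    """Return relative transition frequencies between page states.

    entries: iterable of (session_id, timestamp, page_state) tuples.
    This tuple layout is an assumption made for the sketch.
    """
    # Group log entries by user session.
    sessions = defaultdict(list)
    for session_id, timestamp, page_state in entries:
        sessions[session_id].append((timestamp, page_state))

    # Build one page path per session and count consecutive transitions.
    transitions = Counter()
    for events in sessions.values():
        events.sort()  # order by timestamp
        path = [state for _, state in events]
        for src, dst in zip(path, path[1:]):
            transitions[(src, dst)] += 1

    # Normalise each count by the total transitions leaving the source state.
    outgoing = Counter()
    for (src, _), count in transitions.items():
        outgoing[src] += count
    return {pair: count / outgoing[pair[0]] for pair, count in transitions.items()}


# Hypothetical three-session log.
log = [
    ("s1", 1, "home"), ("s1", 2, "search"), ("s1", 3, "checkout"),
    ("s2", 1, "home"), ("s2", 2, "search"),
    ("s3", 1, "home"), ("s3", 2, "checkout"),
]
journeys = page_journeys(log)
```

In this sample, two of the three transitions leaving `home` go to `search`, so `journeys[("home", "search")]` is 2/3. Restricting `log` to a subset of users or sessions would give the per-subset journeys the abstract mentions.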

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: processing a data log that includes entries corresponding to interactions between a plurality of users and a set of web resources that are at least partially interlinked, including by: grouping, by a computer system, entries in the data log into a plurality of groups corresponding to particular user sessions; for individual ones of at least two groups within the plurality of groups, identifying, by the computer system: page states accessed during a user session associated with that group; and page paths through two or more page states of the user session associated with that group; aggregating, by the computer system, at least two identified page paths from the individual ones of the plurality of groups to determine page journey information that indicates relative frequency of users proceeding between different page states within the set of web resources; and storing, by the computer system, the page journey information for the set of web resources.

2. The method of claim 1, wherein the set of web resources corresponds to portions of a website, and wherein the data log is a set of clickstream data from a plurality of users that have visited the website.

3. The method of claim 1, wherein the identifying page states includes classifying attributes within entries in the data log based on a frequency of occurrence of the attributes and associated attribute values.

4. The method of claim 3, wherein the classifying includes assigning attributes in the data log into a plurality of attribute categories that includes at least a user attribute category, a site attribute category, and a page attribute category.

5. The method of claim 1, wherein the page journey information includes semantic labels that describe a function of pages corresponding to the different page states.

6. The method of claim 5, wherein the semantic labels are determined by running a semantic classifier separately for different classifications of attributes within the data log.

7. The method of claim 6, wherein the semantic classifier runs separately for: attributes classified as user attributes; attributes classified as application attributes; and attributes classified as page attributes.

8. The method of claim 5, wherein the semantic labels are determined using a text frequency-inverse document frequency model.

9. A non-transitory computer-readable storage medium having instructions stored thereon that are executable by a computing system to perform operations comprising: processing a data log that includes entries corresponding to interactions between a plurality of users and a set of web resources that are at least partially interlinked, including by: grouping entries in the data log into a plurality of groups corresponding to particular user sessions; for individual ones of at least two groups within the plurality of groups, identifying: page states accessed during a user session associated with that group; and page paths through two or more page states of the user session associated with that group; aggregating at least two identified page paths from the individual ones of the plurality of groups to determine page journey information that indicates relative frequency of users proceeding between different page states within the set of web resources; and storing the page journey information for the set of web resources.

10. The medium of claim 9, wherein the instructions are further executable to compare page journey information from subsections of the data log corresponding to time periods before and after a specific time.

11. The medium of claim 9, wherein the operations further comprise determining, using a text mining model, semantic labels for the page states within the page journey information, wherein the semantic labels describe the function of pages corresponding to different page states.

12. The medium of claim 11, wherein the text mining model is modified using sigmoid cross entropy.

13. The medium of claim 9, wherein the aggregating of page journey information is based on frequencies of transitioning between a first set of page states and one or more other sets of page states.

14. The medium of claim 9, wherein the operations further comprise using the page journey information to modify, in real time, a user experience of a user interacting with the set of web resources by changing user options at a particular page state.

15. The medium of claim 9, wherein the instructions are further executable to display the stored page journey information using a graphical user interface.

16. A method, comprising: receiving a first clickstream data log for a first set of web resources; determining a first set of page journey information by classifying attributes within entries in the first clickstream data log for the first set of web resources based on a frequency of occurrence of attributes and associated attribute values of the entries without receiving user input specifying a format of the first clickstream data log; receiving a second clickstream data log for a second set of web resources, wherein the second clickstream data log has a different format from the first clickstream data log; determining a second set of page journey information by the classifying attributes within entries in the second clickstream data log for the second set of web resources based on a frequency of occurrence of attributes and associated attribute values of the entries without receiving user input specifying a format of the second clickstream data log; grouping entries in the first and second clickstream data logs into a plurality of groups corresponding to a particular user sessions, wherein the grouping is based on the classified attributes; processing the plurality of groups using the classified attributes to identify two or more page states for the first and second set of web resources; identifying at least two page paths through the two or more page states; and aggregating the at least two identified page paths to produce information indicative of relative frequency of users proceeding between different page states for the first and second set of web resources.

17. The method of claim 16, wherein the formats of the first and second clickstream data logs differ in at least one of attribute name or attribute definition.

18. The method of claim 16, wherein the different format of the second clickstream data log is based at least in part on the second set of web resources being programmed using a different programming language than the first set of web resources.

19. The method of claim 16, further comprising using the first and second set of page journey information to modify, in real time, a user experience at different page states for the first and second set of web resources.

20. The method of claim 16, wherein the first and second set of web resources corresponds to portions of a website.

21. A method, comprising: processing a data log that includes entries corresponding to interactions between a plurality of users and a set of web resources that are at least partially interlinked, including by: grouping, by a computer system, entries in the data log into a plurality of groups; for individual ones of at least two groups within the plurality of groups, identifying, by the computer system: page states accessed associated with that group; and page paths through two or more page states associated with that group; aggregating, by the computer system, at l
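
Claim 8 refers to a "text frequency-inverse document frequency model" for semantic labels. As a rough illustration only, the Python sketch below applies standard term frequency-inverse document frequency weighting, on the assumption that this is the kind of model meant. It treats the text tokens observed for each page state as one document and picks the highest-scoring token as a candidate label. The function `tfidf_labels`, the state identifiers, and the token lists are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def tfidf_labels(page_state_text, top_n=1):
    """Pick the top-scoring tokens per page state as candidate labels.

    page_state_text: dict mapping a page-state id to a non-empty list of
    tokens seen for that state (an assumed input shape for this sketch).
    """
    n_docs = len(page_state_text)

    # Document frequency: in how many page states does each token appear?
    doc_freq = Counter()
    for tokens in page_state_text.values():
        doc_freq.update(set(tokens))

    labels = {}
    for state, tokens in page_state_text.items():
        counts = Counter(tokens)
        # Score = term frequency * log(N / document frequency).
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[state] = ranked[:top_n]
    return labels


# Hypothetical token lists for three page states.
states = {
    "state_a": ["shop", "cart", "cart", "checkout"],
    "state_b": ["shop", "search", "search", "results"],
    "state_c": ["shop", "login", "login", "password"],
}
labels = tfidf_labels(states)
```

Because `shop` appears in every state, its inverse document frequency is zero and it is never chosen, while tokens concentrated in one state (`cart`, `search`, `login`) rank first. The claims also mention running a classifier separately per attribute category; this sketch does not attempt that.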

Assignees

Inventors

Classifications

  • Biometric identity checks · CPC title

  • Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title

  • Data mining · CPC title

  • G06F16/353 · Primary

    into predefined classes · CPC title

  • using natural language analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10831809B2 cover?
Large amounts of data from user interactions with web resources is available as data logs. Analysis may be performed to process the data log in order to determine the characteristics of the user interactions. Data log analysis may include identifying page states, which may be sets of frequent attributes and values that occur together in a session. The data log analysis may also include generati…
Who is the assignee on this patent?
Ca Inc, Ca Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06Q20/40145. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 10, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).