Identifying subsets of signifiers to analyze

US9704136B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9704136-B2
Application number: US-201313755836-A
Country: US
Kind code: B2
Filing date: Jan 31, 2013
Priority date: Jan 31, 2013
Publication date: Jul 11, 2017
Grant date: Jul 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Identifying a subset of signifiers to analyze can include determining a set of distance metrics between a first signifier and each of a plurality of second signifiers, identifying a subset of the plurality of second signifiers to analyze based on the set of distance metrics using a computing device, and determining a relation between the subset of the plurality of second signifiers and the first signifier based on a subset of the set of distance metrics.
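The three steps in the abstract (distance metrics, subset selection, relation) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the co-occurrence-based distance formula, the `k`-nearest selection (a simple stand-in for the data tree model named in the claims), the function names, and the sample documents are not taken from the patent. Only the idea of averaging the distance metrics over the selected subset follows the wording of claim 2.

```python
# Illustrative sketch only; the formula, names, and data are hypothetical.

def cooccurrence_distances(new, existing, documents):
    """Assumed metric: 1 / (1 + number of documents containing both signifiers)."""
    distances = {}
    for other in existing:
        together = sum(1 for doc in documents if new in doc and other in doc)
        distances[other] = 1.0 / (1.0 + together)
    return distances

def nearest_subset(distances, k):
    """Keep the k closest signifiers (stand-in for the tree-based selection)."""
    return sorted(distances, key=distances.get)[:k]

def average_relation(distances, subset):
    """Relation as the average distance over the selected subset only."""
    return sum(distances[s] for s in subset) / len(subset)

# Hypothetical content: each document is the set of signifiers it contains.
documents = [
    {"printer", "toner", "driver"},
    {"printer", "toner"},
    {"printer", "driver"},
    {"laptop", "battery"},
]
existing = ["toner", "driver", "battery"]

distances = cooccurrence_distances("printer", existing, documents)
subset = nearest_subset(distances, k=2)
relation = average_relation(distances, subset)
print(subset, relation)
```

Because the relation is computed over `subset` rather than over every existing signifier, the cost of the final step depends on `k` instead of the size of the full collection, which is the time saving the claims describe.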

First claim

Opening claim text (preview).

What is claimed:

1. A method comprising: determining a set of distance metrics between a first signifier and each of a plurality of second signifiers acquired from unstructured content residing on different domains in an enterprise communications network; identifying a subset of the plurality of second signifiers to analyze based on the set of distance metrics using a computing device, wherein identifying the subset of the second signifiers comprises: utilizing a data tree model; growing a number of trees of relevant signifiers; splitting the number of trees into a number of subtrees; and pruning the number of subtrees to include the subset of the second signifiers to analyze with a cost function that is an increasing convex function satisfying Jensen's inequality; analyzing just the subset of the second signifiers of the existing signifiers, including determining a relation between the second subset of the existing signifiers and the first signifier based on a subset of the plurality of distance metrics, wherein analysis of just the subset of the second signifiers reduces analysis time in determining the relation of the first signifier; and identifying content in the enterprise communication network based upon the determining of the relation between the subset of the plurality of second signifiers and the first signifier.

2. The method of claim 1, wherein determining the relation between the subset of the plurality of second signifiers and the first signifier comprises calculating an average of the distance metric between the first signifier and each of the subset of the second signifiers.

3. The method of claim 1, wherein pruning the number of subtrees comprises determining to terminate pruning based on a ratio of a cost function.

4. The method of claim 1, comprising crawling an enterprise network to identify the first signifier.

5. A non-transitory computer-readable medium storing a set of instructions executable by a processing resource, wherein the set of instructions can executed by the processing resource to: determine a set of distance metrics between a new signifier and each of a plurality of existing signifiers acquired from unstructured content residing on different domains in an enterprise communications network; determine a cost function to analyze a relation between the plurality of existing signifiers and the new signifier; identify a first subset of the existing signifiers utilizing a data tree model; identify a second subset of the existing signifiers to analyze based on the set of distance metrics and the cost function, wherein the second subset is a subset of the first subset, wherein identifying the subset of the second signifiers comprises: utilizing a data tree model; growing a number of trees of relevant signifiers; splitting the number of trees into a number of subtrees; and pruning the number of subtrees to include the subset of the second signifiers to analyze with a cost function that is an increasing convex function satisfying Jensen's inequality; and analyze just the subset of the second signifiers of the existing signifiers, including determining a relation between the second subset of the existing signifiers and the new signifier based on a subset of the plurality of distance metrics, wherein analysis of just the subset of the second signifiers reduces analysis time in determining the relation of the new signifier; identify content in an enterprise communication network based upon the determining of the relation between the subset of the plurality of existing signifiers and the new signifier.

6. The medium of claim 5, wherein the second subset of the existing signifiers comprises a cluster of signifiers including the identified new signifier.

7. The medium of claim 5, wherein the instructions executable to identify the first subset comprise instructions executable to: utilize the data tree model to split a single node data tree into subtrees; and compare subtrees to one another utilizing a Lloyd model.

8. The medium of claim 5, wherein the instructions executable to identify the second subset comprise instructions executable to utilize the data tree model to prune the subtrees of irrelevant content utilizing a Breiman, Friedman, Olshen, and Stone (BFOS) model.

9. The medium of claim 5, wherein the instructions executable to determine a relation between the second subset of the plurality of existing signifiers and the new signifier comprise instructions executable to approximate a measurement of a relation of related phrases in a cluster, wherein the cluster includes the second subset of the plurality of existing signifiers and the new signifier.

10. A system for identifying a subset of signifiers to analyze comprising: a processing resource; and a memory resource communicatively coupled to the processing resource containing instructions executable by the processing resource to: identify a new signifier associated with content on an enterprise network; determine a set of distance metrics between the new signifier and each of a plurality of existing signifiers acquired from unstructured content residing on different domains in an enterprise communications network; identify a cost function to analyze a relation between the plurality of existing signifiers and the new signifier; identify a subset of the plurality of existing signifiers to analyze based on the set of distance metrics and the cost function utilizing a data tree model; analyze just the subset of the existing signifiers, including determining a relation between the subset of the plurality of existing signifiers and the new signifier based on a distance metric between each, wherein analysis of just the subset of the existing signifiers reduces analysis time in determining the relation of the new signifier; and identify content in an enterprise communication network based upon the determining of the relation between the subset of the plurality of existing signifiers and the new signifier, wherein the instructions executable to identify the cost function comprise instructions to identify a first component of the cost function that is minimized using a Lloyd function and a second component of the cost function is an increasing convex function that satisfies Jensen's inequality.

11. The system of claim 10, wherein the instructions executable to identify the subset of the plurality of existing signifiers to analyze comprise instructions to group the plurality of existing signifiers and the new signifier into a plurality of clusters based on the cost function and the set of distance metrics utilizing a data tree model.

12. The system of claim 10, wherein the instructions executable to identify the subset of the plurality of existing signifiers to analyze comprise instructions to identify a terminal node in a data tree structure that the new signifier belongs to.

13. The system of claim 10, wherein the instructions executable to determine the relations between the subset of the plurality of existing signifiers and the new signifier comprise instructions to approximate the relation of the new signifier with the existing signifiers in the subset.

14. The method of claim 1, wherein the set of distance metrics comprise a frequency of co-occurrence of the first signifier and the second signifier.

15. The method of claim 1, wherein the set of distance metrics comprise a metric based upon a proximity of the first signifier to the second signifier.

16. The method of claim 1, wherein the content comprises unstructured content.

17. The method of claim 1, where
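Claim 7 mentions splitting a single-node data tree into subtrees and comparing them with a Lloyd model. As a rough illustration of what one such split could look like, the Python sketch below runs Lloyd iterations (two-cluster k-means) on one-dimensional values and compares the squared-error distortion before and after the split. The one-dimensional representation, the function names, the initialization at the minimum and maximum, and the sample values are all assumptions for this example; the sketch does not implement BFOS pruning or the convex cost component described in the claims.

```python
# Illustrative sketch only: a Lloyd-style two-way split of a single node.
# The 1-D "distance" values and all names here are hypothetical.

def lloyd_split(points, iterations=10):
    """Split a non-empty list of 1-D points into two groups with Lloyd iterations."""
    centers = [min(points), max(points)]  # assumed initialization
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            # Assign each point to the nearer of the two current centers.
            nearer = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearer].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

def distortion(groups, centers):
    """Total squared distance of every point to its group's center."""
    return sum((p - c) ** 2 for g, c in zip(groups, centers) for p in g)

# Hypothetical distance values between a new signifier and existing ones.
points = [0.1, 0.2, 0.15, 0.9, 0.95]

root_center = sum(points) / len(points)
root_cost = distortion((points,), (root_center,))

groups, centers = lloyd_split(points)
split_cost = distortion(groups, centers)
print(groups, root_cost, split_cost)
```

In a tree-structured scheme, a split like this would typically be kept only if the drop from `root_cost` to `split_cost` justifies the extra subtree; how the patent weighs that trade-off is governed by its cost function and is not reproduced here.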

Assignees

Inventors

Classifications

  • G06Q10/101 (Primary)

    Collaborative creation, e.g. joint development of products or services · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9704136B2 cover?
Identifying a subset of signifiers to analyze can include determining a set of distance metrics between a first signifier and each of a plurality of second signifiers, identifying a subset of the plurality of second signifiers to analyze based on the set of distance metrics using a computing device, and determining a relation between the subset of the plurality of second signifiers and the firs…
Who is the assignee on this patent?
Hewlett Packard Development Co Lp, Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under?
Primary CPC classification G06Q10/101. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 11, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).