Identifying a topic for text using a database system

US9292589B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9292589-B2
Application number  US-201314018107-A
Country             US
Kind code           B2
Filing date         Sep 4, 2013
Priority date       Sep 4, 2012
Publication date    Mar 22, 2016
Grant date          Mar 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are methods, apparatus, systems, and computer-readable storage media for identifying a topic for a text. In some implementations, one or more servers maintain a plurality of data entries in one or more database tables storing text data, each data entry of a first portion of the data entries including: a text sequence, a topic, and a text-to-topic association score indicating a number of times that the text sequence appears in a processed text associated with the topic, each data entry of a second portion of the data entries including a total word score indicating a number of times that a respective text sequence appears in one or more processed texts. The one or more servers may receive an incoming text and identify a topic for the incoming text by processing the text sequences of the incoming text in relation to the data entries in the database tables.
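
The two kinds of data entry described in the abstract can be pictured with a minimal sketch. It is an illustration only, not the patented implementation: the in-memory dictionaries stand in for the database tables, and the names `association_scores`, `total_word_scores`, `text_sequences`, and `record_processed_text`, as well as the lowercase whitespace tokenization, are assumptions that do not appear in the patent.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two portions of data entries.
# First portion: (text sequence, topic) -> text-to-topic association score.
association_scores = defaultdict(int)
# Second portion: text sequence -> total word score.
total_word_scores = defaultdict(int)


def text_sequences(text):
    """Split text into single-word sequences (an assumed tokenization)."""
    return text.lower().split()


def record_processed_text(text, topic):
    """Count each text sequence of a processed text against its topic."""
    for seq in text_sequences(text):
        association_scores[(seq, topic)] += 1
        total_word_scores[seq] += 1


record_processed_text("database index tuning", "databases")
record_processed_text("database sales pipeline", "sales")
```

After these two calls, "database" has an association score of 1 for each topic and a total word score of 2, which is the raw material the servers would consult when a new text arrives.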

First claim

Opening claim text (preview).

What is claimed is:

1. A database system comprising: one or more databases; one or more servers having one or more processors operable to cause: maintaining a plurality of data entries in the one or more databases, each data entry of a first portion of the data entries identifying: a text sequence, a topic, and a text-to-topic association score indicating a number of times that the text sequence appears in a processed text associated with the topic, each data entry of a second portion of the data entries identifying a total word score indicating a number of times that a respective text sequence appears in one or more processed texts; processing incoming text having a length and including one or more text sequences; identifying a topic for the incoming text by processing the one or more text sequences of the incoming text in relation to the data entries in the one or more databases; and responsive to a request to assign a topic to the incoming text, updating the one or more databases, the updating including, for each text sequence of the incoming text: identifying or creating a first data entry of the first portion of data entries that identifies the text sequence of the incoming text and the requested topic, incrementing the text-to-topic association score of the first data entry by an inflation factor, identifying or creating a second data entry of the second portion of data entries that identifies the text sequence of the incoming text, and incrementing the total word score of the second data entry by the inflation factor.

2. The database system of claim 1, wherein a text sequence of the incoming text includes one or more words.

3. The database system of claim 1, wherein identifying the topic for the incoming text comprises: for each text sequence of the incoming text, generating a subtotal topic score for the text sequence and the topic; determining a total topic score for the incoming text and the topic by summing the subtotal topic scores, the total topic score indicating relevance of the topic to the incoming text; and determining that the total topic score for the incoming text meets or traverses a threshold.

4. The database system of claim 3, the one or more processors further comprising operable to cause: comparing the total topic score with other total topic scores for other topics; and ranking the total topic score with the other total topic scores in order of relevance.

5. The database system of claim 3, wherein generating the subtotal topic score for the text sequence and the topic comprises: identifying a first data entry containing the text sequence and the topic; normalizing the text-to-topic association score of the identified first data entry and a total word score associated with the text sequence; dividing the normalized text-to-topic association score by a first function of the normalized total word score to generate an intermediate score; and dividing the intermediate score by a second function of the length of the incoming text to generate a subtotal topic score.

6. The database system of claim 5, wherein normalizing the text-to-topic association score and the total word score comprises: dividing the text-to-topic association score by an inflation factor; and dividing the total word score by the inflation factor.

7. The database system of claim 6, wherein the inflation factor has a value based on a measure of time.

8. The database system of claim 7, wherein the inflation factor is an exponential function of a measure of time.

9. The database system of claim 5, wherein the first function and the second function are a square root.

10. The database system of claim 1, the one or more processors further operable to cause: transmitting the identified topic to a computing device for display as a suggested topic for the incoming text.

11. The database system of claim 10, wherein the identified topic is transmitted to the computing device with one or more other suggested topics.

12. The database system of claim 1, the one or more processors further operable to cause: providing a first database table configured to store the first data entries; providing a second database table configured to store text sequences and corresponding total word scores; identifying one or more texts associated with one or more topics, each text having a plurality of text sequences; and for each identified text, updating the first and second database tables using the text sequences of the identified text and the topic associated with the identified text.

13. The database system of claim 1, wherein a server identifies the topic for the incoming text in response to receiving from a computing device a request for suggested topics based on the incoming text.

14. The database system of claim 13, wherein the request for suggested topics is generated by the computing device in response to a request to post the incoming text.

15. The database system of claim 13, wherein the request for suggested topics is generated by the computing device in response to a request to select a topic to be assigned to the incoming text.

16. A method comprising: maintaining, using a database system, a plurality of data entries in one or more databases of the database system, each data entry of a first portion of the data entries identifying: a text sequence, a topic, and a text-to-topic association score indicating a number of times that the text sequence appears in a processed text associated with the topic, each data entry of a second portion of the data entries identifying a total word score indicating a number of times that a respective text sequence appears in one or more processed texts; receive processing incoming text having a length and including one or more text sequences; identifying a topic for the incoming text by processing the one or more text sequences of the incoming text in relation to the data entries in the one or more databases; and responsive to a request to assign a topic to the incoming text, updating the one or more databases, the updating including, for each text sequence of the incoming text: identifying or creating a first data entry of the first portion of data entries that identifies the text sequence of the incoming text and the requested topic, incrementing the text-to-topic association score of the first data entry by an inflation factor, identifying or creating a second data entry of the second portion of data entries that identifies the text sequence of the incoming text, and incrementing the total word score of the second data entry by the inflation factor.

17. A computer program product comprising a non-transitory computer-readable storage medium and further comprising program code to be executed by at least one processor when retrieved from the non-transitory computer-readable storage medium, the program code comprising instructions configured to cause: maintaining, using a database system, a plurality of data entries in one or more databases of the database system, each data entry of a first portion of the data entries identifying: a text sequence, a topic, and a text-to-topic association score indicating a number of times that the text sequence appears in a processed text associated with the topic, each data entry of a second portion of the data entries identifying a total word score indicating a number of times that a respective text sequence appears in one or more processed texts; processing incoming text having a length and including one or more text sequences; and identifying a topic for the incoming text
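
The update step of claim 1 and the scoring steps of claims 3 to 9 can be read together as a small worked example. What follows is a hedged sketch of one possible reading, not the patented implementation. Several details are assumptions that the claims do not fix: the class and method names, the exact exponential form `exp(t / TIME_CONSTANT)` and its constant, measuring the "length" of the incoming text as its number of text sequences, and the in-memory dictionaries used in place of database tables.

```python
import math
from collections import defaultdict

TIME_CONSTANT = 30.0  # assumed time scale in days; not taken from the patent


def inflation_factor(t):
    """An exponential function of a measure of time (claim 8); form assumed."""
    return math.exp(t / TIME_CONSTANT)


class TopicStore:
    """Hypothetical in-memory stand-in for the two portions of data entries."""

    def __init__(self):
        self.assoc = defaultdict(float)  # (sequence, topic) -> association score
        self.total = defaultdict(float)  # sequence -> total word score

    def assign(self, sequences, topic, t):
        """Claim 1 update: increment both scores by the inflation factor."""
        factor = inflation_factor(t)
        for seq in sequences:
            self.assoc[(seq, topic)] += factor
            self.total[seq] += factor

    def total_topic_score(self, sequences, topic, t):
        """Sum the subtotal topic scores of claims 3, 5, 6, and 9."""
        factor = inflation_factor(t)
        length = len(sequences)  # assumed measure of text length
        score = 0.0
        for seq in sequences:
            if (seq, topic) not in self.assoc:
                continue  # no first data entry for this sequence and topic
            assoc = self.assoc[(seq, topic)] / factor  # normalize (claim 6)
            total = self.total[seq] / factor
            intermediate = assoc / math.sqrt(total)    # first function (claim 9)
            score += intermediate / math.sqrt(length)  # second function (claim 9)
        return score

    def rank(self, sequences, topics, t, threshold=0.0):
        """Keep topics meeting the threshold, most relevant first (claim 4)."""
        scored = [(self.total_topic_score(sequences, tp, t), tp) for tp in topics]
        return sorted((s for s in scored if s[0] >= threshold), reverse=True)


store = TopicStore()
store.assign(["database", "index"], "databases", t=0.0)
store.assign(["database", "pipeline"], "sales", t=0.0)
ranked = store.rank(["database", "index"], ["databases", "sales"], t=0.0)
```

Under these assumptions, incrementing by a factor that grows with time and later dividing by the current factor means that older assignments contribute less to a score than recent ones, without rewriting stored rows as they age.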

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9292589B2 cover?
Methods, apparatus, systems, and computer-readable storage media for identifying a topic for a text. One or more servers maintain database entries that record how often each text sequence appears in processed texts associated with a topic and how often it appears overall, then use those scores to identify a topic for incoming text.
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30598. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).