Method and system to detect use cases in documents for providing structured text objects

US2016171111A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016171111-A1
Application number: US-201414572339-A
Country: US
Kind code: A1
Filing date: Dec 16, 2014
Priority date: Dec 16, 2014
Publication date: Jun 16, 2016
Grant date: not listed

Abstract

Official abstract text for this publication.

The present teaching relates to providing structured text. In one example, a document is obtained. One or more keywords are identified in the document. One or more topics are determined based on the one or more keywords. Each of the one or more topics is related to at least one of the one or more keywords residing in one or more portions of the document. A snippet is generated for each of the portions associated with a corresponding topic based on content in the portion of the document.
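The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the topic vocabulary `TOPIC_KEYWORDS`, the choice to treat blank-line-separated paragraphs as "portions", and the use of a portion's first sentence as its snippet are all hypothetical and introduced here only for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical topic vocabulary; the patent does not specify one.
TOPIC_KEYWORDS = {
    "recipe": {"ingredients", "bake", "oven"},
    "travel": {"flight", "hotel", "itinerary"},
}


@dataclass
class Snippet:
    topic: str
    portion_index: int
    text: str


def split_portions(document: str) -> list[str]:
    """Assumption: a portion is a paragraph separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def identify_keywords(portion: str) -> set[str]:
    """Return the words of the portion that occur in any topic vocabulary."""
    words = set(re.findall(r"[a-z]+", portion.lower()))
    known = set().union(*TOPIC_KEYWORDS.values())
    return words & known


def determine_topics(keywords: set[str]) -> list[str]:
    """A topic applies when at least one of its keywords was found."""
    return sorted(t for t, vocab in TOPIC_KEYWORDS.items() if vocab & keywords)


def generate_snippets(document: str) -> list[Snippet]:
    """Generate one snippet per (portion, topic) pair."""
    snippets = []
    for i, portion in enumerate(split_portions(document)):
        for topic in determine_topics(identify_keywords(portion)):
            # Assumption: the first sentence stands in for the snippet content.
            first_sentence = re.split(r"(?<=[.!?])\s+", portion)[0]
            snippets.append(Snippet(topic, i, first_sentence))
    return snippets
```

Calling `generate_snippets` on a document with a cooking paragraph and a travel paragraph would yield one snippet per paragraph, each tagged with the topic whose keywords it contains. Portions with no known keywords produce no snippet.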

First claim

Opening claim text (preview).

We claim:

1. A method, implemented on a machine having at least one processor, storage, and a communication platform connected to a network for generating a snippet, the method comprising: obtaining a document; identifying one or more keywords in the document; determining one or more topics based on the one or more keywords, wherein each of the one or more topics is related to at least one of the one or more keywords residing in one or more portions of the document; and generating a snippet for each of the portions associated with a corresponding topic based on content in the portion of the document.

2. The method of claim 1, further comprising generating an index for each of the snippets based on the corresponding topic associated with the snippet.

3. The method of claim 1, wherein the determining comprises: matching each of the one or more keywords with the one or more topics; generating a score for each of the one or more topics based on the matching; and ranking the one or more topics based on their respective scores.

4. The method of claim 1, wherein the generating comprises: obtaining one or more parameters associated with the corresponding topic; extracting information from the portion of the document according to the one or more parameters; and generating the snippet based on the extracted information.

5. The method of claim 1, further comprising storing the snippet associated with the corresponding topic and the portion of the document in a database.

6. The method of claim 1, wherein the snippet is also associated with at least one of: the document; a URL (uniform resource locator) associated with the document; the portion of the document; one or more parameters representing a structure of the snippet; and a confidence score indicating how likely the snippet can represent the portion of the document.

7. A method, implemented on a machine having at least one processor, storage, and a communication platform connected to a network for providing a search result, the method comprising: receiving a query; identifying one or more keywords from the query; determining one or more topics associated with the query based on the one or more keywords; retrieving one or more snippets based on the one or more topics, wherein each of the snippets corresponds to a portion of a corresponding document that is related to a topic associated with the snippet; and providing the one or more snippets in response to the query.

8. The method of claim 7, further comprising providing a representation of a corresponding document associated with each of the one or more snippets in response to the query.

9. The method of claim 7, wherein the determining comprises: matching each of the one or more keywords with the one or more topics; generating a score for each of the one or more topics based on the matching; and ranking the one or more topics based on their respective scores.

10. The method of claim 7, wherein the providing comprises: ranking the one or more snippets; and providing the ranked one or more snippets in response to the query.

11. The method of claim 7, wherein at least one of the one or more snippets is associated with at least one of: the corresponding document; a URL associated with the corresponding document; the portion of the corresponding document; one or more parameters representing a structure of the snippet; and a confidence score indicating how likely the snippet can represent the portion of the corresponding document.

12. A system having at least one processor, storage, and a communication platform connected to a network for generating a snippet, comprising: a document obtaining unit configured for obtaining a document; an entity detector configured for identifying one or more keywords in the document; a use case matching unit configured for determining one or more topics based on the one or more keywords, wherein each of the one or more topics is related to at least one of the one or more keywords residing in one or more portions of the document; an indexed snippet generator configured for generating a snippet for each of the portions associated with a corresponding topic based on content in the portion of the document.

13. The system of claim 12, wherein the indexed snippet generator comprises an index generator configured for generating an index for each of the snippets based on the corresponding topic associated with the snippet.

14. The system of claim 12, wherein the use case matching unit is further configured for: matching each of the one or more keywords with the one or more topics; and generating a score for each of the one or more topics based on the matching, wherein the one or more topics are ranked based on their respective scores.

15. The system of claim 12, wherein the indexed snippet generator comprises: a snippet parameter determiner configured for obtaining one or more parameters associated with the corresponding topic; a structured text extractor configured for extracting information from the portion of the document according to the one or more parameters; and a snippet generator/updater configured for generating the snippet based on the extracted information.

16. The system of claim 12, wherein the snippet is also associated with at least one of: the document; a URL associated with the document; the portion of the document; one or more parameters representing a structure of the snippet; and a confidence score indicating how likely the snippet can represent the portion of the document.

17. A system having at least one processor, storage, and a communication platform connected to a network for providing a search result, comprising: a search request analyzer configured for receiving a query; an entity type identifier configured for identifying one or more keywords from the query; a use case determiner configured for determining one or more topics associated with the query based on the one or more keywords; a snippet retriever configured for retrieving one or more snippets based on the one or more topics, wherein each of the snippets corresponds to a portion of a corresponding document that is related to a topic associated with the snippet; and a search result provider configured for providing the one or more snippets in response to the query.

18. The system of claim 17, wherein the search result provider is further configured for providing a representation of a corresponding document associated with each of the one or more snippets in response to the query.

19. The system of claim 17, wherein the use case determiner is further configured for: matching each of the one or more keywords with the one or more topics; generating a score for each of the one or more topics based on the matching; and ranking the one or more topics based on their respective scores.

20. The system of claim 17, further comprising a snippet ranking unit configured for ranking the one or more snippets.

21. The system of claim 17, wherein at least one of the one or more snippets is associated with at least one of: the corresponding document; a URL associated with the corresponding document; the portion of the corresponding document; one or more parameters representing a structure of the snippet; and a confidence score indicating how likely the snippet can represent the portion of the corresponding document.

22. …
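
The query-side claims (7 through 11), together with the topic scoring and ranking of claims 3 and 9, can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the vocabulary `TOPIC_KEYWORDS`, the in-memory `SNIPPET_INDEX`, the example URLs, scoring a topic by its count of matched keywords, and ranking snippets by their stored confidence score. The claims do not prescribe any of these choices.

```python
import re

# Hypothetical topic vocabulary (not specified in the patent).
TOPIC_KEYWORDS = {
    "recipe": {"ingredients", "bake", "oven"},
    "travel": {"flight", "hotel", "itinerary"},
}

# Hypothetical snippet index keyed by topic. Each snippet carries a URL and a
# confidence score, two of the associations listed in claim 11.
SNIPPET_INDEX = {
    "recipe": [
        {"text": "Ingredients: flour, sugar, eggs.",
         "url": "https://example.com/cake", "confidence": 0.7},
        {"text": "Bake at 180 C for 25 minutes.",
         "url": "https://example.com/cake", "confidence": 0.9},
    ],
    "travel": [
        {"text": "Flight departs at 9:00.",
         "url": "https://example.com/trip", "confidence": 0.8},
    ],
}


def rank_topics(keywords):
    """Match keywords to topics, score each topic, and rank by score.

    Assumption: the score is the number of matched keywords; ties are
    broken alphabetically so the ordering is deterministic.
    """
    scores = {t: len(vocab & keywords) for t, vocab in TOPIC_KEYWORDS.items()}
    matched = [(t, s) for t, s in scores.items() if s > 0]
    return sorted(matched, key=lambda ts: (-ts[1], ts[0]))


def search(query):
    """Receive a query, determine topics, and return ranked snippets."""
    keywords = set(re.findall(r"[a-z]+", query.lower()))
    results = []
    for topic, _score in rank_topics(keywords):
        # Assumption: within a topic, snippets are ranked by confidence.
        ranked = sorted(SNIPPET_INDEX.get(topic, []),
                        key=lambda s: -s["confidence"])
        results.extend(ranked)
    return results
```

In this sketch, a query mentioning both baking and a flight returns the recipe snippets first, because that topic matches more keywords, followed by the travel snippet. A query with no known keywords returns an empty list.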

Assignees

Yahoo Inc

Inventors

Classifications

  • Presentation of query results · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Yahoo Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/9535. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 16, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).