Internet text mining-based method and apparatus for judging validity of point of interest

US2020081908A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2020081908-A1
Application number: US-201916508257-A
Country: US
Kind code: A1
Filing date: Jul 10, 2019
Priority date: Sep 10, 2018
Publication date: Mar 12, 2020
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure disclose an Internet text mining-based method and apparatus for judging the validity of a point of interest. An implementation of the method includes: determining a search word set for indicating a to-be-detected point of interest; performing a search by using a determined search word as a search keyword, to obtain a description information set for describing the to-be-detected point of interest; and inputting a name of the to-be-detected point of interest and description information in the description information set into a pre-established validity discriminant model, to obtain a status label for indicating validity of the to-be-detected point of interest. This implementation enables timely discovery of invalid POI information. Thus, more accurate information is provided to users, user needs are met, and user experience is improved.
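The three steps in the abstract can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the function name `judge_poi_validity`, the `search` and `model` callables, the toy corpus, and the label strings "valid" and "invalid" are all assumptions made for the example. In particular, the keyword-counting `toy_model` only stands in for the pre-established validity discriminant model.

```python
from typing import Callable, Iterable, List


def judge_poi_validity(
    name: str,
    synonyms: Iterable[str],
    search: Callable[[str], List[str]],
    model: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical pipeline: search word set -> descriptions -> status label."""
    # Step 1: build the search word set from the name and its synonyms,
    # dropping duplicates while keeping the original order.
    search_words = list(dict.fromkeys([name, *synonyms]))

    # Step 2: search with each word and pool the returned description texts.
    descriptions: List[str] = []
    for word in search_words:
        for text in search(word):
            if text not in descriptions:
                descriptions.append(text)

    # Step 3: hand the name and the descriptions to the discriminant model.
    return model(name, descriptions)


def toy_search(word: str) -> List[str]:
    """Stand-in for a real search backend (assumed data, for illustration)."""
    corpus = {
        "Blue Cafe": ["Blue Cafe has permanently closed."],
        "Cafe Bleu": [
            "Cafe Bleu shut its doors last month.",
            "Blue Cafe has permanently closed.",
        ],
    }
    return corpus.get(word, [])


def toy_model(name: str, descriptions: List[str]) -> str:
    """Placeholder classifier: majority vote on closure cue words."""
    cues = ("closed", "shut")
    hits = sum(any(cue in d.lower() for cue in cues) for d in descriptions)
    return "invalid" if hits > len(descriptions) / 2 else "valid"


label = judge_poi_validity("Blue Cafe", ["Cafe Bleu"], toy_search, toy_model)
print(label)
```

Passing the search backend and the model in as callables keeps the sketch independent of any particular search engine or trained model.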

First claim

Opening claim text (preview).

What is claimed is:

1. An Internet text mining-based method for judging validity of a point of interest, comprising: determining a search word set for indicating a to-be-detected point of interest; performing a search by using a determined search word as a search keyword, to obtain a description information set for describing the to-be-detected point of interest; and inputting a name of the to-be-detected point of interest and description information in the description information set into a pre-established validity discriminant model, to obtain a status label for indicating validity of the to-be-detected point of interest.

2. The method according to claim 1, wherein the determining a search word set for indicating a to-be-detected point of interest comprises: using the name of the to-be-detected point of interest and a synonym of the name of the to-be-detected point of interest as search words in the search word set.

3. The method according to claim 1, wherein the pre-established validity discriminant model is an attention model; and the validity discriminant model is trained and obtained by following training: training an initial attention model by using a name of a sample point of interest and description information of the sample point of interest as inputs, and using a status label of the sample point of interest as a target, to obtain the validity discriminant model.

4. The method according to claim 3, wherein the attention model comprises a semantic recognition sub-model and a feature extraction sub-model, and the training further comprises: for one of sample points of interest, inputting a name of the sample point of interest and one piece of description information of the sample point of interest into the semantic recognition sub-model, inputting the piece of description information into the feature extraction sub-model, and splicing feature vectors output by the semantic recognition sub-model and the feature extraction sub-model to obtain a feature vector of the piece of description information for describing the sample point of interest; determining a weighted sum of the feature vectors of respective description information of the sample point of interest; determining, based on the weighted sum, a probability value belonging to the status label of the sample point of interest; and determining, based on a preset loss function, loss values of probability values of respective sample points of interest under the ground truths thereof, and propagating the determined loss values back in the attention model to adjust a model parameter of the attention model, so as to obtain the validity discriminant model.

5. The method according to claim 3, wherein the description information of the sample point of interest is obtained by: determining a first synonym set consisting of the name of the sample point of interest and a synonym of the name of the sample point of interest; determining a second synonym set consisting of the status label of the sample point of interest and a synonym of the status label of the sample point of interest; and performing a search by using a first synonym determined from the first synonym set and a second synonym determined from the second synonym set as a search word, and in the search results, using a statement in which the first synonym and the second synonym appear together as the description information of the sample point of interest.

6. The method according to claim 5, wherein the synonym of the status label of the sample point of interest is determined based on at least one of the following: determining the synonym of the status label of the sample point of interest from a preset synonym database; or determining a preset number of target search statements from historical search statements comprising the name of the sample point of interest, and using a word determined from the determined target search statements and having a semantic similarity to the status label of the sample point of interest exceeding a preset similarity threshold as a synonym of the status label of the sample point of interest.

7. The method according to claim 2, wherein the synonym of the name of the point of interest is determined based on at least one of the following: determining the synonym of the name of the point of interest from a preset encyclopedia database; performing a search by using the name of the point of interest as a search word, and using a matching entity obtained by the search as a synonym of the name of the point of interest, wherein the matching entity is an entity, the ratio of a longest common substring between the name of the point of interest and the name of the entity to the name of the entity exceeding a preset ratio threshold, among the entities included in a preset number of search results; or performing a search by using the name of the point of interest as a search word, extracting statements comprising the name of the sample point of interest from a preset number of search results, and determining from the extracted statements, by using a co-reference resolution tool, a word for indicating the name of the point of interest as a synonym; wherein the point of interest is one of the to-be-detected point of interest and the sample point of interest.

8. The method according to claim 5, wherein the synonym of the name of the point of interest is determined based on at least one of the following: determining the synonym of the name of the point of interest from a preset encyclopedia database; performing a search by using the name of the point of interest as a search word, and using a matching entity obtained by the search as a synonym of the name of the point of interest, wherein the matching entity is an entity, the ratio of a longest common substring between the name of the point of interest and the name of the entity to the name of the entity exceeding a preset ratio threshold, among the entities included in a preset number of search results; or performing a search by using the name of the point of interest as a search word, extracting statements comprising the name of the sample point of interest from a preset number of search results, and determining from the extracted statements, by using a co-reference resolution tool, a word for indicating the name of the point of interest as a synonym; wherein the point of interest is one of the to-be-detected point of interest and the sample point of interest.

9. An Internet text mining-based apparatus for judging validity of a point of interest, comprising: at least one processor; and a memory storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: determining a search word set for indicating a to-be-detected point of interest; performing a search by using a determined search word as a search keyword, to obtain a description information set for describing the to-be-detected point of interest; and inputting a name of to-be-detected point of interest and the description information in the description information set into a pre-established validity discriminant model, to obtain a status label for indicating validity of the to-be-detected point of interest.

10. The apparatus according to claim 9, wherein the determining a search word set for indicating a to-be-detected point of interest comprises: using the name of the to-be-detected point of interest and a synonym of the name of the to-be-detected point of interest as search words in the search word set.

11. The apparatus according to claim 9, wherein the pre-established validity discriminant
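The "matching entity" test in claims 7 and 8 compares the length of the longest common substring with the length of the entity name. The sketch below shows one way to compute that ratio. It is an illustration under stated assumptions, not the patent's implementation: the function names, the character-level and case-sensitive comparison, and the default threshold of 0.6 are all hypothetical, since the claims only say "a preset ratio threshold".

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    # prev[j] holds the length of the common suffix of the previous
    # prefix of a and b[:j]; a rolling row keeps memory at O(len(b)).
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def is_matching_entity(
    poi_name: str,
    entity_name: str,
    ratio_threshold: float = 0.6,  # assumed value; the claims leave it unspecified
) -> bool:
    """True if LCS length / entity-name length exceeds the threshold."""
    if not entity_name:
        return False  # avoid division by zero on an empty entity name
    lcs_len = longest_common_substring_len(poi_name, entity_name)
    return lcs_len / len(entity_name) > ratio_threshold


print(is_matching_entity("Central Park Zoo", "Central Park"))
print(is_matching_entity("Central Park Zoo", "Bronx Zoo"))
```

Dividing by the entity name's length, as the claim wording does, means a short entity fully contained in the POI name scores 1.0, while a long entity that shares only a fragment scores low.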

Assignees

Inventors

Classifications

  • Semantic analysis

  • Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

  • Ensuring data consistency and integrity

  • Thesauruses; Synonyms

  • Backpropagation, e.g. using gradient descent

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020081908A1 cover?
Embodiments of the present disclosure disclose an Internet text mining-based method and apparatus for judging the validity of a point of interest. An implementation of the method includes: determining a search word set for indicating a to-be-detected point of interest; performing a search by using a determined search word as a search keyword, to obtain a description information set for describi…
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G06F16/9537. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).