Determining descriptive attributes for listing locations

US10430730B2 · US · B2

Patent metadata
Publication number: US-10430730-B2
Application number: US-201514800369-A
Country: US
Kind code: B2
Filing date: Jul 15, 2015
Priority date: Jul 16, 2014
Publication date: Oct 1, 2019
Grant date: Oct 1, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Listings and reviews of listings can be processed to identify descriptive attributes for locations associated with the listings. To do this, a corpus of words is generated for various locations based on listings in the locations and reviews of those listings. An expected frequency, and per-location frequency for each word is determined. These numbers are in turn used to determine a number of high frequency listing locations, and a number of below expected frequency listing locations for each word. Based on a comparison of the number of high frequency listing locations and the number of below expected frequency listing locations of a word with an attribute reference number, the word can be identified either as an attribute that is likely descriptive of the location, or not.
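
The counting described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the multiples 2.0 and 0.5, the tolerance 0.25, and the choice to normalize each per-location count by the location's word total (so it can be compared with the corpus-wide expected frequency) are all assumptions made for illustration. The ratio is left undefined (None) when a word has no below-expected locations, which is also an assumption.

```python
from collections import Counter


def descriptiveness_metrics(location_words, high_multiple=2.0, low_multiple=0.5):
    """Return {word: (high_count, low_count, ratio)} for every corpus word.

    location_words maps a location name to the list of (already filtered)
    words taken from that location's listings and reviews.
    The multiples are hypothetical values, not taken from the patent.
    """
    corpus_counts = Counter()
    for words in location_words.values():
        corpus_counts.update(words)
    total_words = sum(corpus_counts.values())

    metrics = {}
    for word, count in corpus_counts.items():
        # Expected frequency: share of the whole corpus taken up by this word.
        expected = count / total_words
        high = low = 0
        for words in location_words.values():
            occurrences = words.count(word)
            if occurrences == 0:
                continue  # only locations where the word appears are counted
            # Assumption: normalize the per-location count to a rate.
            rate = occurrences / len(words)
            if rate >= high_multiple * expected:
                high += 1
            elif rate <= low_multiple * expected:
                low += 1
        # Assumption: the ratio is undefined when no location is below expected.
        ratio = high / low if low else None
        metrics[word] = (high, low, ratio)
    return metrics


def identify_attributes(metrics, reference=1.0, tolerance=0.25):
    """Keep words whose ratio lies within `tolerance` of the reference number."""
    return sorted(
        word
        for word, (_, _, ratio) in metrics.items()
        if ratio is not None and abs(ratio - reference) <= tolerance
    )
```

With a reference number of 1, a word is kept when it is strongly over-represented in roughly as many locations as it is under-represented in. A word that appears at about the same rate everywhere produces neither kind of location and is not selected.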

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: retrieving listings and reviews of the listings, each listing associated with one of a plurality of locations; extracting one or more words from text of each listing and each review to generate an initial list of words; generating a corpus of words from the initial list of words by filtering a subset of words from the initial list of words; maintaining, for each location of the plurality of locations, a co-occurrence matrix storing frequencies of co-occurrences of pairs of words appearing in the listings and reviews associated with the location, each frequency of co-occurrences of a pair of words describing a likelihood that the pair of words describe a same attribute; for each of the words in the corpus: computing an expected frequency for a word to appear in the corpus by computing a ratio of a number of times the word appears in the corpus to a total number of words in the corpus, wherein, in response to a second word in the corpus having at least a threshold level of similarity to the word based on the co-occurrence matrices, a number of times the second word appears in the corpus counts toward the number of times the word appears in the corpus, determining, for each location in which the word appears in at least one of a listing or a review, a per-location frequency for the word, the per-location frequency for the word indicating a number of times the word appears in the listings and reviews for the location, determining a number of high frequency listing locations for the word, the high frequency listing locations comprising locations where the per-location frequency of the word is at least a first multiple greater than the expected frequency, determining a number of below expected frequency listing locations for the word, the below expected frequency listing locations comprising locations where the per-location frequency of the word is at most a second multiple smaller than the expected frequency, and determining a descriptiveness metric for the word by calculating a ratio of the number of high frequency listings locations for the word to the number of below expected frequency listings locations for the word; identifying, as attributes, words in the corpus of words having a descriptiveness metric within a threshold range of an attribute reference number, the descriptiveness metric measuring the uniqueness of a word to a location; and selecting, for a particular location of the plurality of locations, an attribute of the identified attributes based on a frequency of the attribute in listings and reviews of the listings associated with the particular location, wherein the selected attribute represents a characteristic of the particular location.

2. The method of claim 1, wherein filtering the subset of words from the initial list of words comprises: filtering stop words from the initial list of words; and filtering proper nouns from the initial list of words.

3. The method of claim 1, wherein the per-location frequency based on a total number of times the word occurs in listings associated with the location.

4. The method of claim 1, wherein the descriptiveness metric is a ratio of the number of high frequency listings locations to the number of below expected frequency listings locations.

5. The method of claim 1, wherein the descriptiveness metric is a numerical value that represents how descriptive a word is of a location relative to the other words in the corpus.

6. The method of claim 1, wherein the attribute reference number is 1.

7. The method of claim 1, further comprising: identifying bigrams and trigrams in the listings and reviews, each bigram comprising two component words, and each trigram comprising three component words; and adding the identified bigrams and trigrams to the corpus of words; wherein a number of times a component word of a bigram or trigram appears in an identified bigram or trigram in the corpus of words is not counted towards a number of times the component word appears in the corpus for computing the expected frequency for the component word to appear in the corpus.

8. The method of claim 1, wherein the expected frequency is based on a total number of times the word occurs in the corpus, a total number of times other words semantically similar to the word occur in the corpus, and a total number of words in the corpus.

9. The method of claim 1, further comprising: receiving a request for attributes of one of the locations; identifying a subset of the corpus comprising words present in listings and reviews of the listings associated with the location; comparing the attributes against the subset of words to determine a list of attributes for the location; and providing the list of attributes for the location in response to the request.

10. The method of claim 9, wherein comparing the attributes against the subset of words to determine the list of attributes for the location comprises: identifying which of the attributes are present as words in the subset of the corpus.

11. A non-transitory computer readable storage medium comprising instructions that when executed by at least one processor causes the processor to: retrieve listings and reviews of the listings, each listing associated with one of a plurality of locations; extract one or more words from text of each listing and each review to generate an initial list of words; generate a corpus of words from the initial list of words by filtering a subset of words from the initial list of words; maintain, for each location of the plurality of locations, a co-occurrence matrix storing frequencies of co-occurrences of pairs of words appearing in the listings and reviews associated with the location, each frequency of co-occurrences of a pair of words describing a likelihood that the pair of words describe a same attribute; for each of the words in the corpus: compute an expected frequency for a word to appear in the corpus by computing a ratio of a number of times the word appears in the corpus to a total number of words in the corpus, wherein, in response to a second word in the corpus having at least a threshold level of similarity to the word based on the co-occurrence matrices, a number of times the second word appears in the corpus counts toward the number of times the word appears in the corpus, determine, for each location in which the word appears in at least one of a listing or a review, a per-location frequency for the word, the per-location frequency for the word indicating a number of times the word appears in the listings and reviews for the location, determine a number of high frequency listing locations for the word, the high frequency listing locations comprising locations where the per-location frequency of the word is at least a first multiple greater than the expected frequency, determine a number of below expected frequency listing locations for the word, the below expected frequency listing locations comprising locations where the per-location frequency of the word is at most a second multiple smaller than the expected frequency, and determine a descriptiveness metric for the word by calculating a ratio of the number of high frequency listings locations for the word to the number of below expected frequency listings locations for the word; identify, as attributes, words in the corpus of words having a descriptiveness metric within a threshold range of an attribute reference number, the descriptiveness metric measuring the uniqueness of a word to a location; and select, for a particular location of the plurality of locations, an attribute of the identified attributes based on a frequency of the attribute
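
Claims 1 and 8 let the occurrences of a sufficiently similar second word count toward a word's expected frequency, with similarity derived from the per-location co-occurrence matrices. The Python sketch below shows one way this pooling could look. The claims do not say how similarity is computed, so treating "co-occurs in at least min_cooccurrences documents across all locations" as the similarity test is purely an assumption, as are the function names and the default threshold.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Build one location's co-occurrence matrix as a sparse Counter.

    documents is a list of word lists (one per listing or review).
    Keys are alphabetically ordered word pairs; values count the documents
    in which both words appear.
    """
    pairs = Counter()
    for words in documents:
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


def expected_frequency(word, corpus_counts, matrices, min_cooccurrences=2):
    """Expected frequency of `word`, pooling counts of similar words.

    matrices is a list of per-location Counters from cooccurrence_counts.
    Assumption: two words are "similar" when their summed co-occurrence
    count reaches min_cooccurrences (a hypothetical threshold).
    """
    total_words = sum(corpus_counts.values())
    pooled = corpus_counts[word]
    for other, count in corpus_counts.items():
        if other == word:
            continue
        key = tuple(sorted((word, other)))
        if sum(matrix[key] for matrix in matrices) >= min_cooccurrences:
            pooled += count  # the similar word counts toward `word`
    return pooled / total_words
```

A real system would more likely normalize the raw co-occurrence counts (for example by each word's own frequency) before applying a threshold. The raw count is used here only to keep the sketch short.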

Assignees

Airbnb Inc

Inventors

Classifications

  • Selection or weighting of terms for indexing · CPC title

  • G06Q10/02 (Primary)

    Reservations, e.g. for tickets, services or events · CPC title

  • Travel agencies · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10430730B2 cover?
Listings and reviews of listings can be processed to identify descriptive attributes for locations associated with the listings. To do this, a corpus of words is generated for various locations based on listings in the locations and reviews of those listings. An expected frequency, and per-location frequency for each word is determined. These numbers are in turn used to determine a number of hi…
Who is the assignee on this patent?
Airbnb Inc
What technology area does this patent fall under?
Primary CPC classification G06Q10/02. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 1, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).