Method and apparatus for determining domain of sentence

US10528666B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10528666-B2
Application number: US-201715824743-A
Country: US
Kind code: B2
Filing date: Nov 28, 2017
Priority date: Aug 14, 2017
Publication date: Jan 7, 2020
Grant date: Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses for determining a domain of a sentence are disclosed. The apparatus may generate, using an autoencoder, an embedded feature from an input feature indicating an input sentence, and determine a domain of the input sentence based on a location of the embedded feature in an embedding space where embedded features are distributed.
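
The idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the one-layer `encode` function, the weights, the choice of the origin as the specified location, the threshold of 0.5, and the convention that in-domain sentences embed close to that location are all assumptions made for illustration.

```python
import math

def encode(input_feature, weights, bias):
    """Toy encoder (assumed): the tanh activation of one hidden layer
    plays the role of the embedded feature."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, input_feature)) + b)
        for row, b in zip(weights, bias)
    ]

def lp_distance(point, location, p=2):
    """Lp-norm distance between an embedded feature and a specified location."""
    return sum(abs(a - b) ** p for a, b in zip(point, location)) ** (1.0 / p)

def is_in_domain(embedded, location, threshold, p=2):
    """Assumed convention: in-domain features lie within `threshold`
    of the specified location."""
    return lp_distance(embedded, location, p) < threshold

# Hypothetical, untrained parameters chosen only to make the example concrete.
weights = [[0.5, -0.5, 0.0], [0.0, 0.5, 0.5]]
bias = [0.0, 0.0]
origin = [0.0, 0.0]

near = encode([0.1, 0.1, -0.1], weights, bias)  # embeds at the origin
far = encode([3.0, -3.0, 3.0], weights, bias)   # embeds far from the origin

print(is_in_domain(near, origin, threshold=0.5))
print(is_in_domain(far, origin, threshold=0.5))
```

In a real system the encoder would be the trained half of an autoencoder, and the threshold would be tuned on held-out sentences.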

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining a domain of a sentence, the method comprising: generating, using an autoencoder, an embedded feature from an input feature indicating an input sentence; and determining a domain of the input sentence based on a location of the embedded feature in an embedding space where embedded features are distributed, wherein the determining of the domain comprises: determining whether the input sentence is an in-domain sentence or an out-of-domain sentence, based on a distance between the location of the embedded feature and a specified location.

2. The method of claim 1, wherein the autoencoder is trained such that embedded features indicating out-of-domain sentences are closer to the specified location, and the determining of the domain comprises determining the input sentence to be the out-of-domain sentence, in response to the distance being less than a threshold distance.

3. The method of claim 2, wherein the determining of the domain comprises: generating a reconstructed feature from the embedded feature using the autoencoder, in response to the distance being greater than the threshold distance; generating a reconstruction error based on the input feature and the reconstructed feature; and determining the input sentence to be the out-of-domain sentence, in response to the reconstruction error being greater than a threshold error.

4. The method of claim 1, wherein the autoencoder is trained such that embedded features indicating in-domain sentences are closer to the specified location, and the determining of the domain comprises determining the input sentence to be the out-of-domain sentence, in response to the distance being greater than a threshold distance.

5. The method of claim 4, wherein the determining of the domain comprises: generating a reconstructed feature from the embedded feature using the autoencoder, in response to the distance being less than the threshold distance; generating a reconstruction error based on the input feature and the reconstructed feature; and determining the input sentence to the out-of-domain sentence, in response to the reconstruction error being less than a threshold error.

6. The method of claim 1, wherein the specified location is an original point in the embedding space.

7. The method of claim 1, wherein the determining of the domain comprises: calculating an Lp-norm or a Kullback-Leibler divergence (KLD) based on the location of the embedded feature and the specified location; and determining the distance between the location of the embedded feature and the specified location based on the calculating of the Lp-norm or the KLD divergence.

8. The method of claim 1, wherein the determining of the domain comprises: generating a reconstructed feature from the embedded feature using the autoencoder; generating a reconstruction error based on the input feature and the reconstructed feature; and determining the domain of the input sentence based on the reconstruction error and the location of the embedded feature.

9. The method of claim 1, wherein the embedded feature is an activation value or a pre-activation value of a hidden layer in the autoencoder.

10. The method of claim 1, wherein the input feature is an embedded feature generated from the input sentence by a neural network.

11. The method of claim 1, wherein the input feature comprises any one or any combination of one-hot vector, a real vector, or a function corresponding to an input layer in the autoencoder.

12. The method of claim 3, wherein the reconstructed feature comprises any one or any combination of one-hot vector, a real vector, or a function corresponding to an output layer in the autoencoder.

13. The method of claim 1, wherein the determining of the domain comprises: determining the domain of the input sentence from among reference domains based on specified locations respectively corresponding to the reference domains and the location of the embedded feature.

14. The method of claim 13, wherein the autoencoder is trained such that embedded features indicating in-domain sentences respectively belonging to the reference domains are closer to the specified locations, respectively, and the determining of the domain of the input sentence comprises: identifying a second location closest to the location of the embedded feature among the specified locations; and determining that the input sentence belongs to a second domain corresponding to the second location based on whether a distance between the location of the embedded feature and the second location is less than a threshold distance.

15. The method of claim 14, wherein the determining that the input sentence belongs to the second domain comprises: generating a reconstructed feature from the embedded feature using the autoencoder, in response to the distance between the location of the embedded feature and the second location being less than the threshold distance; generating a reconstruction error based on the input feature and the reconstructed feature; and determining the input sentence to be an out-of-domain sentence, in response to the reconstruction error being less than a threshold error, wherein the out-of-domain sentence is a sentence not belonging to the reference domains.

16. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

17. A training method to determine a domain of a sentence, the training method comprising: applying, to an autoencoder, at least one training feature indicating a training sentence; and training the autoencoder such that a location of an embedded feature generated from the training feature is closer to a specified location in an embedding space where embedded features are distributed, wherein the trained autoencoder is configured to determine whether a domain of the training sentence is in-domain or out-of-domain, based on a determined distance between the location of the embedded feature and the specified location.

18. The training method of claim 17, wherein the applying of the at least one training feature to the autoencoder comprises: applying a first training feature indicating an in-domain sentence to the autoencoder; and applying a second training feature indicating an out-of-domain sentence to the autoencoder, and the training of the autoencoder comprises: obtaining a first embedded feature generated from the first training feature; training the autoencoder such that a reconstruction error between a reconstructed feature generated from the first embedded feature and the first training feature is reduced; and training the autoencoder such that a location of a second embedded feature generated from the second training feature is closer to a second location in the embedding space.

19. The training method of claim 17, wherein the applying of the at least one training feature to the autoencoder comprises: applying a first training feature indicating an in-domain sentence to the autoencoder; and applying a second training feature indicating an out-of-domain sentence to the autoencoder, and the training of the autoencoder comprises: training the autoencoder such that a location of a first embedded feature generated from the first training feature is closer to a first location in the embedding space; obtaining a second embedded feature generated from the second training feature; and training the autoencoder such that a reconstructio
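
The two-stage check of claims 2 and 3 and the nearest-location lookup of claims 13 and 14 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the `toy_encoder` and `toy_decoder` stand-ins, the L2 distance, every threshold value, the domain names, the "in-domain" fallback label, and the `None` return for "no reference domain matched" are assumptions introduced here.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_stage_check(input_feature, encoder, decoder, location,
                    threshold_distance, threshold_error):
    """Claims 2-3 convention: out-of-domain features are trained toward `location`."""
    embedded = encoder(input_feature)
    if l2(embedded, location) < threshold_distance:
        return "out-of-domain"                    # stage 1: too close to the location
    reconstructed = decoder(embedded)
    if l2(input_feature, reconstructed) > threshold_error:
        return "out-of-domain"                    # stage 2: poorly reconstructed
    return "in-domain"                            # assumed fallback label

def nearest_reference_domain(embedded, domain_locations, threshold_distance):
    """Claims 13-14: pick the closest specified location, accept it only
    if it lies within the threshold distance (None otherwise, by assumption)."""
    name, location = min(domain_locations.items(),
                         key=lambda item: l2(embedded, item[1]))
    return name if l2(embedded, location) < threshold_distance else None

# Hypothetical stand-ins for a trained autoencoder.
def toy_encoder(x):
    return [0.5 * v for v in x]

def toy_decoder(z):
    # Only reconstructs values in [-1, 1], so large inputs reconstruct badly.
    return [max(-1.0, min(1.0, 2.0 * v)) for v in z]

origin = [0.0, 0.0]
print(two_stage_check([1.0, 0.5], toy_encoder, toy_decoder, origin, 0.2, 0.5))
print(two_stage_check([0.1, 0.1], toy_encoder, toy_decoder, origin, 0.2, 0.5))
print(two_stage_check([4.0, 0.0], toy_encoder, toy_decoder, origin, 0.2, 0.5))

# Hypothetical reference domains and their specified locations.
domains = {"weather": [1.0, 0.0], "music": [0.0, 1.0]}
print(nearest_reference_domain([0.9, 0.1], domains, 0.3))
print(nearest_reference_domain([5.0, 5.0], domains, 0.3))
```

Running the cheap distance test first means the decoder is only invoked for sentences that the embedding location alone cannot settle.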

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10528666B2 cover?
Methods and apparatuses for determining a domain of a sentence are disclosed. The apparatus may generate, using an autoencoder, an embedded feature from an input feature indicating an input sentence, and determine a domain of the input sentence based on a location of the embedded feature in an embedding space where embedded features are distributed.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).