Dataset shift compensation in machine learning

US2016019883A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016019883-A1
Application number: US-201414331230-A
Country: US
Kind code: A1
Filing date: Jul 15, 2014
Priority date: Jul 15, 2014
Publication date: Jan 21, 2016
Grant date: (not shown)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for inter-dataset variability compensation, the method comprising using at least one hardware processor for: receiving a heterogeneous development dataset comprising multiple samples and metadata associated with at least some of the multiple samples; dividing the multiple samples into multiple homogenous subsets, based on the metadata; averaging high-level features of each of the multiple homogenous subsets, to produce multiple central high-level features for the multiple homogenous subsets, respectively; computing an inter-dataset variability subspace spanned by the multiple central high-level features; removing the inter-dataset variability subspace from the high-level features of the multiple homogenous subsets, to produce denoised samples; and training a machine learning system using the denoised speech samples.
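The steps in the abstract can be sketched in a few lines of NumPy. This is an illustrative reading, not the patent's reference implementation: the function names `idvc_mean_subspace` and `remove_subspace`, the parameter `k`, and the choice to center the subset means before the PCA step are all assumptions made for the example.

```python
import numpy as np

def idvc_mean_subspace(features, labels, k):
    """Return a (d, k) orthonormal basis for the directions along which
    the per-subset mean feature vectors differ (hypothetical helper)."""
    labels = np.asarray(labels)
    # One "central" feature vector per homogeneous subset (here: the mean).
    centers = np.stack([features[labels == s].mean(axis=0)
                        for s in np.unique(labels)])
    # Assumption: center the subset means so only their spread is modeled.
    centered = centers - centers.mean(axis=0)
    # PCA via SVD: rows of vt are principal directions of the subset means.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def remove_subspace(features, basis):
    """Project features onto the orthogonal complement of span(basis)."""
    return features - features @ basis @ basis.T

# Toy data: three subsets (e.g. three recording settings) with shifted means.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
labels = ["a"] * 20 + ["b"] * 20 + ["c"] * 20
X[:20] += 3.0
X[20:40, 0] -= 2.0

basis = idvc_mean_subspace(X, labels, k=2)
clean = remove_subspace(X, basis)
```

With three subsets the centered means span at most two dimensions, so `k=2` removes every direction along which the subset means differ. The cleaned features would then be passed to whatever downstream training step is in use.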

First claim

Opening claim text (preview).

What is claimed is:

1. A method for inter-dataset variability compensation, the method comprising using at least one hardware processor for: receiving a heterogeneous development dataset comprising multiple samples and metadata associated with at least some of the multiple samples; dividing the multiple samples into multiple homogenous subsets, based on the metadata; computing a statistical measure of high-level features of each of the multiple homogenous subsets, to produce multiple central high-level features for the multiple homogenous subsets, respectively; computing an inter-dataset variability subspace spanned by the multiple central high-level features; removing the inter-dataset variability subspace from the high-level features of the multiple homogenous subsets, to produce denoised samples; and training a machine learning system using the denoised speech samples.

2. The method according to claim 1, wherein the high-level features are selected from the group consisting of: i-vectors, GMM (Gaussian Mixture Model) supervectors, HMM (Hidden Markov Model) supervectors, d-vectors, JFA (Joint Factor Analysis) supervectors, LBP (Local Binary Patterns), HOG (Histograms of Oriented Gradients), and EBIF (Early Biologically-Inspired Features).

3. The method according to claim 2, wherein the machine learning system is selected from the group consisting of: a PLDA (Probabilistic Linear Discriminant Analysis)-based system, an SVM (Support Vector Machine)-based system, a neural network-based system, a NAP (Nuisance Attribute Projection)-based system, a WCCN (Within-Speaker Covariance Matrix)-based system, and an LDA (Linear Discriminant Analysis)-based system.

4. The method according to claim 3, wherein the multiple samples are speech samples.

5. The method according to claim 4, wherein the heterogeneous development dataset is devoid of speech samples from a target domain of the speaker recognition.

6. The method according to claim 4, wherein the metadata comprises at least one parameter selected from the group consisting of: speaker gender, spoken language and recordation setting.

7. The method according to claim 4, wherein the computing of the inter-dataset variability subspace comprises PCA (Principal Component Analysis).

8. A method for inter-dataset variability compensation for speaker recognition, the method comprising using at least one hardware processor for: receiving a heterogeneous development dataset comprising multiple speech samples; dividing the multiple speech samples into multiple homogenous subsets; for each subset i of the multiple homogenous subsets: (a) estimating PLDA (Probabilistic Linear Discriminant Analysis) hyper-parameters {μ_i, B_i, W_i}, wherein μ denotes a center of an i-vector space, B denotes a between-speaker covariance matrix and W denotes a within-speaker covariance matrix, and (b) computing an i-vector subspace S_μ corresponding to {μ_i}, an i-vector subspace S_W corresponding to {W_i}, and an i-vector subspace S_B corresponding to {B_i}; joining i-vector subspaces S_μ, S_W and S_B into a single subspace S; removing subspace S from i-vectors of the multiple speech samples, to produce denoised speech samples; and training a PLDA speaker recognition system using the denoised speech samples.

9. The method according to claim 8, further comprising smoothing B by linear interpolation using an estimated diagonal of B.

10. The method according to claim 8, wherein: the heterogeneous development dataset further comprises metadata associated with at least some of the multiple speech samples; and the dividing is based on the metadata.

11. The method according to claim 8, wherein the computing of each of the i-vector subspaces S_μ, S_W and S_B comprises PCA (Principal Component Analysis).

12. The method according to claim 8, further comprising computing an average of squared {W_i}, and finding a k number of largest eigenvalues of the squared {W_i}, wherein the k largest eigenvalues span the i-vector subspace S_W.

13. The method according to claim 12, further comprising whitening the i-vector subspace S_W with respect to W.

14. The method according to claim 8, further comprising computing an average of squared {B_i}, and finding an m number of largest eigenvalues of the squared {B_i}, wherein the k largest eigenvalues span the i-vector subspace S_B.

15. The method according to claim 14, further comprising whitening the i-vector subspace S_B with respect to B.

16. A computer program product for inter-dataset variability compensation for speaker recognition, the computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to: receive a heterogeneous development dataset comprising multiple speech samples; divide the multiple speech samples into multiple homogenous subsets; for each subset i of the multiple homogenous subsets: (a) estimate PLDA (Probabilistic Linear Discriminant Analysis) hyper-parameters {μ_i, B_i, W_i}, wherein μ denotes a center of an i-vector space, B denotes a between-speaker covariance matrix and W denotes a within-speaker covariance matrix, and (b) compute an i-vector subspace S_μ corresponding to {μ_i}, an i-vector subspace S_W corresponding to {W_i}, and an i-vector subspace S_B corresponding to {B_i}; join i-vector subspaces S_μ, S_W and S_B into a single subspace S; remove subspace S from i-vectors of the multiple speech samples, to produce denoised speech samples; and train a PLDA speaker recognition system using the denoised speech samples.

17. The computer program product according to claim 16, wherein the program code is further executable by the at least one hardware processor to smooth B by linear interpolation using an estimated diagonal of B.

18. The computer program product according to claim 16, wherein: the heterogeneous development dataset further comprises metadata associated with at least some of the multiple speech samples; and the dividing is based on the metadata.

19. The computer program product according to claim 16, wherein the computing of each of the i-vector subspaces S_μ, S_W and S_B comprises PCA (Principal Component Analysis).

20. The computer program product according to claim 16, wherein the program code is further executable by the at least one hardware processor to: compute an average of squared {W_i}; find a k number of largest eigenvalues of the squared {W_i}, wherein the k largest eigenvalues span the i-vector subspace S_W; whiten the i-vector subspace S_W with respect to W; compute an average of squared {B_i}; find an m number of largest eigenvalues of the squared {B_i}, wherein the m largest eigenvalues span the i-vector subspace S_B; and whiten the i-vector subspace S_B with respect to B.
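The subspace construction in claims 8, 12 and 14 can be illustrated with a rough NumPy sketch. Several things here are assumptions rather than the patent's method: `moment_estimates` uses simple sample moments as a stand-in for a real PLDA hyper-parameter estimator, "squared" is read as the matrix product of each covariance with itself, the subspace sizes `k_mu`, `k_w` and `k_b` are arbitrary, and the whitening and smoothing steps of claims 9, 13 and 15 are omitted. All function names are hypothetical.

```python
import numpy as np

def moment_estimates(x, speakers):
    """Crude per-subset stand-ins for PLDA hyper-parameters (mu, B, W)."""
    speakers = np.asarray(speakers)
    ids = np.unique(speakers)
    mu = x.mean(axis=0)
    spk_means = np.stack([x[speakers == s].mean(axis=0) for s in ids])
    B = np.cov(spk_means, rowvar=False, bias=True)      # between-speaker
    resid = x - spk_means[np.searchsorted(ids, speakers)]
    W = resid.T @ resid / len(x)                         # within-speaker
    return mu, B, W

def top_eigvecs(sym, k):
    """Eigenvectors of a symmetric matrix for its k largest eigenvalues."""
    _, vecs = np.linalg.eigh(sym)        # eigh returns ascending eigenvalues
    return vecs[:, ::-1][:, :k]

def idvc_subspace(params, k_mu, k_w, k_b):
    """Join the mean, W and B subspaces into one orthonormal basis S."""
    mus = np.stack([p[0] for p in params])
    mus = mus - mus.mean(axis=0)
    s_mu = top_eigvecs(mus.T @ mus, k_mu)                # PCA on subset means
    s_w = top_eigvecs(np.mean([p[2] @ p[2] for p in params], axis=0), k_w)
    s_b = top_eigvecs(np.mean([p[1] @ p[1] for p in params], axis=0), k_b)
    joined = np.hstack([s_mu, s_w, s_b])
    # Orthonormalize the union and drop numerically redundant directions.
    u, s, _ = np.linalg.svd(joined, full_matrices=False)
    return u[:, s > 1e-10 * s[0]]

def remove_subspace(x, basis):
    """Project i-vectors onto the orthogonal complement of span(basis)."""
    return x - x @ basis @ basis.T

# Toy data: three subsets, five speakers each, with subset-specific shifts.
rng = np.random.default_rng(1)
d = 6
subsets = []
for shift in (0.0, 2.0, -1.5):
    spk = np.repeat(np.arange(5), 8)
    x = rng.normal(size=(40, d)) + rng.normal(size=(5, d))[spk]
    x[:, 0] += shift
    x[:, 1] += shift * shift
    subsets.append((x, spk))

params = [moment_estimates(x, s) for x, s in subsets]
basis = idvc_subspace(params, k_mu=2, k_w=1, k_b=1)
cleaned = [remove_subspace(x, basis) for x, _ in subsets]
```

In a real system the cleaned i-vectors would feed PLDA training. The sketch stops at the projection step, since the claims do not fix a particular PLDA implementation.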

Assignees

Inventors

Classifications

  • Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

  • Noise filtering

  • Training, enrolment or model building

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence)

  • Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016019883A1 cover?
A method for inter-dataset variability compensation, the method comprising using at least one hardware processor for: receiving a heterogeneous development dataset comprising multiple samples and metadata associated with at least some of the multiple samples; dividing the multiple samples into multiple homogenous subsets, based on the metadata; averaging high-level features of each of the multi…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).