Pitch marking in speech processing

Patent metadata
Field	Value
Publication number	US-9685170-B2
Application number	US-201514918601-A
Country	US
Kind code	B2
Filing date	Oct 21, 2015
Priority date	Oct 21, 2015
Publication date	Jun 20, 2017
Grant date	Jun 20, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.
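The limit computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it uses the normalized measure given in claim 4 and the 0.97 ratio from claim 6, while the function names, the `half_len` window size, the `max_shift` search range, and the rule of expanding outward from the peak are all assumptions made for the example.

```python
import math


def normalized_xcorr(signal, mark_a, mark_b, delta, half_len):
    """r(delta) = x(delta)^T y(0) / (0.5 * (||x(delta)||^2 + ||y(0)||^2)).

    x is the section around mark_a shifted by delta samples,
    y is the unshifted section around mark_b (window size is an assumption).
    """
    start_x = mark_a + delta - half_len
    start_y = mark_b - half_len
    width = 2 * half_len + 1
    if min(start_x, start_y) < 0 or max(start_x, start_y) + width > len(signal):
        raise ValueError("analysis window falls outside the signal")
    x = signal[start_x:start_x + width]
    y = signal[start_y:start_y + width]
    dot = sum(a * b for a, b in zip(x, y))
    energy = 0.5 * (sum(a * a for a in x) + sum(b * b for b in y))
    return dot / energy if energy > 0 else 0.0


def pitch_mark_limits(signal, mark_a, mark_b, max_shift, half_len, ratio=0.97):
    """Return (lower, upper) sample positions for mark_a.

    The limits are the outermost contiguous shifts around the correlation
    peak whose value stays at or above ratio * peak (hypothetical rule).
    """
    shifts = range(-max_shift, max_shift + 1)
    scores = [normalized_xcorr(signal, mark_a, mark_b, d, half_len) for d in shifts]
    peak = max(range(len(scores)), key=scores.__getitem__)
    threshold = ratio * scores[peak]
    lo = hi = peak
    while lo > 0 and scores[lo - 1] >= threshold:
        lo -= 1
    while hi < len(scores) - 1 and scores[hi + 1] >= threshold:
        hi += 1
    return mark_a + shifts[lo], mark_a + shifts[hi]


if __name__ == "__main__":
    # Synthetic "voiced" signal with a 40-sample pitch period.
    signal = [math.sin(2 * math.pi * n / 40) for n in range(400)]
    # Mark at 103 is misplaced by 3 samples relative to the mark at 140.
    print(pitch_mark_limits(signal, 103, 140, max_shift=10, half_len=20))
```

Any new pitch mark chosen inside the returned interval keeps the correlation with the neighbouring pitch cycle close to its peak, which is the constraint the abstract places on replacement values.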

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, comprising: receiving a continuous speech signal representing audible speech recorded by a microphone, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from said continuous speech signal, each of said plurality of pitch mark temporal values associated with one element of said sequence; using at least one hardware processor for executing a code for processing said continuous speech signal and generating at least one pitch mark combination, said processing comprises: computing for each of said plurality of pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of said continuous speech signal around said pitch mark temporal values associated with pairs of elements in said sequence; computing at least one new temporal value between said lower limit temporal value and said upper limit temporal value; automatically generating said at least one pitch mark combination by replacing at least one of said plurality of pitch mark temporal values with said at least one new temporal value; outputting said at least one pitch mark combination of said plurality of pitch mark temporal values to a speech processor for at least one of speech processing, modification, and conversion to an audible output sound signal; wherein elements of said at least one combination are between said lower limit temporal value and said upper limit temporal value.

2. The method of claim 1, wherein said cross-correlation is a normalized linear cross-correlation function.

3. The method of claim 1, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said cross-correlation function.

4. The method of claim 1, wherein said cross-correlation function is computed using a formula $r(\Delta) = \frac{x(\Delta)^{T}\, y(0)}{0.5\left(\lVert x(\Delta)\rVert^{2} + \lVert y(0)\rVert^{2}\right)}$, where Δ denotes a temporal offset value from one of said plurality of pitch mark temporal values, x(Δ) denotes an input section of said continuous speech signal shifted by Δ samples relative to a first pitch mark temporal value and y(0) denotes an unshifted input section of said continuous speech signal associated with a second pitch mark temporal value.

5. The method of claim 1, wherein said lower limit temporal value and said upper limit temporal value are determined by a plurality of input values of said cross-correlation function, associated with respective output values of said cross-correlation function that are a predefined ratio of a peak output value of said cross-correlation function.

6. The method of claim 5, wherein said predefined ratio is 0.97 of said peak output value.

7. The method of claim 5, wherein said predefined ratio is a value between 0.8 and 0.999 of said peak output value.

8. The method of claim 4, wherein said first input section of said continuous speech signal is temporally preceding said unshifted input section of said continuous speech signal.

9. The method of claim 4, wherein said unshifted input section of said continuous speech signal is temporally preceding said input section of said continuous speech signal.

10. The method of claim 1, further comprising selecting a preferred pitch mark sequence from said at least one pitch mark combination, wherein said preferred pitch mark sequence is selected by minimization of a sequence global consistency criterion, wherein said sequence global consistency criterion is a sum of individual global consistency criteria of each said element in said at least one pitch mark combination.

11. The method of claim 10, wherein each said individual global consistency criteria is derived from a temporal drift of each said element, relative to a certain reference pitch mark.

12. The method of claim 11, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said pitch mark drift function.

13. The method of claim 1, wherein said continuous speech signal is digitized by said at least one hardware processor.

14. The method of claim 1, wherein said sequence of pitch values are computed from said continuous speech signal by said at least one hardware processor.

15. The method of claim 1, wherein said plurality of pitch mark temporal values are computed from said continuous speech signal by said at least one hardware processor.

16. The method of claim 1, wherein said a sequence of pitch values are non-zero pitch mark values.

17. A computer program product for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, said computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a hardware processor to cause said hardware processor to: perform a signal processing of a continuous speech signal representing audible speech recorded by a microphone for generating at least one pitch mark combination, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from sai
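Claims 3 and 12 call for a zero-phase, low-pass filter but the preview does not say how it is designed. One common way to obtain zero phase is to run a causal filter forward and then backward over the signal, so the two phase delays cancel. The sketch below assumes a simple one-pole smoother as a stand-in for the low-pass stage; the function names and the `alpha` smoothing factor are hypothetical and not taken from the patent.

```python
def one_pole_lowpass(samples, alpha):
    """Causal one-pole smoother: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    out = []
    state = 0.0
    for s in samples:
        state = alpha * s + (1.0 - alpha) * state
        out.append(state)
    return out


def zero_phase_lowpass(samples, alpha=0.3):
    """Forward-backward filtering: the backward pass cancels the forward delay."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    forward = one_pole_lowpass(samples, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


if __name__ == "__main__":
    # An impulse stays centred on its original sample after filtering.
    impulse = [0.0] * 201
    impulse[100] = 1.0
    response = zero_phase_lowpass(impulse)
    print(max(range(len(response)), key=response.__getitem__))
```

Zero phase matters here because pitch marks are time positions: a filter that delayed the waveform would shift the cross-correlation peak and bias the limits computed from it.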

Assignees

IBM

Inventors

Shechtman Slava

Classifications

Code	CPC title
G10L25/06 (Primary)	the extracted parameters being correlation coefficients
G10L25/90	Pitch determination of speech signals
G10L21/01 (Primary)	Correction of time axis
G10L25/09	the extracted parameters being zero crossing rates
G10L21/013	Adapting to target pitch

Patent family

Related publications grouped by family.

View patent family 58558714

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9685170B2 cover?: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from th…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G10L25/06. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 20, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).