Pitch marking in speech processing

US9685170B2 · US · B2

Patent metadata
Publication number: US-9685170-B2
Application number: US-201514918601-A
Country: US
Kind code: B2
Filing date: Oct 21, 2015
Priority date: Oct 21, 2015
Publication date: Jun 20, 2017
Grant date: Jun 20, 2017

Abstract

Official abstract text for this publication.

According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
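The limit computation described in the abstract can be illustrated with a minimal Python sketch. It uses the normalized cross-correlation recited in claim 4 and the 0.97 ratio from claim 6. Everything else is an assumption made for illustration and is not taken from the patent: the function names `normalized_xcorr` and `offset_limits`, the fixed symmetric window `half_width`, the search range `max_shift`, and the choice to walk outward from the peak until the correlation drops below the threshold.

```python
def normalized_xcorr(signal, mark_a, mark_b, half_width, delta):
    """r(delta): section around mark_a shifted by delta vs. unshifted section around mark_b."""
    start_x = mark_a + delta - half_width
    start_y = mark_b - half_width
    length = 2 * half_width + 1
    if min(start_x, start_y) < 0 or max(start_x, start_y) + length > len(signal):
        raise ValueError("analysis window falls outside the signal")
    x = signal[start_x:start_x + length]
    y = signal[start_y:start_y + length]
    numerator = sum(a * b for a, b in zip(x, y))
    # Denominator: 0.5 * (||x||^2 + ||y||^2), as in claim 4.
    denominator = 0.5 * (sum(a * a for a in x) + sum(b * b for b in y))
    return numerator / denominator if denominator > 0 else 0.0


def offset_limits(signal, mark_a, mark_b, half_width, max_shift, ratio=0.97):
    """Return (lower, upper) sample positions for mark_a.

    Assumption: the limits are the outermost contiguous offsets around the
    correlation peak whose value stays at or above ratio * peak.
    """
    deltas = list(range(-max_shift, max_shift + 1))
    r = [normalized_xcorr(signal, mark_a, mark_b, half_width, d) for d in deltas]
    peak = max(range(len(r)), key=r.__getitem__)
    threshold = ratio * r[peak]
    lo = peak
    while lo > 0 and r[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < len(r) - 1 and r[hi + 1] >= threshold:
        hi += 1
    return mark_a + deltas[lo], mark_a + deltas[hi]
```

On a pure sine with a 40-sample period, with the first mark placed 3 samples late at position 163 and the second mark at 200, calling `offset_limits(signal, 163, 200, half_width=20, max_shift=10)` returns limits 159 and 161, which bracket the period-aligned position 160. A replacement pitch mark would then be chosen inside that interval.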

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, comprising: receiving a continuous speech signal representing audible speech recorded by a microphone, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from said continuous speech signal, each of said plurality of pitch mark temporal values associated with one element of said sequence; using at least one hardware processor for executing a code for processing said continuous speech signal and generating at least one pitch mark combination, said processing comprises: computing for each of said plurality of pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of said continuous speech signal around said pitch mark temporal values associated with pairs of elements in said sequence; computing at least one new temporal value between said lower limit temporal value and said upper limit temporal value; automatically generating said at least one pitch mark combination by replacing at least one of said plurality of pitch mark temporal values with said at least one new temporal value; outputting said at least one pitch mark combination of said plurality of pitch mark temporal values to a speech processor for at least one of speech processing, modification, and conversion to an audible output sound signal; wherein elements of said at least one combination are between said lower limit temporal value and said upper limit temporal value.

2. The method of claim 1, wherein said cross-correlation is a normalized linear cross-correlation function.

3. The method of claim 1, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said cross-correlation function.

4. The method of claim 1, wherein said cross-correlation function is computed using a formula $r(\Delta) = \frac{x(\Delta)^{T}\, y(0)}{0.5\left(\lVert x(\Delta)\rVert^{2} + \lVert y(0)\rVert^{2}\right)}$, where Δ denotes a temporal offset value from one of said plurality of pitch mark temporal values, x(Δ) denotes an input section of said continuous speech signal shifted by Δ samples relative to a first pitch mark temporal value and y(0) denotes an unshifted input section of said continuous speech signal associated with a second pitch mark temporal value.

5. The method of claim 1, wherein said lower limit temporal value and said upper limit temporal value are determined by a plurality of input values of said cross-correlation function, associated with respective output values of said cross-correlation function that are a predefined ratio of a peak output value of said cross-correlation function.

6. The method of claim 5, wherein said predefined ratio is 0.97 of said peak output value.

7. The method of claim 5, wherein said predefined ratio is a value between 0.8 and 0.999 of said peak output value.

8. The method of claim 4, wherein said first input section of said continuous speech signal is temporally preceding said unshifted input section of said continuous speech signal.

9. The method of claim 4, wherein said unshifted input section of said continuous speech signal is temporally preceding said input section of said continuous speech signal.

10. The method of claim 1, further comprising selecting a preferred pitch mark sequence from said at least one pitch mark combination, wherein said preferred pitch mark sequence is selected by minimization of a sequence global consistency criterion, wherein said sequence global consistency criterion is a sum of individual global consistency criteria of each said element in said at least one pitch mark combination.

11. The method of claim 10, wherein each said individual global consistency criteria is derived from a temporal drift of each said element, relative to a certain reference pitch mark.

12. The method of claim 11, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said pitch mark drift function.

13. The method of claim 1, wherein said continuous speech signal is digitized by said at least one hardware processor.

14. The method of claim 1, wherein said sequence of pitch values are computed from said continuous speech signal by said at least one hardware processor.

15. The method of claim 1, wherein said plurality of pitch mark temporal values are computed from said continuous speech signal by said at least one hardware processor.

16. The method of claim 1, wherein said a sequence of pitch values are non-zero pitch mark values.

17. A computer program product for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, said computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a hardware processor to cause said hardware processor to: perform a signal processing of a continuous speech signal representing audible speech recorded by a microphone for generating at least one pitch mark combination, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from sai
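Claims 3 and 12 recite preprocessing with a zero-phase, low-pass filter. One common way to obtain zero phase is to run a causal filter forward over the signal and then backward over the result, so that the delay of the two passes cancels. The sketch below illustrates that idea only. The one-pole smoother, the `alpha` parameter, and the function names are assumptions chosen for brevity; the claims do not specify which filter design is used.

```python
def one_pole_lowpass(samples, alpha):
    """Causal one-pole smoother (hypothetical stand-in for the low-pass filter)."""
    out = []
    state = samples[0] if samples else 0.0
    for s in samples:
        state = alpha * s + (1.0 - alpha) * state
        out.append(state)
    return out


def zero_phase_lowpass(samples, alpha=0.3):
    """Forward-backward filtering: the second, time-reversed pass cancels the
    phase delay introduced by the first, so features are not shifted in time."""
    forward = one_pole_lowpass(samples, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]
```

The relevance to pitch marking is that a single causal pass would delay waveform features and so bias any temporal value measured afterwards, whereas the forward-backward response to an impulse is symmetric around the impulse position (apart from small edge effects near the ends of the signal).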

Assignees

IBM

Classifications

  • G10L25/06 (Primary): the extracted parameters being correlation coefficients

  • Pitch determination of speech signals

  • G10L21/01 (Primary): Correction of time axis

  • the extracted parameters being zero crossing rates

  • Adapting to target pitch

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9685170B2 cover?
A computerized method for selecting and correcting pitch marks in speech processing and modification. For each pitch mark computed from a recorded speech signal, a cross-correlation function yields a lower and an upper limit temporal value, and one or more pitch marks are replaced with new temporal values between those limits.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L25/06. Mapped technology areas include Physics.
When was this patent published?
Published Jun 20, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).