US9685170B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9685170-B2 |
| Application number | US-201514918601-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 21, 2015 |
| Priority date | Oct 21, 2015 |
| Publication date | Jun 20, 2017 |
| Grant date | Jun 20, 2017 |
According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence, and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
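The bound computation described in the abstract (and recited in claims 1 and 4 to 6) can be sketched as below. This is a minimal illustration under stated assumptions, not the patentee's implementation: the signal is a plain list of samples, pitch marks are integer sample indices, the "pair" is two consecutive marks, and the correlation window `length` and search range `max_shift` are hypothetical parameters the claims do not fix. The correlation uses the normalized form of claim 4, the limits are taken as the contiguous run of offsets around the peak that stay at or above `ratio` times the peak (0.97 by default, per claim 6), and the peak position is returned as one possible new temporal value. The names `ncc` and `pitch_mark_limits` are hypothetical.

```python
import math


def ncc(signal, first_mark, second_mark, delta, length):
    """Normalized cross-correlation r(delta) in the form recited in claim 4.

    x(delta): `length` samples starting `delta` samples after first_mark.
    y(0):     `length` unshifted samples starting at second_mark.
    Assumes both windows lie inside the signal.
    """
    x = signal[first_mark + delta:first_mark + delta + length]
    y = signal[second_mark:second_mark + length]
    dot = sum(a * b for a, b in zip(x, y))
    energy = 0.5 * (sum(a * a for a in x) + sum(b * b for b in y))
    return dot / energy if energy > 0 else 0.0


def pitch_mark_limits(signal, first_mark, second_mark, length, max_shift,
                      ratio=0.97):
    """Return (lower, upper, peak) sample positions for first_mark.

    The limits are the outermost offsets, contiguous with the peak, whose
    correlation stays at or above `ratio` times the peak value.
    """
    # Only score offsets whose shifted window fits inside the signal.
    deltas = [d for d in range(-max_shift, max_shift + 1)
              if first_mark + d >= 0 and first_mark + d + length <= len(signal)]
    scores = {d: ncc(signal, first_mark, second_mark, d, length) for d in deltas}
    peak = max(deltas, key=scores.__getitem__)
    threshold = ratio * scores[peak]
    lower = upper = peak
    # Walk outward from the peak while the correlation stays above threshold.
    while lower - 1 in scores and scores[lower - 1] >= threshold:
        lower -= 1
    while upper + 1 in scores and scores[upper + 1] >= threshold:
        upper += 1
    return first_mark + lower, first_mark + upper, first_mark + peak


# Usage: an ideal periodic "voiced" signal with a period of 20 samples.
one_period = [math.sin(2 * math.pi * n / 20) for n in range(20)]
signal = one_period * 5
print(pitch_mark_limits(signal, 40, 60, length=20, max_shift=5, ratio=0.9))
# (39, 41, 40)
```

For a pure sinusoid over one full period, r(Δ) reduces to cos(2πΔ/20), so offsets of ±1 sample (r ≈ 0.951) pass a 0.9 threshold but fail the default 0.97, which is why the interval widens or collapses with `ratio`.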
What is claimed is:

1. A computerized method for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, comprising: receiving a continuous speech signal representing audible speech recorded by a microphone, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from said continuous speech signal, each of said plurality of pitch mark temporal values associated with one element of said sequence; using at least one hardware processor for executing a code for processing said continuous speech signal and generating at least one pitch mark combination, said processing comprises: computing for each of said plurality of pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of said continuous speech signal around said pitch mark temporal values associated with pairs of elements in said sequence; computing at least one new temporal value between said lower limit temporal value and said upper limit temporal value; automatically generating said at least one pitch mark combination by replacing at least one of said plurality of pitch mark temporal values with said at least one new temporal value; outputting said at least one pitch mark combination of said plurality of pitch mark temporal values to a speech processor for at least one of speech processing, modification, and conversion to an audible output sound signal; wherein elements of said at least one combination are between said lower limit temporal value and said upper limit temporal value.

2. The method of claim 1, wherein said cross-correlation is a normalized linear cross-correlation function.

3. The method of claim 1, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said cross-correlation function.

4. The method of claim 1, wherein said cross-correlation function is computed using a formula $r(\Delta) = \dfrac{x(\Delta)^{T}\, y(0)}{0.5\left(\lVert x(\Delta)\rVert^{2} + \lVert y(0)\rVert^{2}\right)}$, where Δ denotes a temporal offset value from one of said plurality of pitch mark temporal values, x(Δ) denotes an input section of said continuous speech signal shifted by Δ samples relative to a first pitch mark temporal value and y(0) denotes an unshifted input section of said continuous speech signal associated with a second pitch mark temporal value.

5. The method of claim 1, wherein said lower limit temporal value and said upper limit temporal value are determined by a plurality of input values of said cross-correlation function, associated with respective output values of said cross-correlation function that are a predefined ratio of a peak output value of said cross-correlation function.

6. The method of claim 5, wherein said predefined ratio is 0.97 of said peak output value.

7. The method of claim 5, wherein said predefined ratio is a value between 0.8 and 0.999 of said peak output value.

8. The method of claim 4, wherein said first input section of said continuous speech signal is temporally preceding said unshifted input section of said continuous speech signal.

9. The method of claim 4, wherein said unshifted input section of said continuous speech signal is temporally preceding said input section of said continuous speech signal.

10. The method of claim 1, further comprising selecting a preferred pitch mark sequence from said at least one pitch mark combination, wherein said preferred pitch mark sequence is selected by minimization of a sequence global consistency criterion, wherein said sequence global consistency criterion is a sum of individual global consistency criteria of each said element in said at least one pitch mark combination.

11. The method of claim 10, wherein each said individual global consistency criteria is derived from a temporal drift of each said element, relative to a certain reference pitch mark.

12. The method of claim 11, wherein said continuous speech signal is preprocessed by a zero-phase, low-pass filter to reduce its high-band noise components prior to said computing of said pitch mark drift function.

13. The method of claim 1, wherein said continuous speech signal is digitized by said at least one hardware processor.

14. The method of claim 1, wherein said sequence of pitch values are computed from said continuous speech signal by said at least one hardware processor.

15. The method of claim 1, wherein said plurality of pitch mark temporal values are computed from said continuous speech signal by said at least one hardware processor.

16. The method of claim 1, wherein said a sequence of pitch values are non-zero pitch mark values.

17. A computer program product for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, said computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a hardware processor to cause said hardware processor to: perform a signal processing of a continuous speech signal representing audible speech recorded by a microphone for generating at least one pitch mark combination, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from sai
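Claims 10 and 11 select a preferred pitch mark sequence by minimizing a sum of per-element global consistency criteria, each derived from the element's temporal drift relative to a reference pitch mark. The claims do not define the drift measure, so the sketch below assumes a hypothetical one: the distance, in samples, between a mark and the position a strictly periodic sequence with a constant `period` anchored at `reference` would give it. The candidate positions per mark are assumed to be the integers between each mark's lower and upper limits. The names `drift` and `select_preferred` are hypothetical.

```python
import itertools
import math


def drift(mark, index, reference, period):
    """Hypothetical per-element criterion: distance (in samples) from the
    position a strictly periodic sequence anchored at `reference` would give."""
    return abs(mark - (reference + index * period))


def select_preferred(candidates, reference, period):
    """Pick the combination minimizing the summed per-element criteria.

    candidates: one iterable of allowed positions per pitch mark, e.g. the
    integers between each mark's lower and upper limit.
    """
    best, best_cost = None, math.inf
    for combo in itertools.product(*candidates):
        cost = sum(drift(m, i, reference, period) for i, m in enumerate(combo))
        if cost < best_cost:  # strict: keeps the first minimum found
            best, best_cost = list(combo), cost
    return best, best_cost


# Usage: per-mark (lower, upper) limits, e.g. from a cross-correlation step.
limits = [(100, 101), (118, 122), (139, 141)]
candidates = [range(lo, hi + 1) for lo, hi in limits]
print(select_preferred(candidates, reference=100, period=20))
# ([100, 120, 140], 0)
```

The exhaustive `itertools.product` search mirrors the claim wording of choosing among combinations, but its cost grows exponentially with the number of marks. Because this particular drift measure is a separable sum with a fixed reference, the same minimum could be found element by element; a criterion that couples neighbouring marks would instead call for something like dynamic programming.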