Personalized HRTFs via optical capture

US12096200B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12096200-B2
Application number	US-202318455565-A
Country	US
Kind code	B2
Filing date	Aug 24, 2023
Priority date	Jul 25, 2018
Publication date	Sep 17, 2024
Grant date	Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method of generating personalized HRTFs. The system is prepared by calculating a model for HRTFs described as the relationship between a finite example set of input data, namely anthropometric measures and demographic information for a set of individuals, and a corresponding set of output data, namely HRTFs numerically simulated using a high-resolution database of 3D scans of the same set of individuals. At the time of use, the system queries the user for their demographic information, and then from a series of images of the user, the system detects and measures various anthropometric characteristics. The system then applies the prepared model to the anthropometric and demographic data as part of generating a personalized HRTF. In this manner, the personalized HRTF can be generated with more convenience than by performing a high-resolution scan or an acoustic measurement of the user, and with less computational complexity than by numerically simulating their HRTF.
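The abstract describes preparing a model from example pairs (anthropometric and demographic inputs, simulated HRTF outputs) and then applying it to a new user's data. Claim 9 on this page names lasso regression as one optional training technique. The following is a minimal sketch of that idea only, not the patent's implementation: it fits a lasso by coordinate descent on made-up, centered toy data, with a single scalar target standing in for one HRTF magnitude value. All function names, feature columns, and numbers are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, targets, penalty, sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + penalty*||w||_1.

    Assumes features and targets are already centered, so no intercept is fit.
    """
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0
            norm = 0.0
            for x, y in zip(rows, targets):
                prediction = sum(w * v for w, v in zip(weights, x))
                # Residual with feature j's own contribution added back in.
                rho += x[j] * (y - prediction + weights[j] * x[j])
                norm += x[j] * x[j]
            weights[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
    return weights


def predict(weights, features):
    """Apply the fitted linear model to one feature vector."""
    return sum(w * v for w, v in zip(weights, features))


# Made-up training set: three standardized measurements per individual
# (hypothetical columns, e.g. pinna height, head width, age) and one target.
training_features = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
training_targets = [1.0, 3.0, -1.0, -3.0]  # equals 2*x0 - 1*x2 by construction

weights = fit_lasso(training_features, training_targets, penalty=0.5)
new_user = [1.0, -1.0, 1.0]  # hypothetical measurements for a new user
estimate = predict(weights, new_user)
print(weights, estimate)
```

The penalty shrinks every weight toward zero and leaves the irrelevant middle feature at exactly zero, which is the usual reason to choose a lasso when many candidate measurements are available. A real HRTF has many output values per direction, so a practical system would fit many such targets; this sketch does not attempt that.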

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a personalized head-related transfer function (HRTF) on an electronic device, the method comprising: capturing video data of a user, wherein the video data includes a plurality of views of a head of the user; processing the video data to extract a plurality of images of the user; receiving the plurality of images of a user; processing the plurality of images to generate anthropometric data of the user, wherein processing the plurality of images to generate anthropometric data of the user includes: identifying key image frames of the plurality of images; identifying anthropometric features of the user using the key image frames; and generating the anthropometric data by determining measurements of the anthropometric features; and inputting the anthropometric data into a HRTF calculation system to obtain the personalized HRTF.

2. The method of claim 1, wherein capturing the video data includes capturing an image of an object having a known size; and wherein processing the plurality of images to generate the anthropometric data includes using the known size to convert the anthropometric data from pixel measurements to absolute distance measurements.

3. The method of claim 1, wherein the electronic device includes a camera; and wherein capturing the video data is performed using the camera.

4. The method of claim 1, wherein a device at a given distance captures the video data of the user, the method further comprising: determining the given distance by measuring a time delay between outputting a sound from a headphone positioned in proximity to the user and receiving the sound at a microphone of the device, wherein processing the plurality of images to generate the anthropometric data includes using the given distance to convert the anthropometric data from pixel measurements to absolute distance measurements.

5. The method of claim 1, wherein processing the plurality of images to generate the anthropometric data includes: converting the plurality of images to a three-dimensional point cloud model; and using the three-dimensional point cloud model to select key image frames of the plurality of images; and generating the anthropometric data based on the key image frames.

6. The method of claim 1, wherein processing the plurality of images to generate anthropometric data of the user includes: identifying a first frame of the plurality of images, wherein the first frame is a view perpendicular to a face of the user, by minimizing an asymmetry of key points in the first frame; and identifying a second frame of the plurality of images, wherein the second frame is a view perpendicular to a first pinna of the user, according to a view 90 degrees from the first frame; and identifying a third frame of the plurality of images, wherein the third frame is a view perpendicular to a second pinna of the user, according to a view 180 degrees from the second frame.

7. The method of claim 1, wherein the second frame is one of a plurality of second frames that are selected from the plurality of images within +45 and −45 degrees around the view perpendicular to the first pinna; and wherein the third frame is one of a plurality of third frames that are selected from the plurality of images within +45 and −45 degrees around the view perpendicular to the second pinna.

8. The method of claim 1, wherein identifying or selecting key image frames of the plurality of images is based on frame content and one or more sharpness metrics.

9. The method of claim 1, wherein generating the personalized HRTF includes: providing a HRTF model trained by performing a machine learning process, optionally including a lasso regression, on a high-resolution database of anthropometric data and measured magnitude/frequency responses; and generating the personalized HRTF by applying the HRTF model to the anthropometric data of the user.

10. The method of claim 1, further comprising: generating the personalized HRTF on a server device; and transmitting the personalized HRTF from the server device to a user device.

11. The method of claim 1, wherein the electronic device is a user device; and further comprising: generating the personalized HRTF on the user device.

12. The method of claim 11, wherein the user device generates audio output by applying the personalized HRTF to an audio signal, wherein the user device includes one of a headset, a pair of earbuds, and a pair of hearables.

13. The method of claim 12, wherein the audio signal comprises a plurality of audio objects that include position information, wherein generating the audio output corresponds to generating a binaural audio output by applying the personalized HRTF to the plurality of audio objects.

14. The method of claim 1, wherein processing the plurality of images to generate the anthropometric data includes using at least one of a photogrammetry component, contextual transformation component, a landmark detection component, and anthropometry component of the electronic device.

15. The method of claim 1, wherein processing the plurality of images to generate the anthropometric data includes using a landmark detection component, a 3D projection component, and an angle and distance measurement component, wherein the landmark detection component receives a cropped image set of anthropometric landmarks of the user, and generates a set of 2D coordinates of a set of anthropometric landmarks of the user from the cropped image set, wherein the 3D projection component receives the set of 2D coordinates and a plurality of camera transforms, and generates a set of 3D coordinates that correspond to the set of 2D coordinates of each of the anthropometric landmarks in 3D space using the camera transforms, wherein the angle and distance measurement component receives the set of 3D coordinates, and generates anthropometric data from the set of 3D coordinates, wherein the anthropometric data correspond to angles and distances of the anthropometric landmarks in the set of 3D coordinates, and wherein the electronic device generates the personalized HRTF for the user by inputting the anthropometric data into the HRTF calculation system.

16. The method of claim 1, wherein the anthropometric data includes at least one of a shoulder width of the user, a neck width of the user, a neck height of the user, a face height of the user, an interpupillary distance of the user, and a bizygomatic breadth of the user.

17. The method of claim 1, wherein the anthropometric data includes, for each pinna of the user, at least one of a pinna flare angle, a pinna rotation angle, a pinna cleft angle, a pinna offset back, a pinna offset down, a pinna height, a pinna width, a first intertragic width, a second intertragic width, a fossa height, a concha width, a concha height, and a cymba concha height.

18. The method of claim 1, wherein the anthropometric data further includes other data, wherein the other data includes at least one of an age of the user, a weight of the user, a gender of the user, and a height of the user, and wherein the other data is obtained from a source other than processing the plurality of images.

19. A non-transitory computer readable medium storing one or more computer programs that, when executed by one or more processors, controls an apparatus to execute processing for: capturing video data of a user, wherein the video data includes a plurality of views of a head of the user; processing the video data to extract a plurality of images of the
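Claims 2 and 4 both turn pixel measurements into absolute distances, one using an object of known size and the other using a distance found from an acoustic time delay. The arithmetic can be sketched as below. This is an illustration under stated assumptions, not the claimed method: the claims do not say how the distance is applied, so the pinhole-camera relation, the focal length in pixels, the 343 m/s speed of sound, the millimetre units, and every function name here are assumptions. The reference object is also assumed to sit at the same depth as the measured feature.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for dry air near 20 degrees C


def scale_from_reference(known_size_mm, measured_size_px):
    """Millimetres per pixel from an object of known size seen in the frame."""
    if measured_size_px <= 0:
        raise ValueError("reference object must span a positive pixel length")
    return known_size_mm / measured_size_px


def distance_from_delay(delay_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """One-way path length in metres for a measured acoustic time delay."""
    if delay_s < 0:
        raise ValueError("time delay cannot be negative")
    return delay_s * speed_m_per_s


def scale_from_distance(distance_m, focal_length_px):
    """Millimetres per pixel at a given depth under an assumed pinhole model.

    A feature of real size S at depth Z spans S * f / Z pixels, so one pixel
    covers Z / f units of length at that depth.
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return distance_m * 1000.0 / focal_length_px


def to_millimetres(length_px, mm_per_px):
    """Convert a pixel measurement to millimetres with a known scale."""
    return length_px * mm_per_px


# Known-size route: an ID-1 card is 85.6 mm wide; the pixel counts are made up.
card_scale = scale_from_reference(known_size_mm=85.6, measured_size_px=428.0)
pinna_height_mm = to_millimetres(300.0, card_scale)

# Time-delay route: a made-up 2 ms delay and a hypothetical 1400 px focal length.
camera_distance_m = distance_from_delay(0.002)
delay_scale = scale_from_distance(camera_distance_m, focal_length_px=1400.0)

print(pinna_height_mm, camera_distance_m, delay_scale)
```

Either route yields a single scale factor that is only valid at the depth where it was measured, which is why the sketch keeps the scale separate from the conversion step.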

Assignees

Dolby Laboratories Licensing Corp

Inventors

Classifications

G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V40/10
Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title
H04S2420/01
Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD] · CPC title
H04S2400/15
Aspects of sound capture and related signal processing for recording or reproduction · CPC title
H04S7/303
Tracking of listener position or orientation · CPC title

Patent family

Related publications grouped by family.

View patent family 67539645

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12096200B2 cover?: An apparatus and method of generating personalized HRTFs. The system is prepared by calculating a model for HRTFs described as the relationship between a finite example set of input data, namely anthropometric measures and demographic information for a set of individuals, and a corresponding set of output data, namely HRTFs numerically simulated using a high-resolution database of 3D scans of t…
Who is the assignee on this patent?: Dolby Laboratories Licensing Corp
What technology area does this patent fall under?: Primary CPC classification H04S7/301. Mapped technology areas include Electricity.
When was this patent published?: Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).