Scene classification prediction
US-2020086879-A1 · Mar 19, 2020 · US
US10963741B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10963741-B2 |
| Application number | US-201616307813-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 7, 2016 |
| Priority date | Jun 7, 2016 |
| Publication date | Mar 30, 2021 |
| Grant date | Mar 30, 2021 |
Official abstract text for this publication.
The invention relates to a control device (1) for a vehicle for determining the perceptual load of a visual and dynamic driving scene. The control device is configured to: receive a sensor output (101) of a sensor (3), the sensor (3) sensing the visual driving scene, extract a set of scene features (102) from the sensor output (101), the set of scene features (102) representing static and/or dynamic information of the visual driving scene, and determine the perceptual load (104) of the set of extracted scene features (102) based on a predetermined load model (103), wherein the load model (103) is predetermined based on reference video scenes each being labelled with a load value. The invention further relates to a system and a method.
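The three steps in the abstract (receive sensor output, extract scene features, apply a predetermined load model) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Frame` type, the two toy features in `extract_scene_features`, and the weights and bias are all hypothetical stand-ins, and the linear form is borrowed from the model named in the claims.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical representation: one frame as a flat list of pixel intensities.
Frame = Sequence[float]


@dataclass
class LinearLoadModel:
    """Linear load model f(x) = w^T x + b (weights here are illustrative only)."""
    weights: List[float]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature vector and weights differ in length")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def extract_scene_features(frames: Sequence[Frame]) -> List[float]:
    """Toy stand-in for a real spatio-temporal feature extractor (an assumption).

    Returns [mean intensity (static cue), mean absolute frame-to-frame
    change per pixel (dynamic cue)].
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_pixels = len(frames[0])
    mean_intensity = sum(sum(f) for f in frames) / (len(frames) * n_pixels)
    diffs = [
        sum(abs(a - b) for a, b in zip(cur, prev)) / n_pixels
        for prev, cur in zip(frames, frames[1:])
    ]
    mean_motion = sum(diffs) / len(diffs) if diffs else 0.0
    return [mean_intensity, mean_motion]


def determine_perceptual_load(frames: Sequence[Frame], model: LinearLoadModel) -> float:
    """Extract features from the sensor output, then map them to a load value."""
    return model.predict(extract_scene_features(frames))


# Made-up model parameters and a tiny 3-frame, 2-pixel "clip".
model = LinearLoadModel(weights=[0.5, 2.0], bias=0.1)
clip = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
load = determine_perceptual_load(clip, model)
print(round(load, 2))
```

In a real system the feature extractor and the model parameters would come from the offline training step on labelled reference video scenes; here they are fixed by hand only to show the data flow.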
Opening claim text (preview).
The invention claimed is:

1. A control device for a vehicle for determining a perceptual load of a visual and dynamic driving scene, the control device being configured to: receive a sensor output of a sensor, the sensor sensing the visual driving scene, extract a set of scene features from the sensor output, the set of scene features representing static and/or dynamic information of the visual driving scene, and determine the perceptual load of the set of extracted scene features based on a predetermined load model, wherein the load model is predetermined based on reference video scenes each being labelled with a load value.
2. The control device according to claim 1, wherein the load model comprises a mapping function between sets of scene features extracted from the reference video scenes and the load values.
3. The control device according to claim 1, wherein the load model is configured to map a set of scene features to a perceptual load value.
4. The control device according to claim 1, wherein the load model is a regression model and/or a classification model between the sets of scene features extracted from the reference video scenes and the load values.
5. The control device according to claim 1, wherein the determination of the load values of the reference video scenes is human based, in particular based on crowdsourcing.
6. The control device according to claim 1, wherein the determination of the load values is based on a pairwise ranking procedure, in particular based on the TrueSkill algorithm.
7. The control device according to claim 1, configured to continuously train the load model by monitoring the driver during the driving scene, wherein a monitored behavior of the driver during the driving scene not matching the determined perceptual load serves to on-line up-date said mapping function.
8. The control device according to claim 1, wherein the set of scene features comprises a range of spatio-temporal features, the set of scene features being in particular described in vector form.
9. The control device according to claim 1, wherein the set of scene features comprises improved dense trajectory (iDT) features and/or 3-dimensional convolutional neural network (C3D) features.
10. The control device according to claim 1, wherein the load model is a linear regression model, wherein the set of scene features being an input scene feature vector $x$ is mapped to the perceptual load being an output perceptual load value $y = f(x)$ through a linear mapping function $f(x) = w^T x + b = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + b$, the function being a weighted sum of the input dimension values of the feature vector $x$, wherein weighted parameters $w$ are assigned to each dimension value in the feature vector $x$ and a bias term $b$ centers the output at a particular value, or the load model is a multi-channel non-linear kernel regression model, where the mapping function is $f(x) = w^T \Phi(x) + b$, wherein $\Phi(x)$ is a transformation function of the input feature vectors to a non-linear kernel space.
11. A vehicle comprising: a control device according to claim 1.
12. The vehicle according to claim 11, further comprising: a sensor configured to sense the visual driving scene, the sensor being in particular an optical sensor, more in particular at least one digital camera.
13. A system for a vehicle for determining the perceptual load of a visual and dynamic driving scene, the system comprising: a control device according to claim 1, and a server, configured to determine the load model.
14. The system according to claim 13, wherein the server is configured to: store a plurality of reference video scenes, provide means for labelling the reference video scenes with load values, extract a set of scene features from each reference video scene, and determine the load model based on a regression analysis configured to determine a mapping function between the sets of scene features extracted from the respective reference video scenes and the load values.
15. The system according to claim 13, wherein the server is configured to: provide means for a human based load rating of the reference video scenes, in particular the load rating being based on crowdsourcing, wherein the load values are determined based on the human based load rating.
16. The system according to claim 13, wherein the server is configured such that the load rating is based on a pairwise ranking procedure, in particular based on the TrueSkill algorithm.
17. A method of determining the perceptual load of a visual and dynamic driving scene, the method comprising the steps of: receiving a sensor output of a sensor, the sensor sensing the visual driving scene, extracting a set of scene features from the sensor output, the set of scene features representing static and/or dynamic information of the visual driving scene, and determining the perceptual load of the set of extracted scene features based on a predetermined load model, wherein the load model is predetermined based on reference video scenes each being labelled with a load value.
18. The method according to claim 17, wherein the load model comprises a mapping function between sets of scene features extracted from the reference video scenes and the load values.
19. The method according to claim 17, wherein the load model maps a set of scene features to a perceptual load value.
20. The method according to claim 17, wherein the load model is a regression model or a classification model between the sets of scene features extracted from the reference video scenes and the load values.
21. The method according to claim 17, wherein the determination of the load values of the reference video scenes is human based, in particular based on crowdsourcing.
22. The method according to claim 17, wherein the determination of the load values is based on a pairwise ranking procedure, in particular based on the TrueSkill algorithm.
23. The method according to claim 17, wherein the load model is continuously trained by monitoring the driver during the driving scene, wherein a monitored behavior of the driver during the driving scene not matching the determined perceptual load serves to on-line up-date said mapping function.
24. The method according to claim 17, wherein the set of scene features comprises a range of spatio-temporal features, the set of scene features being in particular described in vector form.
25. The method according to claim 17, wherein the set of scene features comprises improved dense trajectory (iDT) features and/or 3-dimensional convolutional neural network (C3D) features.
26. The method according to claim 17, wherein the load model is a linear regression model, wherein the set of scene features being an input scene feature vector $x$ is mapped to the perceptual load being an output perceptual load value $y = f(x)$ through a linear mapping function $f(x) = w^T x + b = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + b$, the function being a weighted sum of the input dimension values of the feature vector $x$, wherein weighted parameters $w$ are assigned to each dimension value in the feature vector $x$ and a bias term $b$ centres the output at a particular value, or the load model is a multi-channel non-linear kernel regression model, where the mapping function is $f(x) = w^T \Phi(x) + b$, wherein $\Phi(x)$ is a transformation function of the input feature
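
Claims 6, 16 and 22 derive the load labels from pairwise comparisons of reference scenes, naming TrueSkill as one option. The sketch below illustrates the general idea with a simpler Elo-style update swapped in for TrueSkill; it is not the TrueSkill algorithm. The scene names, the `k` and `initial` parameters, and the rescaling of ratings to a 0 to 1 load range are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def rate_scenes(
    comparisons: Iterable[Tuple[str, str]],
    k: float = 32.0,
    initial: float = 1500.0,
) -> Dict[str, float]:
    """Elo-style ratings from pairwise votes (a stand-in for TrueSkill).

    Each comparison is (scene judged higher load, scene judged lower load).
    """
    ratings: Dict[str, float] = {}
    for higher, lower in comparisons:
        r_h = ratings.setdefault(higher, initial)
        r_l = ratings.setdefault(lower, initial)
        # Probability the current ratings assigned to the observed outcome.
        expected_h = 1.0 / (1.0 + 10 ** ((r_l - r_h) / 400.0))
        # Surprising outcomes move the ratings more than expected ones.
        delta = k * (1.0 - expected_h)
        ratings[higher] = r_h + delta
        ratings[lower] = r_l - delta
    return ratings


def to_load_values(ratings: Dict[str, float]) -> Dict[str, float]:
    """Rescale ratings linearly to [0, 1] (the target range is an assumption)."""
    if not ratings:
        return {}
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {scene: 0.5 for scene in ratings}
    return {scene: (r - lo) / (hi - lo) for scene, r in ratings.items()}


# Hypothetical crowdsourced votes: the first scene was judged more demanding.
votes = [
    ("junction", "motorway"),
    ("junction", "car_park"),
    ("car_park", "motorway"),
]
labels = to_load_values(rate_scenes(votes))
print(sorted(labels, key=labels.get))
```

The resulting per-scene values could then serve as the regression targets for the load model. A Bayesian rating system such as TrueSkill additionally tracks uncertainty per scene, which this simplified update does not.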
using classification, e.g. of video objects · CPC title
characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title
Recognising the driver's state or behaviour, e.g. attention or drowsiness · CPC title
based on distances to training or reference patterns · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title