Training a neural network for action recognition

US2022398832A1 · US · A1

Patent metadata
Publication number: US-2022398832-A1
Application number: US-202217752614-A
Country: US
Kind code: A1
Filing date: May 24, 2022
Priority date: Jun 11, 2021
Publication date: Dec 15, 2022
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for training a neural network for action recognition based on unlabeled action sequences includes a first neural network (NN1) and a second neural network (NN2). A first updating module is arranged to update parameters of NN1 to minimize a difference between representation data generated by NN1 and representation data generated by NN2. A second updating module is arranged to update parameters of NN2 as a function of the parameters of NN1. An augmentation module includes first and second sub-modules and is configured to include augmented versions of incoming action sequences in first and second input data. The first and second sub-modules are configured to apply at least partly different augmentation to the incoming action sequences. After NN1 and NN2 have been operated on one or more instances of the first and second input data, NN1 comprises a parameter definition of a pre-trained neural network.
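The two update rules in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: each "network" is a single linear unit, the "difference" is a squared error, the second updating module is modelled as an exponential moving average of NN1's parameters (the abstract only says "a function of the parameters of NN1"), and the two augmentation sub-modules are modelled as weak and strong random jitter. All function names are hypothetical.

```python
import random

def represent(weights, x):
    """Toy 'network': one linear unit mapping an input vector to a scalar."""
    return sum(w * xi for w, xi in zip(weights, x))

def update_first(w1, w2, x1, x2, lr=0.05):
    """First updating module: gradient step on NN1 only,
    reducing (NN1(x1) - NN2(x2)) ** 2. NN2 is treated as fixed here."""
    diff = represent(w1, x1) - represent(w2, x2)
    return [w - lr * 2.0 * diff * xi for w, xi in zip(w1, x1)]

def update_second(w1, w2, tau=0.9):
    """Second updating module: NN2 parameters as a function of NN1 parameters.
    Assumption: an exponential moving average with decay tau."""
    return [tau * b + (1.0 - tau) * a for a, b in zip(w1, w2)]

def augment_weak(seq, rng):
    """Stand-in for the first sub-module: small random jitter."""
    return [v + rng.uniform(-0.01, 0.01) for v in seq]

def augment_strong(seq, rng):
    """Stand-in for the second sub-module: larger random jitter."""
    return [v + rng.uniform(-0.1, 0.1) for v in seq]

def pretrain(sequences, steps=200, seed=0):
    """Run both update rules on unlabeled sequences and return NN1's parameters,
    which play the role of the 'parameter definition of a pre-trained network'."""
    rng = random.Random(seed)
    dim = len(sequences[0])
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    w2 = list(w1)  # assumption: NN2 starts as a copy of NN1
    for _ in range(steps):
        seq = rng.choice(sequences)
        x1 = augment_weak(seq, rng)    # first input data
        x2 = augment_strong(seq, rng)  # second input data
        w1 = update_first(w1, w2, x1, x2)
        w2 = update_second(w1, w2)
    return w1

if __name__ == "__main__":
    unlabeled = [[0.1, 0.2, 0.3], [0.3, 0.1, 0.0]]
    print(pretrain(unlabeled))
```

The toy omits anything a practical system would need to keep the representations from collapsing to a constant; it is only meant to show that NN1 is updated from a loss while NN2 is updated from NN1's parameters.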

First claim

Opening claim text (preview).

What is claimed is:

1. A system for training of a neural network, said system comprising: a first neural network, which is configured to operate on first input data to generate first representation data; a second neural network, which is configured to operate on second input data to generate second representation data; a first updating module, which is configured to update parameters of the first neural network to minimize a difference between the first representation data and the second representation data; a second updating module, which is configured to update parameters of the second neural network as a function of the parameters of the first neural network; and an augmentation module, which is configured to retrieve a plurality of corresponding first and second action sequences, each depicting a respective object performing a respective activity, generate the first and second input data to include augmented versions of the first and second action sequences; wherein the system is configured to operate the first and second neural networks on one or more instances of the first and second input data generated by the augmentation module and provide at least a subset of the parameters of the first neural network as a parameter definition of a pre-trained neural network, and wherein the augmentation module comprises a first sub-module which is configured to generate a first augmented version based on a respective first action sequence, and a second sub-module which is configured to generate a second augmented version based on a respective second action sequence, wherein the second sub-module differs from the first sub-module.

2. The system of claim 1, wherein the augmentation module is configured to include the corresponding first and second augmented versions in the first and second input data such that the first and second networks operate concurrently on the corresponding first and second augmented versions.

3. The system of claim 1, wherein the first sub-module comprises a first set of augmentation functions which are operable on the respective first action sequence to generate the first augmented version, wherein the second sub-module comprises a second set of augmentation functions which are operable on the respective second action sequence to generate the second augmented version, wherein the first and second sets of augmentation functions differ by at least one augmentation function.

4. The system of claim 1, wherein the second sub-module is operable to apply more augmentation than the first sub-module.

5. The system of claim 1, wherein each of the first and second action sequences comprise a time sequence of object representations, and wherein each of the object representations comprises locations of predefined features on the respective object.

6. The system of claim 5, wherein the second sub-module, to generate the second augmented version, is operable to randomly select a coherent subset of the object representations in the respective second action sequence.

7. The system of claim 5, wherein the second sub-module, to generate the second augmented version is operable to distort the object representations in the respective second action sequence in a selected direction.

8. The system of claim 5, wherein the second sub-module, to generate the second augmented version, is operable to hide a subset of the respective object in the object representations in the respective second action sequence.

9. The system of claim 8, wherein the subset corresponds to said predefined features on one side of a geometric plane with a predefined arrangement through the respective object.

10. The system of claim 5, wherein the second sub-module, to generate the second augmented version, is operable to perform a temporal smoothing of the object representations in the respective second action sequence.

11. The system of claim 5, wherein the second sub-module, to generate the second augmented version, is operable to randomly select an object representation in the respective second action sequence and rearrange the respective second action sequence with the selected object representation as starting point.

12. The system of claim 5, wherein the second sub-module, to generate the second augmented version, is operable to flip the respective object in the object representations in the respective second action sequence through a mirror plane.

13. The system of claim 5, wherein the first sub-module, to generate the first augmented version, is operable to change a time distance between the object representations in the respective first action sequence.

14. The system of claim 1, wherein the augmentation module is configured to retrieve the first and second action sequences so as to correspond to different viewing angles onto the respective object performing the respective activity.

15. The system of claim 1, which further comprises a training sub-system, which comprises: a third neural network, which is configured to operate on third input data to generate third representation data, the third neural network being initialized by use of the parameter definition, and a third updating module, which is configured to update parameters of the third network to minimize a difference between the third representation data and activity label data associated with the third input data, wherein the training sub-system is configured to, by the third updating module, train the third network to recognize one or more activities represented by the activity label data.

16. The system of claim 15, wherein the training sub-system comprises a further augmentation module which is configured to retrieve third action sequences of one or more objects performing one or more activities, generate the third input data to include third augmented versions of the third action sequences, wherein the further augmentation module is configured in correspondence with the first sub-module.

17. The system of claim 15, which comprises a fourth neural network, which is configured to operate on fourth input data to generate fourth representation data, and a fourth updating module, which is configured to update parameters of the fourth network to minimize a difference between the fourth representation data and fifth representation data, wherein the fifth representation data is generated by the third neural network, when trained, being operated on the fourth input data.

18. The system of clause 17, wherein the fourth neural network has a smaller number of channels than the third neural network.

19. A computer-implemented method for use in training of a neural network, said method comprising: retrieving first and second action sequences of an object performing an activity; generating the first and second input data to include first and second augmented versions of the first and second action sequences; operating a first neural network on the first input data to generate first representation data; operating a second neural network on the second input data to generate second representation data; updating parameters of the first neural network to minimize a difference between the first representation data and the second representation data; updating parameters of the second neural network as a function of the parameters of the first neural network; and providing, after operating the first and second neural networks on one or more instances of the first and second input data, at least a subset of the parameters of the first neural network as a parameter definition of
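The augmentations named in claims 6 to 13 can be pictured with a minimal sketch. The data layout and every concrete choice here are assumptions for illustration only: an action sequence is taken to be a list of frames, each frame a list of (x, y, z) feature locations; "coherent subset" is read as a contiguous run of frames, "distort in a selected direction" as a shear, "hide" as zeroing, the plane of claim 9 and the mirror plane of claim 12 as planes of constant x, and "rearrange" as a circular shift. The function names are hypothetical and the claims do not prescribe any of these specifics.

```python
import random

# Assumed layout: sequence = list of frames; frame = list of (x, y, z) tuples.

def crop_contiguous(seq, length, rng):
    """Claim 6 (assumed reading): randomly pick a contiguous run of frames."""
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]

def shear_x_by_y(seq, amount):
    """Claim 7 (assumed reading): shift x in proportion to y."""
    return [[(x + amount * y, y, z) for x, y, z in frame] for frame in seq]

def hide_side(seq, plane_x=0.0):
    """Claims 8-9 (assumed reading): zero the features with x < plane_x."""
    return [[(0.0, 0.0, 0.0) if x < plane_x else (x, y, z) for x, y, z in frame]
            for frame in seq]

def smooth(seq, window=3):
    """Claim 10 (assumed reading): centered moving average per feature."""
    half = window // 2
    out = []
    for t in range(len(seq)):
        neighbours = seq[max(0, t - half):t + half + 1]
        frame = []
        for j in range(len(seq[t])):
            frame.append(tuple(sum(f[j][k] for f in neighbours) / len(neighbours)
                               for k in range(3)))
        out.append(frame)
    return out

def restart_at(seq, rng):
    """Claim 11 (assumed reading): circular shift to a random starting frame."""
    start = rng.randrange(len(seq))
    return seq[start:] + seq[:start]

def mirror(seq):
    """Claim 12 (assumed reading): reflect through the plane x = 0."""
    return [[(-x, y, z) for x, y, z in frame] for frame in seq]

def change_time_distance(seq, step=2):
    """Claim 13 (assumed reading): keep every step-th frame."""
    return seq[::step]

if __name__ == "__main__":
    rng = random.Random(0)
    walk = [[(float(t), 1.0, 0.0)] for t in range(6)]  # one feature, six frames
    print(mirror(change_time_distance(walk)))
    print(len(crop_contiguous(walk, 4, rng)))
```

In the claims, the operations of claims 6 to 12 are attributed to the second sub-module and the one of claim 13 to the first sub-module. The `mirror` sketch only negates a coordinate; it does not swap left and right feature labels, which a fuller implementation might need.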

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • Organisation of the process, e.g. bagging or boosting · CPC title

  • Recognition of whole body movements, e.g. for sport training · CPC title

  • using neural networks · CPC title

  • G06V10/34 · Primary

    Smoothing or thinning of the pattern; Morphological operations; Skeletonisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022398832A1 cover?
A system for training a neural network for action recognition based on unlabeled action sequences includes a first neural network (NN1) and a second neural network (NN2). A first updating module is arranged to update parameters of NN1 to minimize a difference between representation data generated by NN1 and representation data generated by NN2. A second updating module is arranged to update par…
Who is the assignee on this patent?
Sony Group Corp
What technology area does this patent fall under?
Primary CPC classification G06V10/7747. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 15, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).